Migrating dmd to D?

Mon Mar 4 11:30:05 PST 2013

On Monday, 4 March 2013 at 02:36:23 UTC, Daniel Murphy wrote:
>> What were the biggest and most common reasons you needed 
>> context information?
>
> Turning implicit into explicit conversions.  A big one is 0 ->
> Loc(0).  dinteger_t -> size_t.  void* -> char*.  string literal to
> char*.  string literal to unsigned char*.  unsigned -> unsigned
> char.  int -> bool.

I would like to play devil's advocate myself, at least on 0 -> 
Loc(0).

I found that in the source, the vast, vast majority of Loc 
instances were named, of course, 'loc'. Of the few other ones, 
only 'endloc' was ever assigned to 0. The token matcher could 
substitute:

'loc = 0' -> 'loc = Loc(0)'
'endloc = 0' -> 'endloc = Loc(0)'
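Here is a minimal sketch of this substitution in C++, the language dmd is written in, using std::regex. The function name 'fixLocZero' is hypothetical. The rule assumes the naming convention just described: only 'loc' and 'endloc' are ever assigned a bare 0.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: rewrites 'loc = 0' and 'endloc = 0' into
// 'loc = Loc(0)' and 'endloc = Loc(0)'. It relies on the convention that
// these are the only Loc variables ever assigned a bare 0.
std::string fixLocZero(const std::string& src) {
    // \b keeps 'myloc = 0' from matching; the lookahead keeps '0x10' and
    // '0.5' from being treated as a bare 0. 'loc == 0' never matches because
    // the '=' must be followed (after optional spaces) by the digit 0.
    static const std::regex re(R"(\b(loc|endloc)\s*=\s*0(?![0-9A-Za-z_.]))");
    return std::regex_replace(src, re, "$1 = Loc(0)");
}
```

For example, fixLocZero("e->endloc=0;") gives "e->endloc = Loc(0);". The spacing around '=' is normalized as a side effect.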

As long as it had a list of dmd's AST classes, a fairly
conservative rule that would knock out a large number of
additional cases is:
'new DmdClassName(0' -> 'new DmdClassName(Loc(0)'
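Here is a sketch of that rule, again in C++ with std::regex. 'fixNewLocZero' is a hypothetical name. The caller is assumed to supply the set of AST class names, and calls to any class outside that set are copied through untouched.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <set>
#include <string>

// Hypothetical rule: 'new DmdClassName(0' -> 'new DmdClassName(Loc(0)',
// applied only to names in `astClasses`. It leans on the dmd convention that
// a Loc is always the first constructor argument.
std::string fixNewLocZero(const std::string& src,
                          const std::set<std::string>& astClasses) {
    // "new Name(" followed by a bare 0 that ends the first argument;
    // the lookahead keeps 0x10, 0.5 or 0 + n from matching.
    static const std::regex re(R"(\bnew\s+([A-Za-z_]\w*)\s*\(\s*0\s*(?=[,)]))");
    std::string out;
    std::size_t last = 0;  // src[0, last) has been copied to out
    for (std::sregex_iterator it(src.begin(), src.end(), re), end; it != end; ++it) {
        const std::smatch& m = *it;
        const std::size_t pos = static_cast<std::size_t>(m.position(0));
        out.append(src, last, pos - last);
        if (astClasses.count(m[1].str()))
            out += "new " + m[1].str() + "(Loc(0)";
        else
            out += m.str(0);  // not a known AST class: copy unchanged
        last = pos + static_cast<std::size_t>(m.length(0));
    }
    out.append(src, last, std::string::npos);
    return out;
}
```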

The core principle of the naive approach is to take advantage
of specific per-project conventions, such as always passing the
Loc as the first argument. The more uniformly the project has
been implemented, the more likely this approach is to work.

I do agree that many of those other implicit conversions seem
daunting. The naive approach would need two features. The first
is a basic way of tracking a variable's type. For example, it
could have a list of known 'killer' types that cause problems.
When it sees one, it records the next identifier it finds and
associates that identifier with the type for the rest of the
function. It may then be somewhat better able to recognize
patterns where a conversion is needed. The second feature would
be a brute-force way of saying, "If you meet pattern ZZZ inside
function XXX::YYY, replace it with WWW; otherwise, replace it
with UUU." This is clearly the point of diminishing returns for
the naive approach, and from there I could only hope that a good
abstraction would make up a lot of ground where necessary.
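Here is a rough sketch of both features. It assumes the converter is handed one function body at a time as a string. Everything in it ('trackKillerVars', 'ScopedRule', 'applyScopedRule') is hypothetical. The tokenizer is deliberately crude: each identifier is one token, and every other non-space character is a token of its own.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <regex>
#include <set>
#include <string>
#include <vector>

// Feature 1 (hypothetical): within one function body, whenever a token on the
// `killerTypes` list appears, record the next identifier (skipping any '*')
// as a variable of that type for the rest of the function. Deliberately naive:
// it will also record e.g. a function name declared with a killer return type.
std::map<std::string, std::string> trackKillerVars(
        const std::string& body, const std::set<std::string>& killerTypes) {
    static const std::regex tokenRe(R"([A-Za-z_]\w*|\S)");
    std::vector<std::string> toks;
    for (std::sregex_iterator it(body.begin(), body.end(), tokenRe), end; it != end; ++it)
        toks.push_back(it->str());

    std::map<std::string, std::string> varTypes;
    for (std::size_t i = 0; i < toks.size(); ++i) {
        if (!killerTypes.count(toks[i])) continue;
        std::string type = toks[i];
        std::size_t j = i + 1;
        while (j < toks.size() && toks[j] == "*") { type += '*'; ++j; }
        if (j < toks.size() &&
            (std::isalpha(static_cast<unsigned char>(toks[j][0])) || toks[j][0] == '_'))
            varTypes[toks[j]] = type;  // e.g. "p" -> "void*"
    }
    return varTypes;
}

// Feature 2 (hypothetical): "if you meet pattern ZZZ inside function XXX::YYY,
// replace it with WWW; otherwise, replace it with UUU."
struct ScopedRule {
    std::string pattern;      // ZZZ
    std::string inFunction;   // XXX::YYY
    std::string replaceIn;    // WWW
    std::string replaceElse;  // UUU
};

std::string applyScopedRule(const std::string& functionName,
                            const std::string& body, const ScopedRule& r) {
    if (r.pattern.empty()) return body;
    const std::string& with =
        functionName == r.inFunction ? r.replaceIn : r.replaceElse;
    std::string out = body;
    // Resume searching after each replacement so `with` is never rescanned.
    for (std::size_t pos = out.find(r.pattern); pos != std::string::npos;
         pos = out.find(r.pattern, pos + with.size()))
        out.replace(pos, r.pattern.size(), with);
    return out;
}
```

For instance, trackKillerVars("void *p = mem.malloc(10);", {"void"}) maps 'p' to 'void*'. A later rule could use that to decide that 'p' needs an explicit cast to 'char*'.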

The point of diminishing returns for the whole naive approach is
reached when, for every abstraction you add, you break as much
code as you fix. Then you're stuck with the grunt work of adding
special case after special case, and you might as well try
something else at that point...

My current situation is that my coding skills lag behind my
ability to come up with ideas, so I don't have anything
implementing my approach up and running for comparison. But I
want the conversation to be productive, so here are the ideas
I've had since yesterday.

I would start by writing a program that converts the source
class by class, one class at a time, producing one output file
per class. It keeps a list of classes to convert and, for each
class, a list of data members, methods, and overrides. It only
emits what is on the list, so you can add classes and functions
one step at a time. For each method or override, the list also
records which file to find it in, and perhaps a hint as to
roughly where the function begins in that file.
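Here is one possible shape for that manifest, as a C++ sketch. The struct names, the '.d' output naming and the 'findDefinition' helper are all hypothetical. Reading the file named in 'MethodEntry::file' is left to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical manifest for a class-at-a-time converter. Only members
// listed here are emitted, so classes and functions can be added gradually.
struct MethodEntry {
    std::string name;       // method or override name
    std::string file;       // C++ source file to look in
    std::size_t lineHint;   // roughly where the definition starts (1-based, 0 = unknown)
};

struct ClassEntry {
    std::string name;                  // e.g. "OrOrExp"
    std::vector<std::string> data;     // data members to carry over
    std::vector<MethodEntry> methods;  // methods and overrides to convert
};

// One output file per class (the naming scheme is an assumption).
std::string outputFileFor(const ClassEntry& c) { return c.name + ".d"; }

// Finds "Class::method(" in the text of m.file, searching from the hinted
// line first and falling back to the whole file if the hint is stale.
// Returns the byte offset, or std::string::npos if it is not there at all.
std::size_t findDefinition(const std::string& fileText,
                           const std::string& className, const MethodEntry& m) {
    const std::string needle = className + "::" + m.name + "(";
    std::size_t start = 0;
    for (std::size_t line = 1; line < m.lineHint && start != std::string::npos; ++line) {
        start = fileText.find('\n', start);
        if (start != std::string::npos) ++start;  // first char of the next line
    }
    if (start != std::string::npos) {
        const std::size_t pos = fileText.find(needle, start);
        if (pos != std::string::npos) return pos;
    }
    return fileText.find(needle);
}
```

The hint only speeds up the search. A wrong hint costs a full scan, not a missed function.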

You may have already thought of these, but just to say them out
loud, here are some more token replacements I was thinking of:

'SameName::SameName(...ABC...) : DifferentName(...XYZ...) {'
->
'this(...ABC...)
{
     super(...XYZ...);'

Standard reference semantics:
'DTreeClass *' -> 'DTreeClass'
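Here is a sketch of the reference rule. It assumes the converter knows which dmd classes become D classes. 'stripClassPointers' is hypothetical. It removes exactly one '*' after a listed class name, so pointers to other types are untouched, and 'Expression **' becomes 'Expression *', a pointer to a class reference.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical rule: D class objects are references, so 'DTreeClass *'
// becomes 'DTreeClass'. `treeClasses` lists the dmd classes that become
// D classes; pointers to any other type are left alone.
std::string stripClassPointers(const std::string& src,
                               const std::vector<std::string>& treeClasses) {
    if (treeClasses.empty()) return src;
    std::string alternatives;
    for (const auto& name : treeClasses) {
        if (!alternatives.empty()) alternatives += '|';
        alternatives += name;  // plain identifiers: no regex escaping needed
    }
    // Group 1: class name; group 2: whitespace before '*', kept so that
    // 'Expression *e1' becomes 'Expression e1' rather than 'Expressione1'.
    const std::regex re("\\b(" + alternatives + ")(\\s*)\\*");
    return std::regex_replace(src, re, "$1$2");
}
```

For example, stripClassPointers("Expression *e1, Type* t, int *p", {"Expression", "Type"}) returns "Expression e1, Type t, int *p".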

Combined, they look like this:
'OrOrExp::OrOrExp(Loc loc, Expression *e1, Expression *e2)
         : BinExp(loc, TOKoror, sizeof(OrOrExp), e1, e2)
{'
->
'this(Loc loc, Expression e1, Expression e2)
{
     super(loc, TOKoror, sizeof(OrOrExp), e1, e2);'