Migrating dmd to D?

Mon Mar 4 19:18:08 PST 2013

"Zach the Mystic" <reachBUTMINUSTHISzach at gOOGLYmail.com> wrote in message 
news:oxcqgprnwnsuzngfijyg at forum.dlang.org...
>
> I would like to play devil's advocate myself, at least on 0 -> Loc(0).
>
> I found that in the source, the vast, vast majority of Loc instances were 
> named, of course, 'loc'. Of the few other ones, only 'endloc' was ever 
> assigned to 0. The token matcher could substitute:
>
> 'loc = 0' -> 'loc = Loc(0)'
> 'endloc = 0' -> 'endloc = Loc(0)'
>

This is fairly rare.

> As long as it had a list of dmd's AST classes, a pretty conservative 
> attempt to knock out a huge number of additional cases is:
> 'new DmdClassName(0' -> 'new DmdClassName(Loc(0)'
>

Yes, this mostly works, and is exactly what I did in a previous attempt.
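
For illustration only, the two substitutions quoted above could be sketched like this in Python. The class list is a hypothetical three-entry stand-in for the real list of AST classes, and the function name is invented here. It relies on the convention that the Loc is always the first constructor argument, so it will be wrong wherever that does not hold.

```python
import re

# Hypothetical subset of the AST class names; the real list is far longer.
AST_CLASSES = ["IntegerExp", "IdentifierExp", "VarDeclaration"]

# 'loc = 0;' and 'endloc = 0;' only. The leading \b keeps 'block = 0;' intact,
# and requiring ';' right after the 0 avoids touching 'loc == 0' or '0x10'.
LOC_ASSIGN = re.compile(r"\b(loc|endloc)\s*=\s*0\s*;")

# 'new KnownClass(0' followed by ',' or ')'. The lookahead leaves the
# delimiter in place so the rest of the argument list is untouched.
NEW_WITH_ZERO = re.compile(
    r"\bnew\s+(" + "|".join(map(re.escape, AST_CLASSES)) + r")\s*\(\s*0\s*(?=[,)])"
)


def add_loc_wrappers(src: str) -> str:
    """Wrap literal 0 in Loc(...) where the conventions say it is a Loc."""
    src = LOC_ASSIGN.sub(r"\1 = Loc(0);", src)
    return NEW_WITH_ZERO.sub(r"new \1(Loc(0)", src)
```

Classes that are not on the list are left alone, which is what keeps the rule conservative.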

> The core principle with the naive approach is to take advantage of 
> specific per-project conventions such as always giving the Loc first. The 
> more uniformity with which the project has been implemented, the more 
> likely this approach will work.
>
> A lot of those other implicit conversions I do agree seem daunting. The 
> naive approach would require two features, one, a basic way of tracking a 
> variable's type. For example, it could have a list of known 'killer' types 
> which cause problems. When it sees one it records the next identifier it 
> finds and associates it to that type for the rest of the function. It may 
> then be slightly better able to recognize patterns where conversion is 
> desirable. The second feature would be a brute force way of saying, "You 
> meet pattern ZZZ: if in function XXX::YYY, replace it with WWW, else 
> replace with UUU." This is clearly the point of diminishing returns for 
> the naive approach, at which point I could only hope that a good 
> abstraction could make up a lot of ground when found necessary.
>
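
A rough Python sketch of the first feature described above (remembering which identifiers were declared with a 'killer' type) might look like the following. The type list, the function names, and the choice to rewrite only `name = 0;` are all assumptions made for the example, not something from the thread.

```python
import re

# Hypothetical list of 'killer' types that need an explicit conversion in D.
KILLER_TYPES = ["Loc"]

# A killer type followed by an identifier, e.g. 'Loc start'.
DECL = re.compile(r"\b(" + "|".join(map(re.escape, KILLER_TYPES)) + r")\s+(\w+)")


def tracked_variables(function_body: str) -> dict:
    """Map each identifier declared with a killer type to that type."""
    return {m.group(2): m.group(1) for m in DECL.finditer(function_body)}


def wrap_zero_assignments(function_body: str) -> str:
    """Rewrite 'name = 0;' to 'name = Type(0);' for tracked identifiers only."""
    for name, type_name in tracked_variables(function_body).items():
        pattern = re.compile(r"\b%s\s*=\s*0\s*;" % re.escape(name))
        function_body = pattern.sub("%s = %s(0);" % (name, type_name), function_body)
    return function_body
```

The function expects the text of a single function body, so the association lasts "for the rest of the function" only because the caller hands it one function at a time.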

My experience was that you don't need to explicitly track which function you 
are in, just keeping track of the file and matching a longer pattern is 
enough.
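
As a minimal sketch of that idea in Python (not the actual pattern files): rules are keyed by file name, and each rule matches a long enough piece of text that it can only hit the intended place. The file names and rule texts below are invented for illustration.

```python
from typing import Dict, List, Tuple

# (text to find, replacement). Every entry here is hypothetical.
Rule = Tuple[str, str]

FILE_RULES: Dict[str, List[Rule]] = {
    "lexer.c": [("t->value = 0;", "t.value = TOKreserved;")],
    "mtype.c": [("t->value = 0;", "t.value = null;")],
}


def apply_file_rules(filename: str, src: str) -> str:
    """Apply only the rules registered for this file, in order."""
    for old, new in FILE_RULES.get(filename, []):
        src = src.replace(old, new)
    return src
```

The same source text gets a different replacement depending on the file, without the tool ever knowing which function it is in.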

Here is one of the files of patterns I made: http://dpaste.dzfl.pl/3c9be703
Obviously this could be shorter with a DSL, and towards the end I started 
using a less verbose SM + DumpOut approach.

> The point of diminishing returns for the whole naive approach is reached 
> when for every abstraction you add, you end up breaking as much code as 
> you fix. Then you're stuck with the grunt work of adding special case 
> after special case, and you might as well try something else at that 
> point...
>

Yeah...

> My current situation is that my coding skills will lag behind my ability 
> to have ideas, so I don't have anything regarding my approach up and 
> running for comparison, but I want the conversation to be productive, so 
> I'll give you the ideas I've had since yesterday.
>
> I would start by creating a program which converts the source by class, 
> one class at a time, and one file for each. It has a list of classes to 
> convert, and a list of data, methods, and overrides for each class - it 
> will only include what's on the list, so you can add classes and functions 
> one step at a time. For each method or override, a file to find it in, and 
> maybe a hint as to about where the function begins in said file.
>

That is waaaay too much information to gather manually.  There are a LOT of 
classes and functions in dmd.

> You may have already thought of these, but just to say them out loud, some 
> more token replacements I was thinking of:
>
> 'SameName::SameName(...ABC...) : DifferentName(...XYZ...) {'
> ->
> 'this(...ABC...)
> {
>     super(...XYZ...);'
>
> Standard reference semantics:
> 'DTreeClass *' -> 'DTreeClass'
>
> Combined, they look like this:
> 'OrOrExp::OrOrExp(Loc loc, Expression *e1, Expression *e2)
>         : BinExp(loc, TOKoror, sizeof(OrOrExp), e1, e2)
> {'
> ->
> 'this(Loc loc, Expression e1, Expression e2)
> {
>     super(loc, TOKoror, sizeof(OrOrExp), e1, e2);'
>

Like I said, I went down this path before, and made some progress.  It 
resulted in a huge list of cases.
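
To make the quoted constructor rewrite concrete, here is a hedged Python sketch of it as a single text-level rule. The class list and function names are made up for the example. The argument pattern copes with only one level of nested parentheses (enough for `sizeof(OrOrExp)`), which is one reason a list of such rules keeps growing.

```python
import re

# Hypothetical subset of tree classes that become reference types in D.
TREE_CLASSES = ["Expression", "Statement", "Type", "Dsymbol"]

# 'Expression *e1' -> 'Expression e1'
POINTER = re.compile(r"\b(" + "|".join(TREE_CLASSES) + r")\s*\*\s*")

# 'Name::Name(params) : Base(args) {' where args may contain one level
# of nested parentheses.
CTOR = re.compile(
    r"\b(\w+)::\1\s*\(([^()]*)\)"                 # Name::Name(params)
    r"\s*:\s*\w+\s*\(((?:[^()]|\([^()]*\))*)\)"   # : Base(args)
    r"\s*\{"
)


def drop_pointers(text: str) -> str:
    """Apply D reference semantics to known tree classes."""
    return POINTER.sub(r"\1 ", text)


def rewrite_ctor(match) -> str:
    params = drop_pointers(match.group(2))
    return "this(%s)\n{\n    super(%s);" % (params, match.group(3))


def convert_constructors(src: str) -> str:
    """Turn C++ constructors with a base-class initializer into D syntax."""
    return CTOR.sub(rewrite_ctor, src)
```

Constructors without a base-class call, with member initializers, or with an #ifdef inside the parameter list would each need another case.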
My second attempt was to 'parse' c++, recognising preprocessor constructs as 
regular ones.  The frequent use of #ifdef in the middle of expressions makes 
this very, very difficult.
So my current approach is to filter out the preprocessor conditionals first, 
before parsing.  #defines and #pragmas survive to parsing.
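
A simplified Python sketch of such a filtering pass is below. It is an illustration under stated assumptions, not the actual tool: it picks branches from a caller-supplied set of defined names, supports only `#ifdef NAME`, `#ifndef NAME`, `#if NAME`, `#if 0`, `#if 1`, `#else` and `#endif`, and rejects anything more complex. How the real converter chooses which branch to keep is not stated in this message.

```python
from typing import Iterable, List, Set


def strip_conditionals(lines: Iterable[str], defined: Set[str]) -> List[str]:
    """Drop preprocessor conditionals, keeping only the active branches.

    Other directives such as #define and #pragma pass through unchanged.
    """
    out: List[str] = []
    stack: List[bool] = []  # one flag per open conditional: branch active?
    for line in lines:
        words = line.strip().split()
        directive = words[0] if words and words[0].startswith("#") else ""
        if directive in ("#if", "#ifdef", "#ifndef"):
            if len(words) != 2:
                raise ValueError("unsupported conditional: " + line)
            active = words[1] == "1" or words[1] in defined
            stack.append(not active if directive == "#ifndef" else active)
        elif directive == "#elif":
            raise ValueError("unsupported conditional: " + line)
        elif directive == "#else":
            stack[-1] = not stack[-1]
        elif directive == "#endif":
            stack.pop()
        elif all(stack):  # every enclosing branch is active
            out.append(line)
    return out
```

After this pass, an expression that was cut in half by a conditional is ordinary text again, so the parser never sees a partial expression.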

In short, doing this at the token level works, but because you're 
transforming syntax, not text, it's better to work on a syntax tree.