Migrating dmd to D?
Daniel Murphy
yebblies at nospamgmail.com
Mon Mar 4 19:18:08 PST 2013
"Zach the Mystic" <reachBUTMINUSTHISzach at gOOGLYmail.com> wrote in message
news:oxcqgprnwnsuzngfijyg at forum.dlang.org...
>
> I would like to play devil's advocate myself, at least on 0 -> Loc(0).
>
> I found that in the source, the vast, vast majority of Loc instances were
> named, of course, 'loc'. Of the few other ones, only 'endloc' was ever
> assigned to 0. The token matcher could substitute:
>
> 'loc = 0' -> 'loc = Loc(0)'
> 'endloc = 0' -> 'endloc = Loc(0)'
>
This is fairly rare.
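A minimal sketch of that kind of name-based substitution, assuming the only Loc variables assigned 0 are called 'loc' and 'endloc'. The function name and the regex are made up for illustration, not taken from any existing converter:

```python
import re

# Assumption: these are the only variable names that hold a Loc.
LOC_NAMES = ("loc", "endloc")

# Matches 'loc = 0' but not 'loc == 0', 'loc = 0x10' or 'loc = 0.5'.
_LOC_ASSIGN = re.compile(r"\b(%s)\s*=\s*0\b(?!\.)" % "|".join(LOC_NAMES))

def fix_loc_assignments(src):
    """Rewrite 'loc = 0' into 'loc = Loc(0)' for the known Loc names."""
    return _LOC_ASSIGN.sub(r"\1 = Loc(0)", src)

print(fix_loc_assignments("this->endloc = 0;"))
```

Because it keys on the variable name alone, it silently depends on the naming convention holding everywhere.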
> As long as it had a list of D's AST classes, a pretty conservative
> attempt to knock out a huge number of additional cases is:
> 'new DmdClassName(0' -> 'new DmdClassName(Loc(0)'
>
Yes, this mostly works, and is exactly what I did in a previous attempt.
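Roughly, the rule looks like the sketch below. The class list is a hypothetical subset used only for illustration, and the rule assumes every listed class takes the Loc as its first constructor argument:

```python
import re

# Hypothetical subset of AST class names; a real list would be much longer.
AST_CLASSES = ["OrOrExp", "BinExp", "Expression"]

def fix_new_with_zero_loc(src, classes=AST_CLASSES):
    """Rewrite 'new Cls(0' into 'new Cls(Loc(0)' for the listed classes."""
    # Longest names first so a short name never shadows a longer one.
    names = "|".join(re.escape(c) for c in sorted(classes, key=len, reverse=True))
    # The lookahead requires ',' or ')' after the 0, so '0x1' is left alone.
    pattern = re.compile(r"\bnew\s+(%s)\s*\(\s*0\s*(?=[,)])" % names)
    return pattern.sub(r"new \1(Loc(0)", src)

print(fix_new_with_zero_loc("e = new OrOrExp(0, e1, e2);"))
```

A class that also has an overload taking an integer first would be rewritten wrongly, which is one reason this only mostly works.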
> The core principle with the naive approach is to take advantage of
> specific per-project conventions such as always giving the Loc first. The
> more uniformity with which the project has been implemented, the more
> likely this approach will work.
>
> A lot of those other implicit conversions I do agree seem daunting. The
> naive approach would require two features, one, a basic way of tracking a
> variable's type. For example, it could have a list of known 'killer' types
> which cause problems. When it sees one it records the next identifier it
> finds and associates it to that type for the rest of the function. It may
> then be slightly better able to recognize patterns where conversion is
> desirable. The second feature would be a brute force way of saying, "You
> meet pattern ZZZ: if in function XXX::YYY, replace it with WWW, else
> replace with UUU." This is clearly the point of diminishing returns for
> the naive approach, at which point I could only hope that a good
> abstraction could make up a lot of ground when found necessary.
>
My experience was that you don't need to explicitly track which function you
are in; keeping track of the file and matching a longer pattern is
enough.
Here is one of the files of patterns I made: http://dpaste.dzfl.pl/3c9be703
Obviously this could be shorter with a DSL, and towards the end I started
using a less verbose SM + DumpOut approach.
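To illustrate the file-plus-longer-pattern idea (this is not the contents of the paste above), here is a small sketch. The rule table, the file name, and the "must match exactly once" check are all assumptions made for the example:

```python
def apply_file_rules(filename, src, rules):
    """Apply literal (old, new) replacements registered for one file.

    A rule is applied only if its pattern occurs exactly once in the file;
    otherwise it is reported so the pattern can be lengthened by hand.
    """
    problems = []
    for old, new in rules.get(filename, []):
        count = src.count(old)
        if count == 1:
            src = src.replace(old, new)
        else:
            problems.append((old, count))
    return src, problems

# Hypothetical rule table: the extra 'return e;' line is the "longer
# pattern" that pins the rule to one place without naming the function.
RULES = {
    "example.c": [
        ("    e = 0;\n    return e;", "    e = null;\n    return e;"),
    ],
}

source = "void f()\n{\n    x = 0;\n}\nE *g()\n{\n    e = 0;\n    return e;\n}\n"
converted, problems = apply_file_rules("example.c", source, RULES)
print(converted)
```

Reporting patterns that match zero or several times is a cheap way to notice when a rule has gone stale or is too short.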
> The point of diminishing returns for the whole naive approach is reached
> when for every abstraction you add, you end up breaking as much code as
> you fix. Then you're stuck with the grunt work of adding special case
> after special case, and you might as well try something else at that
> point...
>
Yeah...
> My current situation is that my coding skills will lag behind my ability
> to have ideas, so I don't have anything regarding my approach up and
> running for comparison, but I want the conversation to be productive, so
> I'll give you the ideas I've had since yesterday.
>
> I would start by creating a program which converts the source by class,
> one class at a time, and one file for each. It has a list of classes to
> convert, and a list of data, methods, and overrides for each class - it
> will only include what's on the list, so you can add classes and functions
> one step at a time. For each method or override, a file to find it in, and
> maybe a hint as to about where the function begins in said file.
>
That is waaaay too much information to gather manually. There are a LOT of
classes and functions in dmd.
> You may have already thought of these, but just to say them out loud, some
> more token replacements I was thinking of:
>
> 'SameName::SameName(...ABC...) : DifferentName(...XYZ...) {'
> ->
> 'this(...ABC...)
> {
> super(...XYZ...);'
>
> Standard reference semantics:
> 'DTreeClass *' -> 'DTreeClass'
>
> Combined, they look like this:
> 'OrOrExp::OrOrExp(Loc loc, Expression *e1, Expression *e2)
> : BinExp(loc, TOKoror, sizeof(OrOrExp), e1, e2)
> {'
> ->
> 'this(Loc loc, Expression e1, Expression e2)
> {
> super(loc, TOKoror, sizeof(OrOrExp), e1, e2);'
>
Like I said, I went down this path before, and made some progress. It
resulted in a huge list of cases.
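For reference, a text-level version of the constructor rewrite quoted above could look something like this. It is only a sketch: the helper names are invented, it handles a single base-class initializer, and it converts only the first constructor it finds:

```python
import re

_CTOR_HEAD = re.compile(r"\b(\w+)::\1\s*\(")
_BASE_INIT = re.compile(r"\s*:\s*\w+\s*\(")
_OPEN_BRACE = re.compile(r"\s*\{")

def _matching_paren(src, open_idx):
    """Return the index of the ')' that closes the '(' at open_idx."""
    depth = 0
    for i in range(open_idx, len(src)):
        if src[i] == "(":
            depth += 1
        elif src[i] == ")":
            depth -= 1
            if depth == 0:
                return i
    raise ValueError("unbalanced parentheses")

def convert_constructor(src, tree_classes):
    """Turn 'X::X(params) : Base(args) {' into 'this(params) { super(args);'."""
    head = _CTOR_HEAD.search(src)
    if not head:
        return src
    params_open = head.end() - 1
    params_close = _matching_paren(src, params_open)
    params = src[params_open + 1:params_close]
    init = _BASE_INIT.match(src, params_close + 1)
    if not init:
        return src  # no base-class call: leave it for another rule
    args_open = init.end() - 1
    args_close = _matching_paren(src, args_open)
    args = src[args_open + 1:args_close]
    brace = _OPEN_BRACE.match(src, args_close + 1)
    if not brace:
        return src  # e.g. further member initializers: too risky to guess
    # Reference semantics: 'DTreeClass *' -> 'DTreeClass'
    for cls in tree_classes:
        params = re.sub(r"\b%s\s*\*\s*" % re.escape(cls), cls + " ", params)
    new_head = "this(%s)\n{\n    super(%s);" % (params, args)
    return src[:head.start()] + new_head + src[brace.end():]

cpp = ("OrOrExp::OrOrExp(Loc loc, Expression *e1, Expression *e2)\n"
       "        : BinExp(loc, TOKoror, sizeof(OrOrExp), e1, e2)\n"
       "{\n}\n")
print(convert_constructor(cpp, ["Expression"]))
```

Even this one rule needs parenthesis matching (because of the nested sizeof call) and bail-out cases, which gives a feel for how the list of special cases grows.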
My second attempt was to 'parse' C++, recognising preprocessor constructs as
regular ones. The frequent use of #ifdef cutting expressions makes this
very, very difficult.
So my current approach is to filter out the preprocessor conditionals first,
before parsing. #defines and #pragmas survive to parsing.
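A minimal sketch of such a filtering pass, assuming the set of defined macros is fixed up front and the conditions are simple. The function names and the macro table are hypothetical; #elif and compound #if expressions are rejected rather than guessed at:

```python
import re

_DIRECTIVE = re.compile(r"^\s*#\s*(ifdef|ifndef|if|elif|else|endif)\b\s*(.*)$")

def _eval_if(arg, macros):
    """Evaluate '#if N' or '#if NAME'; anything more complex is rejected."""
    if re.fullmatch(r"\d+", arg):
        return int(arg) != 0
    if re.fullmatch(r"[A-Za-z_]\w*", arg):
        return macros.get(arg, 0) != 0  # undefined names count as 0
    raise ValueError("unsupported #if expression: " + arg)

def strip_conditionals(src, macros):
    """Keep only the active branches; #define and #pragma lines pass through."""
    out = []
    stack = []      # one (parent_active, branch_condition) pair per open #if
    active = True
    for line in src.splitlines():
        m = _DIRECTIVE.match(line)
        if not m:
            if active:
                out.append(line)
            continue
        kind, arg = m.group(1), m.group(2).strip()
        if kind == "elif":
            raise NotImplementedError("#elif is not handled in this sketch")
        if kind in ("ifdef", "ifndef", "if"):
            if kind == "ifdef":
                cond = arg.split()[0] in macros
            elif kind == "ifndef":
                cond = arg.split()[0] not in macros
            else:
                cond = _eval_if(arg, macros)
            stack.append((active, cond))
            active = active and cond
        elif kind == "else":
            parent, cond = stack[-1]
            stack[-1] = (parent, not cond)
            active = parent and not cond
        else:  # endif
            parent, _ = stack.pop()
            active = parent
    return "\n".join(out)

# An #ifdef cutting an expression in half; FOO is a made-up macro.
cut = "#define N 4\nif (a\n#ifdef FOO\n    && b\n#else\n    && c\n#endif\n    )"
print(strip_conditionals(cut, {"FOO": 1}))
```

After this pass the expression is whole again, so the parser never sees a conditional in the middle of it.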
In short, doing this at the token level works, but because you're
transforming syntax, not text, it's better to work on a syntax tree.