Migrating dmd to D?

Sat Mar 2 21:48:02 PST 2013

On Sunday, 3 March 2013 at 03:06:15 UTC, Daniel Murphy wrote:
>> Every single one of these would have to be special-cased. If 
>> you had a domain-specific language you could keep track of 
>> whether you were mid-declaration, mid-statement, or 
>> mid-string-literal. Half the stuff you special-case could 
>> probably be applied to other C++ projects as well.
>>
>> If this works, the benefits are just enormous. In fact, I 
>> would actually like to "waste" my time trying to make this 
>> work, but I'm going to need to ask a lot of questions because 
>> my current programming skills are nowhere near the average 
>> level of posters at this forum.
>>
>> I would like a C++ lexer (with whitespace) to start with. Then 
>> a discussion of parsers and emitters. Then a ton of questions 
>> just on learning GitHub and other basics.
>>
>> I would also like the sanction of some of the more experienced 
>> people here, saying it's at least worth a go, even if other 
>> strategies are simultaneously pursued.
>
> Something like this https://github.com/yebblies/magicport2 ?

Since you're obviously way ahead of me on this, I'm going to go 
ahead and say everything I've been thinking about this issue.

My approach to translating the source would be more or less 
naive. That is, I would do simple pattern-matching and 
replacement as much as possible, and go as far as I could 
without the scanner knowing any context-sensitive information. 
I would add context-sensitive information only by observing the 
failures of the naive output and adding pieces one by one, 
looking for the most bang for my context-sensitive buck. It 
would be nice to see 50 percent or more of the code conquered 
by just a few such carefully selected context-sensitive bucks.

Eventually these simple additions would hit the point of 
diminishing returns. At that point it would be useful to have a 
language whose gains come not from directly extending its 
ability to transform dmd code, but from the ease and flexibility 
with which one can add increasingly obscure and detailed special 
cases to it. I don't know how to set up that language or its 
data structures, but I can tell you what I'd like to be able to 
do with it.
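One possible data structure, offered only as a sketch under my own 
assumptions: make each special case a piece of data rather than 
new scanner code, i.e. a text replacement plus optional guards on 
the current function or class. The Context and SpecialCase types, 
and the function names in main, are hypothetical.

```d
import std.array : replace;
import std.stdio : writeln;

// What the scanner knows at the current point (hypothetical fields).
struct Context
{
    string currentFunction; // "" at file scope
    string currentClass;    // "" outside any class
}

// One special case expressed as data instead of new scanner code.
// An empty guard means "applies anywhere".
struct SpecialCase
{
    string onlyInFunction;
    string onlyInClass;
    string from;
    string to;
}

// True when every non-empty guard matches the current context.
bool applies(SpecialCase sc, Context ctx)
{
    return (sc.onlyInFunction.length == 0 || sc.onlyInFunction == ctx.currentFunction)
        && (sc.onlyInClass.length == 0 || sc.onlyInClass == ctx.currentClass);
}

// Apply, in order, every special case whose guards match the context.
string applyCases(string line, Context ctx, SpecialCase[] cases)
{
    foreach (sc; cases)
        if (applies(sc, ctx))
            line = line.replace(sc.from, sc.to);
    return line;
}

void main()
{
    // Hypothetical entries: one global rule, one scoped to a single function.
    auto cases = [
        SpecialCase("", "", "NULL", "null"),
        SpecialCase("oneIKnowToBeMessedUp", "", "(int)", "cast(int)"),
    ];
    writeln(applyCases("x = (int)NULL;", Context("oneIKnowToBeMessedUp", ""), cases));
    // prints: x = cast(int)null;
    writeln(applyCases("x = (int)NULL;", Context("someOtherFunction", ""), cases));
    // prints: x = (int)null;
}
```

Adding the next obscure case then means appending one entry to the 
list rather than touching the scanner itself.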

I would like to be able to query which function I am in, which 
class I am assembling, etc. I would like to be able to take a 
given piece of text and say exactly what text should replace it, 
so that complex macros could be rewritten as their equivalent 
static pure D functions. In other words, when push comes to 
shove, I want to be able to brute-force a particularly hard 
substitution through direct access to the context-sensitive data 
structure. For example, suppose I know that some strange macro 
peculiarity in a function adds an extra '}' brace which C++ never 
sees but which the naive '{}' nesting tracker picks up, throwing 
off its 'nestedBraceLevel' variable. It would be necessary to be 
able to say:

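// The stray '}' left the tracker one level too low: add it back,
// and use '==' so the fix fires once, not on every later line.
if (currentFunction == "oneIKnowToBeMessedUp" &&
    currentLine == funcList.oneIKnowToBeMessedUp.startingLine + 50)
    { ++nestedBraceLevel; }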
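A runnable version of that kind of override, as a sketch only: a 
naive tracker that counts every brace it sees, plus a list of 
hand-written corrections keyed by function name and line offset. 
Since the stray '}' drives the count one too low, the correction 
adds one. BraceTracker, BraceOverride, STRANGE_MACRO and the line 
numbers are hypothetical, and detecting where a function starts is 
assumed to happen elsewhere.

```d
import std.stdio : writeln;

// A hand-written correction: inside `func`, exactly `offset` lines after
// the function starts, add `adjust` to the brace level. Hypothetical format.
struct BraceOverride
{
    string func;
    size_t offset;
    int adjust;
}

// Naive tracker: counts every '{' and '}' it sees (including ones that
// come from macro tricks), then applies any override that matches.
struct BraceTracker
{
    int nestedBraceLevel;
    string currentFunction;   // assumed to be set by a separate detector
    size_t functionStartLine;
    BraceOverride[] overrides;

    void feed(size_t currentLine, string text)
    {
        foreach (c; text)
        {
            if (c == '{') ++nestedBraceLevel;
            else if (c == '}') --nestedBraceLevel;
        }
        foreach (o; overrides)
            if (o.func == currentFunction
                && currentLine == functionStartLine + o.offset)
                nestedBraceLevel += o.adjust;
    }
}

void main()
{
    // The stray '}' on line 11 drives the count one too low, so the
    // override for that line adds one back, exactly once.
    auto t = BraceTracker(0, "oneIKnowToBeMessedUp", 10,
                          [BraceOverride("oneIKnowToBeMessedUp", 1, 1)]);
    t.feed(10, "void oneIKnowToBeMessedUp() {");
    t.feed(11, "    STRANGE_MACRO(x) }");
    t.feed(12, "    return;");
    writeln(t.nestedBraceLevel); // prints 1: still inside the function body
}
```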

My founding principle is Keep It Simple, Stupid. I don't know if 
it's the best way to start, but barring expert advice steering me 
away from it, it seems best for someone like me, with no 
experience, who needs to learn from the ground up what works and 
what doesn't.

Another advantage of the domain-specific language described 
above would be its reusability: transformations that are common 
in C++ code, say rewriting 'strcmp(a, b) == 0' as 'a == b', 
could be applied elsewhere, and the same machinery could be used 
for adding special cases when translating from one language to 
another more generally. I don't know the difference between what 
I'm describing and a basic macro text-processing language; they 
might be the same.
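As a sketch of one such reusable rule, assuming the rule language 
is regex-based, here is the strcmp case in D using std.regex. It 
matches 'strcmp(a, b) == 0' because strcmp returns 0 for equal 
strings, so a bare 'strcmp(a, b)' means "they differ". The name 
rewriteStrcmpEq is hypothetical.

```d
import std.regex : regex, replaceAll;
import std.stdio : writeln;

// Reusable C-to-D rule: 'strcmp(a, b) == 0' becomes 'a == b'.
// Only simple identifiers are accepted as arguments; nested calls or
// expressions would need more context than a regex has.
string rewriteStrcmpEq(string src)
{
    auto re = regex(`strcmp\(\s*(\w+)\s*,\s*(\w+)\s*\)\s*==\s*0\b`);
    return replaceAll(src, re, "$1 == $2");
}

void main()
{
    writeln(rewriteStrcmpEq("if (strcmp(name, kw) == 0) return true;"));
    // prints: if (name == kw) return true;
}
```

The rewrite is only correct once a and b have become D slices: on 
two char* values, D's == compares the pointers, not the characters.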

My last thought is probably well-trodden ground, but the 
translation program should have import dependency charts for its 
target program and automate imports on a per-symbol basis, so 
that it lays out each file in two steps: first the translated 
body, then selective imports for exactly the symbols it uses:

import std.array : front, array;
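A sketch of that second step, assuming the first pass has already 
collected the symbols a file uses and that the dependency chart is 
a plain symbol-to-module map. The function name selectiveImports 
and the chart in main are hypothetical; the chart just mirrors the 
import line above.

```d
import std.algorithm.sorting : sort;
import std.array : join;
import std.stdio : writeln;

// Given the symbols the translated body uses (assumed free of duplicates)
// and a dependency chart (symbol -> defining module), emit one selective
// import per module, modules in sorted order. Unknown symbols are skipped.
string[] selectiveImports(string[] used, string[string] chart)
{
    string[][string] byModule;
    foreach (sym; used)
        if (auto mod = sym in chart)
            byModule[*mod] ~= sym;

    string[] lines;
    foreach (mod; sort(byModule.keys))
        lines ~= "import " ~ mod ~ " : " ~ byModule[mod].join(", ") ~ ";";
    return lines;
}

void main()
{
    // Hypothetical chart, mirroring the import line above.
    auto chart = [
        "front": "std.array",
        "array": "std.array",
        "writeln": "std.stdio",
    ];
    foreach (line; selectiveImports(["writeln", "array", "front"], chart))
        writeln(line);
    // prints:
    // import std.array : array, front;
    // import std.stdio : writeln;
}
```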

One thing I'm specifically avoiding in this proposal is a 
sophisticated awareness of the C++ grammar. I'm hoping special 
cases cover whatever ground might be more perfectly trod by a 
totally grammar-aware conversion mechanism.

Now you're as up-to-date as I am on what I'm thinking.