D port of dmd: Lexer, Parser, AND CodeGenerator fully operational
Zach the Mystic
reachMINUSTHISzachgmail at dot.com
Thu Mar 8 08:22:26 PST 2012
On Thursday, 8 March 2012 at 07:49:57 UTC, Jonathan M Davis wrote:
> The lexer is going to need to take a range of dchar (which may
> or may not be an array),
> And while the lexer would need to operate on generic ranges of
> dchar, it would probably have to be special-cased for strings
> in a number of places
I know what you mean. I actually cut out ddmd's conversion code
because, glancing over Phobos, I saw plenty of functions
designed for this! I must have intuited what you are saying. dmd
converts everything to char* before sending the buffer to the
lexer. I doubt there's a reason to change this procedure, only to
put that conversion code directly into module dmd.lexer instead.
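
To make the "range of dchar" idea concrete, here is a minimal sketch of a function that accepts any input range of characters instead of a char* buffer. It is not dmd's lexer: splitWords is a hypothetical helper that only splits on whitespace, and the template constraint is my assumption about what a range-based lexer entry point might look like.

```d
import std.range;            // isInputRange, ElementType, empty/front/popFront for arrays
import std.uni : isWhite;

// Hypothetical helper, not part of dmd or Phobos: splits the input into
// whitespace-separated words. Works on any input range whose elements
// convert to dchar (string, wstring, dstring, or a custom range).
dstring[] splitWords(R)(R input)
    if (isInputRange!R && is(ElementType!R : dchar))
{
    dstring[] words;
    dchar[] current;
    for (; !input.empty; input.popFront())
    {
        dchar c = input.front;
        if (isWhite(c))
        {
            // A word just ended; store an immutable copy of it.
            if (current.length)
            {
                words ~= current.idup;
                current.length = 0;
            }
        }
        else
        {
            current ~= c;
        }
    }
    // Flush the last word if the input did not end in whitespace.
    if (current.length)
        words ~= current.idup;
    return words;
}
```

The same function accepts a string or a dstring without any conversion step up front, which is the property a range-based lexer would be after. A real lexer would likely special-case arrays for speed, as Jonathan says.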
> The parser would then take a range of tokens and then output
> the AST in some form or other - it probably couldn't be a
> range, but I'm not sure.
Dmd's AST is pretty idiosyncratic.
Example: class FuncDeclaration (function declaration) has a
bunch of named members:
{
    Identifier ident;       // the function's name
    Parameter[] parameters; // its parameters
    Statement frequire;     // the in{} contract, if present
    Statement fbody;        // function body
    // etc.
}
Each one has its own name. I actually was working on how to turn
it into a more iterable format, since if you want to edit the AST
directly you're going to need to cursor down or up to the element
you want. It's doable, but it's not a natural range-ish
format. That's where I'm confused about the licensing issues:
I'm not sure whether the particular object structure the parser
produces is also going to be in Phobos or whether it must remain
GPL, which I'm not sure I want to continue using.
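
A rough sketch of what I mean by "more iterable": each node exposes its named members through one generic children() method, so a cursor can walk the tree without knowing every class. The classes below are simplified stand-ins I made up for illustration, not dmd's actual definitions, and children() and countNodes are hypothetical names.

```d
// Hypothetical, simplified stand-ins for dmd's AST classes.
class Node
{
    // Child nodes in source order; leaf nodes have none.
    Node[] children() { return null; }
}

class Statement : Node
{
    string text;
    this(string text) { this.text = text; }
}

class FuncDeclaration : Node
{
    string ident;        // the function's name
    Statement frequire;  // the in{} contract, if present
    Statement fbody;     // function body

    this(string ident, Statement frequire, Statement fbody)
    {
        this.ident = ident;
        this.frequire = frequire;
        this.fbody = fbody;
    }

    // Expose the named members as a uniform list, skipping absent ones.
    override Node[] children()
    {
        Node[] result;
        if (frequire !is null) result ~= frequire;
        if (fbody !is null) result ~= fbody;
        return result;
    }
}

// Depth-first walk that counts every node reachable from root.
size_t countNodes(Node root)
{
    size_t n = 1;
    foreach (child; root.children())
        n += countNodes(child);
    return n;
}
```

The named members stay as they are; children() is only an extra view over them, so existing code that reads fbody directly would not have to change.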
> So, if you're not familiar with ranges, you probably have a fair
> bit of learning ahead of you, and you're probably going to have
> to make a number of changes to your lexer and parser (though the
> majority of it will probably be able to stay intact).
> Unfortunately, a proper article and tutorial on them is currently
> lacking in spite of the fact that Phobos uses them heavily.
> Fortunately however, in a book that Ali Çehreli is writing on D,
> he has a chapter on ranges that should help get you started:
>
> http://ddili.org/ders/d.en/ranges.html
>
> But I'd suggest that you play around with ranges a fair bit
> (especially with strings) before trying to change what you have
> to use them. std.algorithm in particular makes heavy use of
> ranges. And it wouldn't surprise me at all if some portions of
> your lexer and parser really should be using some of Phobos'
> functions but aren't currently, because it's originally a port
> from C++. You should also make sure that you understand the
> basics of Unicode fairly well - especially how it pertains to
> char, wchar, and dchar - since that will affect your ability to
> correctly translate code to use ranges as well as properly
> optimize them.
>
> It would probably help if other D developers who are more
> familiar with ranges took a look at what you have and maybe even
> helped you start adjusting your code, but I don't know how many
> will both have the time and be interested. If I have time, I'll
> probably start poking at it, but I don't know that I'll have
> time any time soon, much as I'd like to.
>
> Regardless, you need to familiarize yourself with ranges if you
> want to get the lexer and parser ready for inclusion in Phobos.
> And you really should familiarize yourself with them anyway,
> since they're heavily used in D code in general. Not being able
> to use ranges in D would be like not being able to use iterators
> in C++. You can program in it, but you'd be fairly crippled -
> particularly when dealing with the standard library.
>
> - Jonathan M Davis
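
On the char / wchar / dchar point, a small sketch of the difference between code units and code points, which is what matters when a lexer moves from a char* buffer to a range of dchar. codePointCount is a hypothetical helper name; walkLength is the Phobos function from std.range.

```d
import std.range : walkLength;

// Hypothetical helper: number of Unicode code points in a UTF-8 string.
// Iterating a string as a range yields decoded dchars, so walkLength
// counts code points, while .length counts UTF-8 code units (bytes).
size_t codePointCount(string s)
{
    return s.walkLength;
}
```

For "\u00C7ehreli", .length is 8 because the first character takes two UTF-8 code units, while codePointCount returns 7. The wstring and dstring versions of the same literal both have length 7.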