What library functionality would you most like to see in D?

Sun Jul 31 18:56:33 PDT 2011

On Sunday 31 July 2011 21:29:31 Johann MacDonagh wrote:
> On 7/31/2011 5:57 AM, Jacob Carlborg wrote:
> >> * Lexing and parsing:
> >> 
> >> Standard facilities for these tasks could be very useful. Perhaps D
> >> could get its own dlex and dyacc or some such tools. Personally, I
> >> prefer sticking to LL(1), but LALR is generally more convenient and
> >> flexible, and thus I'd suggest something YACC/ANTLR-like.
> >> 
> >> (I know this doesn't have much to do with Phobos per se, but I figured
> >> I'd mention it.)
> > 
> > I think someone is working on this.
> 
> I've started on a port of DMD's lexer (not really a port ;) ):
> 
> https://github.com/jmacdonagh/phobos/compare/master...std.lang.d.lexer
> 
> Basically, you give it some string (string, wstring, or dstring), and it
> gives you a range of tokens back. The token has the type, a slice of the
> input that corresponds to the token, line / column, and a value (e.g. an
> integer constant).
> 
> Some features I'm planning:
> 
> 1. Support D1 and D2.
> 2. Warnings and errors returned in the tokens. For example, if you use
> an octal constant for D2 code, it will correctly return an integer
> constant token with some kind of warning flag set and a message. In
> terms of errors, if the lexer hits "0xz012", it will return an error
> token for the slice "0xz" and then start lexing an integer constant
> "012". No exceptions, easy peasy.
> 3. CTFEable. Although I'll probably have to wait till the next DMD release.
> 4. Support any kind of character range. Not sure if people want to lex
> something that's not a string/wstring/dstring.
> 
> I'm glad this was brought up. I remember Walter's post last year asking
> for this module, but the conversation seemed to kill the idea. I started
> on this just for the fun of it, but then doubted whether Phobos wanted
> it. I feel that a hand written lexer / parser is going to be faster than
> something generated, but maybe I'm old fashioned.
> 
> Anyway, Jim, if you want to do this I can move on to something else. If
> you want, I can continue on. I didn't see a branch in your repo so I'm
> not sure what you've done.

If we do a hand-written lexer of D for Phobos, it really should be a fairly
direct port of the dmd front-end's lexer. It should be _somewhat_ D-ified as
appropriate (and the API should definitely be properly range-based and all
that), but the implementation needs to stay close to dmd itself so that it's
easy for someone to port changes and fixes back and forth between the two.
Otherwise, they're going to get out of sync fairly easily. If we're not going
to do a direct port, then we might as well just do the template-based lexer
generator that Andrei and others would really like to see. We should still do
that anyway, but I think that the hand-written lexer is nowhere near as
valuable if it's not a direct port of dmd's lexer.
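
To make "range-based" concrete, here is a minimal sketch of the API shape
only. It is not dmd's lexer and not the code in the branch linked above; the
names Token, TokenType and Lexer are hypothetical, it handles ASCII input
only, and it recognizes just identifiers and decimal/hex integers. It does
show the behaviour described in the quoted message: the lexer is an input
range of tokens, each token is a slice of the input with a line and column,
and bad input such as "0xz012" produces an error token rather than an
exception.

```d
import std.ascii : isAlpha, isAlphaNum, isDigit, isHexDigit, isWhite;
import std.stdio : writefln;

// Hypothetical token kinds; a real D lexer would have many more.
enum TokenType { identifier, intLiteral, other, error }

struct Token
{
    TokenType type;
    string text;    // slice of the original input, not a copy
    size_t line;
    size_t column;
}

// Input range of tokens over a string (ASCII-only sketch).
struct Lexer
{
    private string src;
    private size_t line = 1, column = 1;
    private Token current;
    private bool done;

    this(string source)
    {
        src = source;
        popFront(); // prime the first token
    }

    @property bool empty() { return done; }
    @property Token front() { return current; }

    void popFront()
    {
        // Skip whitespace while tracking line and column.
        while (src.length && isWhite(src[0]))
        {
            if (src[0] == '\n') { ++line; column = 1; }
            else ++column;
            src = src[1 .. $];
        }
        if (src.length == 0) { done = true; return; }

        size_t n = 1;                  // length of the token being built
        auto type = TokenType.other;   // fallback: a single character
        if (isAlpha(src[0]) || src[0] == '_')
        {
            type = TokenType.identifier;
            while (n < src.length && (isAlphaNum(src[n]) || src[n] == '_')) ++n;
        }
        else if (src.length >= 2 && src[0] == '0'
                 && (src[1] == 'x' || src[1] == 'X'))
        {
            n = 2;
            while (n < src.length && isHexDigit(src[n])) ++n;
            if (n == 2)
            {
                // "0x" with no hex digits: emit an error token covering the
                // prefix and any stray letters, then resume lexing after it.
                type = TokenType.error;
                while (n < src.length && isAlpha(src[n])) ++n;
            }
            else
                type = TokenType.intLiteral;
        }
        else if (isDigit(src[0]))
        {
            type = TokenType.intLiteral;
            while (n < src.length && isDigit(src[n])) ++n;
        }

        current = Token(type, src[0 .. n], line, column);
        column += n;
        src = src[n .. $];
    }
}

void main()
{
    foreach (tok; Lexer("x = 0xz012"))
        writefln("%s '%s' at %s:%s", tok.type, tok.text, tok.line, tok.column);
}
```

With the input above, the range yields an error token for the slice "0xz"
followed by an integer literal token for "012". A direct port would keep
dmd's scanning logic underneath and only expose it through an interface
along these lines.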

Also, I see _zero_ value in making it support D1. If it's for D2's standard
library, then what's the point of it lexing D1? That just complicates the
lexer for what is essentially a legacy product. And given that the differences
between D1 and D2 in dmd's lexer are handled with #ifdefs, it would be rather
complicated to do a direct port which covers both D1 and D2. It would probably
be easier to keep a D1 lexer and a D2 lexer completely separate.

As for what I've done so far, I'd have to go look. I haven't touched it in a 
couple of months, I expect. There has been a lot of other stuff that I've 
needed to do, and Andrei was trying to discourage such an implementation the 
last time that I brought it up. So, I haven't exactly been in a rush to get it 
done. I'd like to do it, but I've been rather busy.

So, if you really want to work on a potential D lexer for Phobos, that's fine, 
but I really think that it needs to be a rather direct port, and that doesn't 
sound like what you've been doing.

- Jonathan M Davis