Looking for champion - std.lang.d.lex

Walter Bright newshound2 at digitalmars.com
Mon Oct 25 18:10:55 PDT 2010


Nick Sabalausky wrote:
> "Walter Bright" <newshound2 at digitalmars.com> wrote in message 
> news:ia3c3r$14k8$1 at digitalmars.com...
>> Does Goldie's lexer not convert numeric literals to integer values?
>>
>> Are all tokens returned as strings?
>>
> 
> Goldie's lexer (and parser) are based on the GOLD system ( 
> http://www.devincook.com/goldparser/ ) which is deliberately independent of 
> both grammar and implementation language. As such, it doesn't know anything 
> about what the specific terminals actually represent. (There are four 
> exceptions, though: Comment tokens, Whitespace tokens, an "Error" token 
> (i.e., for lex errors), and the EOF token.) So the lexed data is always 
> represented as a string.
> 
> Although, the lexer actually returns an array of "class Token" ( 
> http://www.semitwist.com/goldiedocs/current/Docs/APIRef/Token/#Token ). To 
> get the original data that got lexed or parsed into that token, you call 
> "toString()". (BTW, there are currently different "modes" of "toString()" 
> for non-terminals, but I'm considering just ripping them all out and 
> replacing them with a single "return a slice from the start of the first 
> terminal to the end of the last terminal" - unless you think it would be 
> useful to get a representation of the non-terminal's original data sans 
> comments/whitespace, or with comments/whitespace converted to a single 
> space.)
> 
> I'm not sure that calling "to!whatever(token.toString())" is really all that 
> much of a problem for user code.

Consider a string literal, say "abc\"def". With Goldie's method, I infer this 
string has to be scanned twice: once to find its limits, and a second time to 
convert it to the actual string value. The second scan is user code and will 
have to replicate whatever escape handling Goldie already did.
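
To make the double-scan concern concrete, here is a minimal sketch of the alternative: a lexer routine that finds the end of the literal and builds the decoded value in the same pass. It is written in Python purely for illustration. The function name lex_string_literal and the small escape table are assumptions made for this example; they are not Goldie's API and do not cover D's full escape syntax.

```python
def lex_string_literal(src, start):
    """Scan a double-quoted literal that begins at src[start].

    Returns (value, end): the decoded string and the index just past the
    closing quote. Hypothetical helper for illustration, not Goldie's API.
    """
    if start >= len(src) or src[start] != '"':
        raise ValueError("expected opening quote")
    # Assumption: only a handful of escapes; a real D lexer handles many more.
    escapes = {'"': '"', '\\': '\\', 'n': '\n', 't': '\t'}
    out = []
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == '"':
            # Closing quote: the limits and the value are known together.
            return "".join(out), i + 1
        if ch == '\\':
            i += 1
            if i >= len(src) or src[i] not in escapes:
                raise ValueError("bad escape sequence")
            out.append(escapes[src[i]])
        else:
            out.append(ch)
        i += 1
    raise ValueError("unterminated string literal")


source = r'x = "abc\"def";'
value, end = lex_string_literal(source, 4)
```

Here value holds the decoded text and source[4:end] is still available as the raw slice, so a token could carry both and user code would not need to rescan the literal.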


>> If I may suggest, leave the low level stuff out of the api until demand 
>> for it justifies it. It's hard to predict just what will be useful, so I 
>> suggest conservatism rather than kitchen sink. It can always be added 
>> later, but it's really hard to remove.
> 
> That may be a good idea.

What Goldie will be compared against is Spirit. Spirit is a reasonably 
successful add-on to C++. Goldie doesn't have to do things the same way as 
Spirit (expression templates - ugh), but it should be as easy to use and at 
least as powerful.


>> That too, but I meant a clutter of files. Long files aren't a problem with 
>> D.
> 
> Well, again, it may not be a problem with DMD, but I really think 
> reading/editing a long file is a pain regardless of language. Maybe we just 
> have different ideas of "long file"? To put it into numbers: At the moment, 
> Goldie's library (not counting tools and the optional generated 
> "static-mode" files) is about 3200 lines, including comment/blank lines. 
> That size would be pretty unwieldy to maintain as a single source file, 
> particularly since Goldie has a natural internal organization.

Actually, I think 3200 lines is of moderate, not large, size :-)


> Personally, I'd much rather have a clutter of source files than a cluttered 
> source file. (But of course, I don't go to Java extremes and put *every* 
> tiny little thing in a separate file.) As long as the complexity of having 
> multiple files isn't passed along to user code (hence the frequent "module 
> foo.all" idiom), then I can't say I really see a problem.

I tend to just not like having to constantly grep to see which file XXX is in.

