Looking for champion - std.lang.d.lex

Nick Sabalausky a at a.a
Mon Oct 25 13:50:51 PDT 2010


"Walter Bright" <newshound2 at digitalmars.com> wrote in message 
news:ia3c3r$14k8$1 at digitalmars.com...
>
> Does Goldie's lexer not convert numeric literals to integer values?
>
> Are all tokens returned as strings?
>

Goldie's lexer (and parser) are based on the GOLD system ( 
http://www.devincook.com/goldparser/ ), which is deliberately independent of 
both grammar and implementation language. As such, it doesn't know anything 
about what the specific terminals actually represent. (There are four 
exceptions, though: Comment tokens, Whitespace tokens, an "Error" token 
(i.e., for lex errors), and the EOF token.) So the lexed data is always 
represented as a string.

That said, the lexer actually returns an array of "class Token" objects ( 
http://www.semitwist.com/goldiedocs/current/Docs/APIRef/Token/#Token ). To 
get the original data that got lexed or parsed into that token, you call 
"toString()". (BTW, there are currently different "modes" of "toString()" 
for non-terminals, but I'm considering just ripping them all out and 
replacing them with a single "return a slice from the start of the first 
terminal to the end of the last terminal" mode - unless you think it would 
be useful to get a representation of the non-terminal's original data sans 
comments/whitespace, or with comments/whitespace converted to a single 
space.)

I'm not sure that calling "to!whatever(token.toString())" is really all that 
much of a problem for user code.
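
For illustration, here's a rough sketch of what that conversion might look 
like in user code. The "Token" class below is only a hypothetical stand-in 
(it is not Goldie's actual class), and "valueOf" is a made-up helper name; 
the only real library call is "to" from std.conv:

```d
import std.conv : to;

// Hypothetical stand-in for a lexer token. NOT Goldie's real Token class;
// it only mimics the "toString() gives back the lexed text" behavior.
class Token
{
    private string text;

    this(string text) { this.text = text; }

    override string toString() { return text; }
}

// Hypothetical convenience wrapper: convert a token's text to type T.
T valueOf(T)(Token tok)
{
    return to!T(tok.toString());
}

void main()
{
    auto intTok   = new Token("42");
    auto floatTok = new Token("3.5");

    // Direct conversion, as user code would write it:
    int i = to!int(intTok.toString());

    // Same thing through the helper:
    double d = valueOf!double(floatTok);

    assert(i == 42);
    assert(d == 3.5);
}
```

Note that "to" throws a ConvException if the text isn't a valid literal for 
the target type, so user code would need to handle (or rule out) that case 
itself.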

> If I may suggest, leave the low level stuff out of the api until demand 
> for it justifies it. It's hard to predict just what will be useful, so I 
> suggest conservatism rather than kitchen sink. It can always be added 
> later, but it's really hard to remove.

That may be a good idea.

>
> That too, but I meant a clutter of files. Long files aren't a problem with 
> D.

Well, again, it may not be a problem with DMD, but I really think 
reading/editing a long file is a pain regardless of language. Maybe we just 
have different ideas of "long file"? To put it into numbers: At the moment, 
Goldie's library (not counting tools and the optional generated 
"static-mode" files) is about 3200 lines, including comment/blank lines. 
That size would be pretty unwieldy to maintain as a single source file, 
particularly since Goldie has a natural internal organization.

Personally, I'd much rather have a clutter of source files than a cluttered 
source file. (But of course, I don't go to Java extremes and put *every* 
tiny little thing in a separate file.) As long as the complexity of having 
multiple files isn't passed along to user code (hence the frequent "module 
foo.all" idiom), then I can't say I really see a problem.




