Formal review of std.lexer

Vladimir Panteleev vladimir at thecybershadow.net
Fri Feb 21 04:59:24 PST 2014


On Friday, 21 February 2014 at 12:12:17 UTC, Dicebot wrote:
> http://wiki.dlang.org/Review/std.lexer

First of all, thank you for the great work. This is a very 
important project.

I'll begin with reviewing the documentation.

> Summary

Some simple explanation of the terminology and concepts would be 
nice. At least a link to Wikipedia.

> Create the string array costants for your language.

Typo ("costants")

> Examples:

An inline complete example of a very simple language would be 
nice.

> A lexer for D is available here.

Although good to have, this is too much to take in all at once, 
for documentation purposes.

> A lexer for Lua is available here.

Nary a comment in sight. This might serve as the example lexer if 
only it was better commented. The comments can be copy-pasted 
from the module documentation, even that would make the code much 
easier to grok.

> Template Parameter Definitions

What does this mean? Parameters to what template?

Can this section be moved to inside the documentation of Lexer, 
and Lexer be moved to the first documented symbol in the file?

> A function that serves as the default token lexing function. 
> For most languages this will be the identifier lexing function.

Should the function signature and contracts be explained here?

> This function must return bool and take a single size_t 
> argument representing the number of bytes to skip over before 
> looking for a separating character.

I think it's better to describe the signature in D syntax rather 
than English.

> A listing of tokens whose value is variable, such as 
> whitespace, identifiers, number literals, and string literals.

No mention of how the list is represented (is it an array? what 
type of elements should the array have? how are the array values 
used?), the reader is left to figure that out from the example 
below.

> Template for determining the type used for a token type. 
> Selects the smallest unsigned integral type that is able to 
> hold the value staticTokens.length + dynamicTokens.length + 
> possibleDefaultTokens.length. For example if there are 20 
> static tokens, 30 dynamic tokens, and 10 possible default 
> tokens, this template will alias itself to ubyte, as 20 + 30 + 
> 10 < ubyte.max.

Should this be documented? I understand that this will be 
instantiated only once, by std.lexer.

Utility declarations should preferably be at the end of the 
module, so that they appear last in the documentation.

> Generates the token type identifier for the given symbol. There 
> are two special cases:

Are these magic constants necessary? Why not declare them as 
enums?

>  In all cases this template will alias itself to a constant of 
> type IdType. This template will fail at compile time if symbol 
> is not one of the staticTokens, dynamicTokens, or 
> possibleDefaultTokens.

Style nit: D symbols should be wrapped in the $D(...) macro.

> == overload for the the token type.

Is it really necessary to document opEquals?

But since it's here: how does it interact with extraFields?

> The Column number at which this token occurs.

There was a lot of bikeshedding regarding the correct terminology 
to use when adding -vcolumns to DMD ( 
https://github.com/D-Programming-Language/dmd/pull/3077 ). I 
think the documentation should at least mention what exactly it 
is counting.

> A function that serves as the default token lexing function. 
> For most languages this will be the identifier lexing function.

What should the function's name be? How will it interact with 
Lexer? (It's not clear that this refers to the 
defaultTokenFunction parameter, especially after the previous 
list item, popFront, is a different piece of the puzzle.)

The documentation for Lexer's arguments seem to be thrown all 
around the module. I suggest to document them only once, all in 
Lexer's DDoc, add example signatures, and move Lexer to the top 
of the module.

> Examples:
> struct CalculatorLexer

I think this should be expanded into a full, well-documented 
example featured in the module DDoc.

> _popFront();

Where did that come from? Perhaps you meant Lexer's DDoc to say 
"which should call this mixin's _popFront()", and the DDoc 
escaped the _ character? If so, why not use a named mixin to 
disambiguate instead of _?

> struct LexerRange;
> struct StringCache;

These are thoroughly documented, but will they be used by 
anything other than std.lexer?


More information about the Digitalmars-d mailing list