std.d.lexer : voting thread

Mon Oct 7 17:16:45 PDT 2013

On 10/4/13 5:24 PM, Andrei Alexandrescu wrote:
> On 10/2/13 7:41 AM, Dicebot wrote:
>> After brief discussion with Brian and gathering data from the review
>> thread, I have decided to start voting for `std.d.lexer` inclusion into
>> Phobos.
>
> Thanks all involved for the work, first of all Brian.
>
> I have the proverbial good news and bad news. The only bad news is that
> I'm voting "no" on this proposal.
>
> But there's plenty of good news.
>
> 1. I am not attempting to veto this, so just consider it a normal vote
> when tallying.
>
> 2. I do vote for inclusion in the /etc/ package for the time being.
>
> 3. The work is good and the code valuable, so even if my
> suggestions (below) are followed, virtually all code pulp that
> gets work done can be reused.
[snip]

To put my money where my mouth is, I have a proof-of-concept tokenizer 
for C++ in working state.

http://dpaste.dzfl.pl/d07dd46d

It contains some rather unsavory bits (I'm sure a ctRegex would be nicer 
for parsing numbers etc), but it works on a lot of code just swell.

Most importantly, there's a clear distinction between the generic core 
and the C++-specific part. It should be obvious how to use the generic 
matcher for defining a D tokenizer.

Token representation is minimalistic and expressive. Just write tk!"<<" 
for left shift, tk!"int" for int etc. Typos will be detected during 
compilation. One does NOT need to define and use TK_LEFTSHIFT or TK_INT; 
all the generic tokenizer needs is the list of tokens. In return, it 
offers an efficient trie-based matcher for all tokens.
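
A minimal sketch of how a tk!"..." lookup of this kind can be made to reject typos at compile time. It is not lifted from the paste: the four-entry tokenList, the helper tokenIndex, and the choice of a plain int as the token id are all assumptions made for illustration.

```d
// Hypothetical token list, for illustration only.
immutable string[] tokenList = ["<<", "<", "int", "if"];

// Returns the index of s in tokenList, or -1 if it is not listed.
// Plain code like this can be evaluated at compile time (CTFE).
int tokenIndex(string s)
{
    foreach (i, t; tokenList)
        if (t == s)
            return cast(int) i;
    return -1;
}

// tk!"<<" evaluates to the token's numeric id.
// An unknown spelling trips the static assert during compilation.
template tk(string s)
{
    static assert(tokenIndex(s) >= 0, "unknown token: " ~ s);
    enum int tk = tokenIndex(s);
}

void main()
{
    static assert(tk!"<<" == 0);
    static assert(tk!"int" == 2);
    // A typo such as tk!"itn" does not compile.
    static assert(!__traits(compiles, tk!"itn"));
}
```

The only thing the user supplies here is the list of spellings; no TK_* names appear anywhere.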

(Keyword matching is unusual in that keywords are first found by the 
trie matcher, and then a simple check determines whether more characters 
follow, e.g. "if" vs. "iffy". Given that many tokenizers use a hashtable 
anyway to look up all symbols, there's no net loss of speed with this 
approach.)
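
The keyword check can be sketched roughly as follows. To keep the example short, a linear longest-prefix scan stands in for the trie matcher; tokenList, longestToken, isIdentChar and matchToken are hypothetical names, not taken from the paste.

```d
import std.ascii : isAlphaNum;

// Hypothetical token list, for illustration only.
immutable string[] tokenList = ["<<", "<", "int", "if"];

// Length of the longest token that is a prefix of input, or 0 if none is.
// A linear scan is used here in place of the trie.
size_t longestToken(string input)
{
    size_t best = 0;
    foreach (t; tokenList)
        if (t.length > best && input.length >= t.length
                && input[0 .. t.length] == t)
            best = t.length;
    return best;
}

bool isIdentChar(char c)
{
    return isAlphaNum(c) || c == '_';
}

// Returns the matched token text, or null when nothing matches or when the
// match is only the start of a longer identifier ("if" inside "iffy").
string matchToken(string input)
{
    immutable n = longestToken(input);
    if (n == 0)
        return null;
    // Keyword check: a word-like token must not be followed by
    // further identifier characters.
    if (isIdentChar(input[n - 1]) && n < input.length && isIdentChar(input[n]))
        return null;
    return input[0 .. n];
}

void main()
{
    assert(matchToken("if (x)") == "if");
    assert(matchToken("iffy") is null);
    assert(matchToken("<<= 1") == "<<");
}
```

In a full tokenizer the null case for "iffy" would fall through to the identifier path rather than signal an error.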

The lexer generator compiles fast and should run fast. If not, it should 
be easy to improve at the matcher level.

Now, what I'm asking for is that std.d.lexer build on this design 
instead of the traditional one. At a slight delay, we get the proverbial 
fishing rod IN ADDITION TO the equally proverbial fish, FOR FREE. It 
is quite evident there's a bunch of code sharing going on already 
between std.d.lexer and the proposed design, so it shouldn't be hard to 
effect the adaptation.

So with this I'm leaving it all within the hands of the submitter and 
the review manager. I didn't count the votes, but we may have a "yes" 
majority built up. Since additional evidence has been introduced, I 
suggest at least a revote. Ideally, there would be enough motivation for 
Brian to suspend the review and integrate the proposed design within 
std.d.lexer.

Andrei