std.d.lexer requirements

Dmitry Olshansky dmitry.olsh at gmail.com
Mon Aug 6 13:26:52 PDT 2012


On 06-Aug-12 22:03, deadalnix wrote:
> On 04/08/2012 15:45, Dmitry Olshansky wrote:
>> On 04-Aug-12 15:48, Jonathan M Davis wrote:
>>> On Saturday, August 04, 2012 15:32:22 Dmitry Olshansky wrote:
>>>> I see it as a compile-time policy, that will fit nicely and solve both
>>>> issues. Just provide a templates with a few hooks, and add a Noop
>>>> policy
>>>> that does nothing.
>>>
>>> It's starting to look like figuring out what should and shouldn't be
>>> configurable and how to handle it is going to be the largest problem
>>> in the
>>> lexer...
>>>
>>
>> Let's add some meat to my post.
>> I've seen it mostly as follows:
>>
>> // user defines a mixin template that is mixed in inside the lexer
>> template MyConfig()
>> {
>>     // means there would be calls to table.insert on each identifier
>>     enum identifierTable = true;
>>     // adds line, column properties to the lexer/Tokens
>>     enum countLines = true;
>>
>>     // statically bound callbacks, inside one can use say:
>>     //   skip() - to skip a char (popFront)
>>     //   get()  - to read the next char (via popFront, front)
>>     //   line, col - as read-only properties
>>     //   (skip & get do the counting if enabled)
>>
>>     bool onError()
>>     {
>>         skip();      // the most dumb recovery, just skip a char
>>         return true; // go on with tokenizing, false - stop prematurely
>>     }
>>
>>     ...
>> }
>>
>> usage:
>>
>> {
>>     // some kind of container (should be a set of strings and
>>     // support .insert("blah"); )
>>     auto my_supa_table = ...;
>>
>>     auto dlex = Lexer!(MyConfig)(my_supa_table);
>>     auto all_tokens = array(dlex(joiner(stdin.byChunk(4096))));
>>
>>     // or if we had no interest in the table but only in tokens:
>>     auto noop = Lexer!(NoopLex)();
>>     ...
>> }
>>
>
> It seems like way too much.
>
> The most complex thing needed is the policy for allocating
> identifiers in tokens.

An editor that highlights text may choose not to build an identifier table at 
all. One may see it as a safe mode (low-resource mode) for a more advanced IDE.
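
Roughly along the lines of the sketch above (purely hypothetical: 
HighlightConfig and the exact hook names are my assumptions on top of the 
proposed policy scheme, not an actual API):

// hypothetical policy for a highlighting editor: no identifier table,
// but line/column tracking kept for placing markers in the buffer
template HighlightConfig()
{
    enum identifierTable = false; // never calls table.insert - saves memory
    enum countLines = true;       // still track line/column for the editor

    bool onError()
    {
        skip();      // just skip the bad char and keep producing tokens
        return true; // a highlighter should not abort on bad input
    }
}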

> The second parameter is a bool for whether to tokenize comments or not.
> Is that enough?
No.

And emitting comments as a special comment Token is frankly a bad idea. See 
Walter's comments in this thread.

Also, for the compiler only the DDoc comments are ever useful, which is not 
the case for an IDE. Filtering them out later is inefficient; it is far 
better not to create them in the first place.
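
A sketch of how that could look with the policy scheme above (the 
CommentPolicy enum and the `comments` knob are my assumptions here, not a 
settled design):

// hypothetical compile-time switch for comment handling
enum CommentPolicy { none, ddocOnly, all }

template CompilerConfig()
{
    enum identifierTable = true;
    enum countLines = true;
    // only DDoc comments ever become tokens; plain comments are
    // skipped right in the scanner instead of being filtered out later
    enum comments = CommentPolicy.ddocOnly;
}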

> The onError looks like a typical use case for conditions, as explained in
> the huge thread on Exception.

Mm, I lost track of that discussion. Either way, I see a statically bound 
function as a good enough hook into the process, as it can do anything 
useful: skip the offending chars, throw an exception, stop parsing 
prematurely, whatever - pick your poison.
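
For instance, still within the same sketched interface (skip(), line and col 
as above; a config would define exactly one of these):

// recovery: skip the offending char and carry on
bool onError() { skip(); return true; }

// strict mode: bail out with an exception carrying the position
bool onError()
{
    import std.format : format;
    throw new Exception(format("lex error at line %s, column %s", line, col));
}

// bail-out mode: stop tokenizing prematurely, no exception
bool onError() { return false; }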

-- 
Dmitry Olshansky

