Looking for champion - std.lang.d.lex

Bruno Medeiros brunodomedeiros+spam at com.gmail
Wed Nov 24 11:52:17 PST 2010


On 24/11/2010 13:30, Bruno Medeiros wrote:
> On 19/11/2010 23:39, Andrei Alexandrescu wrote:
>> On 11/19/10 1:03 PM, Bruno Medeiros wrote:
>>> On 22/10/2010 20:48, Andrei Alexandrescu wrote:
>>>> On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
>>>>> On 22-10-2010 at 00:01:21, Walter Bright <newshound2 at digitalmars.com>
>>>>> wrote:
>>>>>
>>>>>> As we all know, tool support is important for D's success. Making
>>>>>> tools easier to build will help with that.
>>>>>>
>>>>>> To that end, I think we need a lexer for the standard library -
>>>>>> std.lang.d.lex. It would be helpful in writing color syntax
>>>>>> highlighting filters, pretty printers, repl, doc generators, static
>>>>>> analyzers, and even D compilers.
>>>>>>
>>>>>> It should:
>>>>>>
>>>>>> 1. support a range interface for its input, and a range interface for
>>>>>> its output
>>>>>> 2. optionally not generate lexical errors, but just try to recover
>>>>>> and
>>>>>> continue
>>>>>> 3. optionally return comments and ddoc comments as tokens
>>>>>> 4. the tokens should be a value type, not a reference type
>>>>>> 5. generally follow along with the C++ one so that they can be
>>>>>> maintained in tandem
>>>>>>
>>>>>> It can also serve as the basis for creating a javascript
>>>>>> implementation that can be embedded into web pages for syntax
>>>>>> highlighting, and eventually an std.lang.d.parse.
>>>>>>
>>>>>> Anyone want to own this?
>>>>>
>>>>> Interesting idea. Here's another: D will soon need bindings for CORBA,
>>>>> Thrift, etc., so lexers will have to be written all over again to grok
>>>>> interface files. Perhaps a generic tokenizer that can be parametrized
>>>>> with a lexical grammar would bring more ROI. I have a hunch D's
>>>>> templates are strong enough to pull this off without any source code
>>>>> generation à la JavaCC. The books I read on compilers say tokenization
>>>>> is a solved problem, so the theory part on what a good abstraction
>>>>> should be is done. What do you think?
>>>>
>>>> Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer
>>>> generator.
>>>>
>>>
>>> Agreed; of all the things desired for D, a D tokenizer would rank
>>> pretty low, I think.
>>>
>>> Another thing, even though a tokenizer generator would be much more
>>> desirable, I wonder if it is wise to have that in the standard library?
>>> It does not seem to be of wide enough interest to be in a standard
>>> library. (Out of curiosity, how many languages have such a thing in
>>> their standard library?)
>>
>> Even C has strtok.
>>
>> Andrei
>
> That's just a fancy splitter; I wouldn't call it a proper tokenizer. I
> meant something that, at the very least, would tokenize based on regular
> expressions (and produce heterogeneous tokens).
>

In other words, a lexer; that might be a better term in this context.
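To make the distinction concrete, here is a rough sketch of what such a lexer might look like in D. Everything here is invented for illustration (the struct and enum names, the tiny set of token categories); it is not a proposed std.lang.d.lex API, just a minimal input range of value-type tokens along the lines of points 1 and 4 of Walter's list:

```d
// Hypothetical sketch only -- names and token categories are made up.
import std.ascii : isAlpha, isDigit, isWhite;

enum TokKind { identifier, number }

// Tokens as value types (point 4).
struct Token {
    TokKind kind;
    string  text;
}

// A minimal input range of heterogeneous tokens over a string (point 1).
struct Lexer {
    string src;
    Token  front;
    bool   empty;

    this(string s) { src = s; popFront(); }

    void popFront() {
        while (src.length && src[0].isWhite) src = src[1 .. $];
        if (!src.length) { empty = true; return; }
        size_t n = 1;
        if (src[0].isAlpha) {
            while (n < src.length && (src[n].isAlpha || src[n].isDigit)) ++n;
            front = Token(TokKind.identifier, src[0 .. n]);
        } else if (src[0].isDigit) {
            while (n < src.length && src[n].isDigit) ++n;
            front = Token(TokKind.number, src[0 .. n]);
        } else {
            // Point 2: skip an unrecognized character instead of erroring.
            src = src[1 .. $];
            popFront();
            return;
        }
        src = src[n .. $];
    }
}

void main() {
    import std.stdio : writeln;
    // e.g. yields (identifier "foo"), (number "42"), (identifier "bar")
    foreach (tok; Lexer("foo 42 bar"))
        writeln(tok.kind, ": ", tok.text);
}
```

Unlike strtok, each token carries its own kind, so a consumer can dispatch on it; a real implementation would of course cover the full lexical grammar, track source locations, and optionally emit comments and ddoc comments as tokens (point 3).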

-- 
Bruno Medeiros - Software Engineer


More information about the Digitalmars-d mailing list