Looking for champion - std.lang.d.lex

Bruno Medeiros brunodomedeiros+spam at com.gmail
Wed Nov 24 05:30:29 PST 2010


On 19/11/2010 23:39, Andrei Alexandrescu wrote:
> On 11/19/10 1:03 PM, Bruno Medeiros wrote:
>> On 22/10/2010 20:48, Andrei Alexandrescu wrote:
>>> On 10/22/10 14:02 CDT, Tomek Sowiński wrote:
>>>> On 22-10-2010 at 00:01:21, Walter Bright <newshound2 at digitalmars.com>
>>>> wrote:
>>>>
>>>>> As we all know, tool support is important for D's success. Making
>>>>> tools easier to build will help with that.
>>>>>
>>>>> To that end, I think we need a lexer for the standard library -
>>>>> std.lang.d.lex. It would be helpful in writing color syntax
>>>>> highlighting filters, pretty printers, repl, doc generators, static
>>>>> analyzers, and even D compilers.
>>>>>
>>>>> It should:
>>>>>
>>>>> 1. support a range interface for its input, and a range interface for
>>>>> its output
>>>>> 2. optionally not generate lexical errors, but just try to recover and
>>>>> continue
>>>>> 3. optionally return comments and ddoc comments as tokens
>>>>> 4. the tokens should be a value type, not a reference type
>>>>> 5. generally follow along with the C++ one so that they can be
>>>>> maintained in tandem
>>>>>
>>>>> It can also serve as the basis for creating a javascript
>>>>> implementation that can be embedded into web pages for syntax
>>>>> highlighting, and eventually an std.lang.d.parse.
>>>>>
>>>>> Anyone want to own this?
>>>>
>>>> Interesting idea. Here's another: D will soon need bindings for CORBA,
>>>> Thrift, etc, so lexers will have to be written all over to grok
>>>> interface files. Perhaps a generic tokenizer which can be parametrized
>>>> with a lexical grammar would bring more ROI; I got a hunch D's
>>>> templates are strong enough to pull this off without any source code
>>>> generation à la JavaCC. The books I read on compilers say tokenization
>>>> is a solved problem, so the theory part on what a good abstraction
>>>> should be is done. What do you think?
>>>
>>> Yes. IMHO writing a D tokenizer is a wasted effort. We need a tokenizer
>>> generator.
>>>
>>
>> Agreed; of all the things desired for D, a D tokenizer would rank
>> pretty low, I think.
>>
>> Another thing, even though a tokenizer generator would be much more
>> desirable, I wonder if it is wise to have that in the standard library?
>> It does not seem to be of wide enough interest to be in a standard
>> library. (Out of curiosity, how many languages have such a thing in
>> their standard library?)
>
> Even C has strtok.
>
> Andrei

That's just a fancy splitter; I wouldn't call it a proper tokenizer. I 
meant something that, at the very least, would tokenize based on regular 
expressions (and produce heterogeneous tokens).
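
Something along these lines, say -- just a rough sketch to illustrate, 
with made-up names (Rule, Token, tokenize) rather than any proposed API: 
each token kind is described by a regular expression, and the result is 
heterogeneous tokens as plain value types.

// Rough illustration only -- names and API are made up for this example.
import std.regex;
import std.stdio;

enum TokKind { identifier, number, operator, whitespace }

// Tokens as value types, each carrying its kind and the matched slice.
struct Token
{
    TokKind kind;
    string  text;
}

// A lexical rule: a token kind plus the regex that recognizes it.
struct Rule
{
    TokKind kind;
    Regex!char re;
}

// First rule that matches at the current position wins; on no match,
// skip one character and keep going (crude error recovery).
Token[] tokenize(string input, Rule[] rules)
{
    Token[] result;
    while (input.length)
    {
        bool matched = false;
        foreach (rule; rules)
        {
            auto m = matchFirst(input, rule.re);
            if (!m.empty && m.pre.length == 0) // match must start here
            {
                result ~= Token(rule.kind, m.hit);
                input = m.post;
                matched = true;
                break;
            }
        }
        if (!matched)
            input = input[1 .. $];
    }
    return result;
}

void main()
{
    auto rules = [
        Rule(TokKind.whitespace, regex(`\s+`)),
        Rule(TokKind.number,     regex(`[0-9]+`)),
        Rule(TokKind.identifier, regex(`[A-Za-z_][A-Za-z0-9_]*`)),
        Rule(TokKind.operator,   regex(`[+*/=-]`)),
    ];
    foreach (tok; tokenize("x = foo + 42", rules))
        if (tok.kind != TokKind.whitespace)
            writeln(tok.kind, ": ", tok.text);
}

A real design would of course compile all the rules into a single 
automaton and expose the output lazily as a range rather than an array, 
but the point is the parametrization by a lexical grammar.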

-- 
Bruno Medeiros - Software Engineer

