std.d.lexer requirements

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Aug 2 13:26:38 PDT 2012


On 8/2/12 2:17 PM, Michel Fortin wrote:
> On 2012-08-02 12:28:03 +0000, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> said:
>> Hence, a very simple thing to do is have the entire lexer only deal
>> with ranges of ubyte. If someone passes a char[], the lexer's front
>> end can simply call s.representation and obtain the underlying ubyte[].
>
> That's ugly, but it could work (assuming s.representation returns the
> casted range by ref). I still prefer my frontUnit and popFrontUnit
> approach though.

I agree frontUnit and popFrontUnit are more generic because they allow 
other ranges to define them.
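
For illustration, here is a minimal sketch of both ideas. It uses std.string.representation, which is a real Phobos function. The frontUnit and popFrontUnit helpers are hypothetical: only their names come from this thread, and the bodies are an assumption about how they might look for narrow strings. countAscii is likewise an invented example consumer.

```d
import std.string : representation;

// Hypothetical helpers (names from the thread, bodies assumed): they expose
// the raw code units of a narrow string without any UTF decoding.
ubyte frontUnit(const(char)[] s) { return s.representation[0]; }
void popFrontUnit(ref const(char)[] s) { s = s[1 .. $]; }

// Invented consumer: counts code units below 0x80 (ASCII) one unit at a time.
size_t countAscii(const(char)[] s)
{
    size_t n = 0;
    while (s.length != 0)
    {
        if (s.frontUnit < 0x80) ++n;
        s.popFrontUnit();
    }
    return n;
}

void main()
{
    char[] src = "int \u00E9".dup;       // four ASCII units plus a two-unit 'é'
    ubyte[] bytes = src.representation;  // same memory viewed as ubyte[], no copy
    assert(bytes.ptr is cast(ubyte*) src.ptr);
    assert(bytes.length == 6);
    assert(countAscii(src) == 4);
}
```

Because the helpers are free functions called through UFCS, another range type could supply its own frontUnit and popFrontUnit, which is the genericity argument made above.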

> In fact, any parser for which speed is important will have to bypass
> std.range's clever handling of UTF characters. Dealing simply with
> ubytes isn't enough, since in some cases you'll want to fire up the UTF
> decoder.
>
> The next issue, which I haven't seen discussed here, is that for a parser
> to be efficient it should operate on buffers. You can make it work with
> arbitrary ranges, but if you don't have a buffer you can slice when you
> need to preserve a string, you're going to have to build the string
> character by character, which is not efficient at all. But then you can
> only really return slices if the underlying representation is the same
> as the output representation, and unless your API has a templated output
> type, you're going to special case a lot of things.

I think a BufferedRange could go a long way here.
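
The trade-off Michel describes can be sketched as follows. This is not a BufferedRange (no such type is defined in this thread); it only contrasts returning a slice of an in-memory buffer with copying a token out of a non-sliceable input range. UnitStream, lexIdentSlice and lexIdentCopy are hypothetical names invented for the example.

```d
import std.array : appender;
import std.ascii : isAlphaNum;
import std.range : isInputRange;

// Hypothetical non-sliceable source, standing in for data that arrives
// one code unit at a time.
struct UnitStream
{
    const(char)[] data;
    @property bool empty() const { return data.length == 0; }
    @property char front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

// Buffer case: the token is a slice of the input, so nothing is allocated.
const(char)[] lexIdentSlice(ref const(char)[] buf)
{
    size_t i = 0;
    while (i < buf.length && (isAlphaNum(buf[i]) || buf[i] == '_')) ++i;
    auto tok = buf[0 .. i];
    buf = buf[i .. $];
    return tok;
}

// Generic input-range case: slicing is unavailable, so the token has to be
// built up unit by unit in freshly allocated memory.
string lexIdentCopy(R)(ref R r) if (isInputRange!R)
{
    auto app = appender!string();
    while (!r.empty && (isAlphaNum(r.front) || r.front == '_'))
    {
        app.put(r.front);
        r.popFront();
    }
    return app.data;
}

void main()
{
    const(char)[] buf = "foo_1 = bar";
    auto start = buf.ptr;
    auto tok = lexIdentSlice(buf);
    assert(tok == "foo_1");
    assert(tok.ptr is start);   // the token points into the original buffer
    assert(buf == " = bar");

    auto stream = UnitStream("foo_1 = bar");
    assert(lexIdentCopy(stream) == "foo_1");
}
```

A buffered range would presumably aim to give the second kind of source the slicing behaviour of the first; how much that helps in practice is what would need measuring.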

> After having attempted an XML parser with ranges, I'm not sure parsing
> using generic ranges can be made very efficient. Automatic conversion to
> UTF-32 is a nuisance for performance, and if the output needs to return
> parts of the input, you'll need to create an inefficient special case
> just to allocate many new strings in the correct format.

I'm not so sure, but I'll measure.

> I wonder how your call with Walter will turn out.

What call?


Thanks,

Andrei

