std.d.lexer requirements
Michel Fortin
michel.fortin at michelf.ca
Thu Aug 2 11:17:37 PDT 2012
On 2012-08-02 12:28:03 +0000, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> said:
> Regarding the problem at hand, it's becoming painfully obvious to me
> that the lexer MUST do its own decoding internally.
That's not a great surprise to me. I hit the same issues when writing
my XML parser, which is why I invented functions called frontUnit and
popFrontUnit. I'm glad you're realizing this.
> Hence, a very simple thing to do is have the entire lexer only deal
> with ranges of ubyte. If someone passes a char[], the lexer's front end
> can simply call s.representation and obtain the underlying ubyte[].
That's ugly, but it could work (assuming s.representation returns the
cast range by ref). I still prefer my frontUnit and popFrontUnit
approach though.
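
To make the two options concrete, here is a minimal sketch in D. The names frontUnit and popFrontUnit come from the post, but the bodies below are assumptions written for illustration, not the actual code from the XML parser mentioned above. countCodeUnits is a hypothetical helper; std.string.representation and std.range.walkLength are existing Phobos functions.

```d
import std.range : walkLength;
import std.string : representation;

// Hypothetical code-unit primitives for arrays (illustration only).
// They look at raw UTF-8/UTF-16 code units and never decode.
C frontUnit(C)(C[] s)
{
    return s[0];
}

void popFrontUnit(C)(ref C[] s)
{
    s = s[1 .. $];
}

// Hypothetical helper: walks a string one code unit at a time.
size_t countCodeUnits(C)(C[] s)
{
    size_t n;
    while (s.length)
    {
        s.popFrontUnit();
        ++n;
    }
    return n;
}

unittest
{
    string s = "\u00E9!"; // U+00E9 takes two UTF-8 code units, '!' takes one

    assert(s.frontUnit == 0xC3);      // first code unit, no decoding
    assert(countCodeUnits(s) == 3);   // code units
    assert(walkLength(s) == 2);       // code points, via std.range's decoding

    // The alternative from the quoted message: view the same data as bytes.
    immutable(ubyte)[] bytes = s.representation;
    assert(bytes.length == 3 && bytes[0] == 0xC3);
}
```

The block can be checked with something like `dmd -unittest -main -run file.d`. The difference between the two approaches is mostly one of typing: representation changes the element type to ubyte, while the unit primitives keep the string type and only skip the decoding step.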
In fact, any parser for which speed is important will have to bypass
std.range's clever handling of UTF characters. Dealing simply with
ubytes isn't enough, since in some cases you'll want to fire up the UTF
decoder.
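
As a rough illustration of "bytes most of the time, decoder when needed", the sketch below scans an identifier by indexing code units directly and calls std.utf.decode only when it meets a non-ASCII lead byte. scanIdentifier and its identifier rules are assumptions made up for this example (they are not D's actual lexical grammar); std.utf.decode, std.ascii.isAlphaNum and std.uni.isAlpha are existing Phobos functions.

```d
import std.ascii : isAlphaNum;
import std.uni : isAlpha;
import std.utf : decode;

// Hypothetical scanner: returns the index one past the identifier that
// starts at index i. Simplified rules: ASCII letters, digits and '_',
// plus any non-ASCII code point that std.uni.isAlpha accepts.
size_t scanIdentifier(string src, size_t i)
{
    while (i < src.length)
    {
        immutable char c = src[i];
        if (c < 0x80)
        {
            // Fast path: a single ASCII code unit, no decoding.
            if (!(isAlphaNum(c) || c == '_'))
                break;
            ++i;
        }
        else
        {
            // Slow path: decode one full code point starting at i.
            size_t next = i;
            immutable dchar d = decode(src, next);
            if (!isAlpha(d))
                break;
            i = next;
        }
    }
    return i;
}

unittest
{
    string src = "caf\u00E9 = 1";
    immutable end = scanIdentifier(src, 0);
    assert(end == 5);                   // 'c', 'a', 'f' + two units for U+00E9
    assert(src[0 .. end] == "caf\u00E9");
    assert(scanIdentifier("x1+y", 0) == 2);
}
```

For mostly-ASCII source text the decoder is almost never entered, which is the point being made above.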
The next issue, which I haven't seen discussed here, is that for a
parser to be efficient it should operate on buffers. You can make it
work with arbitrary ranges, but if you don't have a buffer you can
slice when you need to preserve a string, you're going to have to build
the string character by character, which is not efficient at all. But
then you can only really return slices if the underlying representation
is the same as the output representation, and unless your API has a
templated output type, you're going to special case a lot of things.
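
The cost difference between slicing and rebuilding can be sketched as follows. Both function names are hypothetical and exist only for this comparison; std.array.appender and the range primitives from std.range are existing Phobos APIs.

```d
import std.array : appender;
import std.range : empty, front, popFront;

// Buffer case: the token is a slice of the input, so nothing is copied.
string tokenBySlice(string buffer, size_t start, size_t end)
{
    return buffer[start .. end];
}

// Generic-range case: with no buffer to slice, the token has to be
// rebuilt element by element. For a string input, front yields a
// decoded dchar, which the appender re-encodes to UTF-8.
string tokenByCopy(R)(R input, size_t count)
{
    auto result = appender!string();
    for (size_t i = 0; i < count && !input.empty; ++i)
    {
        result.put(input.front);
        input.popFront();
    }
    return result.data;
}

unittest
{
    string buffer = "int x;";

    auto a = tokenBySlice(buffer, 0, 3);
    auto b = tokenByCopy(buffer, 3);

    assert(a == "int" && b == "int");
    assert(a.ptr is buffer.ptr);   // shares the input's memory
    assert(b.ptr !is buffer.ptr);  // freshly allocated copy
}
```

Note that tokenBySlice only works because the input and the output are both string here; with a wstring buffer and a string result, the copy (and a transcoding step) would be unavoidable, which is the special-casing described above.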
After having attempted an XML parser with ranges, I'm not sure parsing
using generic ranges can be made very efficient. Automatic conversion
to UTF-32 is a nuisance for performance, and if the output needs to
return parts of the input, you'll need to create an inefficient special
case just to allocate many new strings in the correct format.
I wonder how your call with Walter will turn out.
--
Michel Fortin
michel.fortin at michelf.ca
http://michelf.ca/