Status of std.xml (D2/Phobos)
Michel Fortin
michel.fortin at michelf.com
Mon Jun 28 11:46:08 PDT 2010
On 2010-06-28 14:27:13 -0400, Andrei Alexandrescu
<SeeWebsiteForEmail at erdani.org> said:
>> Here's the generated documentation:
>>
>> http://michelf.com/docs/d/mfr/xmltok.html
>> http://michelf.com/docs/d/mfr/xml.html
>>
>> I'm slowly revamping it to use ranges instead of strings.
>
> I think a tokenizer should be a higher-order range that is fed an input
> range of ubyte, char, wchar, or dchar (so that would be a type
> parameter) and is itself a range of Tokens that include the token type,
> token value etc.

And I've implemented a tokenizer range just like you describe on top of
my tokenizer function. Look at the documentation for
mfr.xmltok.XMLForwardRange. (I should probably rename it to
XMLTokenRange.)

Personally, I prefer the callback approach, which automatically calls
the right function for each token type. But what's nice about my
tokenizer is that it supports both callbacks and pull-style
tokenization (the latter can be wrapped in a range), and you can mix
the two approaches as needed.
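
To illustrate mixing the two styles, here is a minimal sketch in D. It
is not the mfr.xmltok API: ToyTokenRange, Token, TokenType and dispatch
are hypothetical names made up for this example, and the "tokenizer"
only recognizes <name>, </name> and plain text. The caller pulls the
first token by hand, then hands the rest to callbacks.

```d
enum TokenType { openTag, closeTag, text }

struct Token
{
    TokenType type;
    string value; // a slice of the input string, not a copy
}

/// Pull-style tokenizer exposed as an input range of Token (toy grammar).
struct ToyTokenRange
{
    private string rest;
    private Token current;
    private bool done;

    this(string input)
    {
        rest = input;
        popFront(); // load the first token
    }

    @property bool empty() const { return done; }
    @property Token front() { return current; }

    void popFront()
    {
        if (rest.length == 0) { done = true; return; }

        if (rest[0] == '<')
        {
            // Look for the closing '>' of the tag.
            size_t close = 1;
            while (close < rest.length && rest[close] != '>') ++close;
            if (close < rest.length)
            {
                if (rest[1] == '/')
                    current = Token(TokenType.closeTag, rest[2 .. close]);
                else
                    current = Token(TokenType.openTag, rest[1 .. close]);
                rest = rest[close + 1 .. $];
                return;
            }
            // No '>' found: fall through and treat the remainder as text.
        }

        // Text runs up to the next '<' or the end of input.
        size_t n = 1;
        while (n < rest.length && rest[n] != '<') ++n;
        current = Token(TokenType.text, rest[0 .. n]);
        rest = rest[n .. $];
    }
}

/// Callback-style: call the delegate that matches the token type.
void dispatch(Token tok,
              void delegate(string) onOpen,
              void delegate(string) onClose,
              void delegate(string) onText)
{
    final switch (tok.type)
    {
        case TokenType.openTag:  onOpen(tok.value);  break;
        case TokenType.closeTag: onClose(tok.value); break;
        case TokenType.text:     onText(tok.value);  break;
    }
}

void main()
{
    auto tokens = ToyTokenRange("<a>hi</a><b>x</b>");

    // Pull the first token by hand...
    assert(tokens.front.type == TokenType.openTag);
    assert(tokens.front.value == "a");
    tokens.popFront();

    // ...then let callbacks handle the remaining tokens.
    string[] log;
    foreach (tok; tokens)
        dispatch(tok,
                 (string name) { log ~= "open:" ~ name; },
                 (string name) { log ~= "close:" ~ name; },
                 (string text) { log ~= "text:" ~ text; });

    assert(log == ["text:hi", "close:a", "open:b", "text:x", "close:b"]);
}
```

Because the range only needs empty, front and popFront, the same
object works in a foreach loop or with manual pulls, and the callback
dispatch is just a thin layer over it.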

What is missing is support for arbitrary ranges as input (it currently
deals only with strings). Strings are the optimized case for
tokenization because you don't have to dynamically allocate anything:
a substring is just a slice referencing the original string. With
arbitrary ranges you have to copy the text and tag names into a new
string one character at a time, which is less efficient. I don't want
to write two separate parsers for this, so I'm trying to abstract
things at the right level to maximize code reuse while keeping
performance optimized for the string-as-input case, but how to do that
is not so obvious.
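
One possible way to picture the trade-off is a single templated
function with a compile-time fast path for strings. This is only a
sketch under assumptions, not the design used in mfr.xmltok: readName
and CharStream are hypothetical names, and a "name" is simplified to a
run of ASCII letters. With a string the result is a slice of the
input; with any other input range the characters are copied one at a
time into a newly allocated string.

```d
import std.array : appender;
import std.ascii : isAlpha;
import std.range.primitives : isInputRange;

/// Reads the leading run of ASCII letters from `input`, advancing it.
string readName(R)(ref R input)
{
    static if (is(R == string))
    {
        // Fast path: count the matching characters, then slice.
        // Nothing is allocated; the result points into the input.
        size_t n = 0;
        while (n < input.length && isAlpha(input[n])) ++n;
        auto name = input[0 .. n];
        input = input[n .. $];
        return name;
    }
    else
    {
        static assert(isInputRange!R, "readName needs an input range");
        // Generic path: copy one character at a time into a new string.
        auto buf = appender!string();
        while (!input.empty && isAlpha(input.front))
        {
            buf.put(input.front);
            input.popFront();
        }
        return buf.data;
    }
}

/// A minimal non-string input range of char, standing in for a file
/// or network stream (hypothetical, for illustration only).
struct CharStream
{
    private const(char)[] data;
    @property bool empty() const { return data.length == 0; }
    @property char front() const { return data[0]; }
    void popFront() { data = data[1 .. $]; }
}

void main()
{
    string text = "title>rest";

    // String input: the name is a slice of the original buffer.
    string s = text;
    auto a = readName(s);
    assert(a == "title" && a.ptr is text.ptr);
    assert(s == ">rest");

    // Arbitrary range input: the name is a fresh copy.
    auto stream = CharStream(text);
    auto b = readName(stream);
    assert(b == "title" && b.ptr !is text.ptr);
    assert(stream.front == '>');
}
```

The calling code is identical in both cases; only the body of the
helper differs, which is one place the string-specific optimization
could live without duplicating the whole parser.
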
--
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/