std.xml should just go

Thu Feb 3 21:01:28 PST 2011

On 2011-02-03 22:27:08 -0500, Andrei Alexandrescu 
<SeeWebsiteForEmail at erdani.org> said:

> On 2/3/11 9:11 PM, Walter Bright wrote:
>> Andrei Alexandrescu wrote:
>>> Nobody that I know of. If you want to discuss design here while
>>> working on it, that would be great. I could think of a few high-level
>>> requirements:
>>> 
>>> * works with input ranges so we can plug it in with any source
>> 
>> The difficulty with that is if it's a pure input range, then the output
>> cannot be slices of the input.
> 
> In that case it's fair to require sliceable ranges of characters then, 
> or strings outright. It all boils down to stating one's assumptions and 
> choices. Probably parameterizing on character width would be 
> recommendable anyway.

The problem with parametrizing on the character width is that whether a 
parser parses a UTF-8 document or a UTF-16 document is determined at 
runtime by inspecting the document. How is the user of the parser 
supposed to decide in advance which to instantiate? And how is the 
application supposed to handle slices of different string types 
coming from those different parser instances?
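
To illustrate why the choice can only be made at runtime, here is a minimal 
sketch of encoding detection from the first bytes of a document. It is written 
in Python for brevity and is not part of any proposed std.xml API; the function 
name sniff_encoding is hypothetical, and the sketch assumes only UTF-8 and 
UTF-16 matter (it ignores UTF-32 and the encoding="..." declaration).

```python
import codecs


def sniff_encoding(data: bytes) -> str:
    """Guess the encoding of an XML document from its leading bytes.

    Assumption: only UTF-8 and UTF-16 are considered.
    """
    # A byte order mark, when present, settles the question.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # No BOM: look at how the leading "<?" is laid out in memory.
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    # Fall back to UTF-8, the default when nothing says otherwise.
    return "utf-8"
```

The answer depends entirely on the bytes handed to the parser, so the caller 
cannot know it when choosing which parser type to instantiate.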

The actual low-level parser could indeed use a different instance 
depending on the text encoding as an optimization, but the end-user API 
should standardize on one string type. Unfortunately, if the XML file 
is not using the same text encoding as your standard string type, then 
you can't use slicing and have to create copies for each and every 
string...
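
A rough sketch of that trade-off, again in Python and under stated assumptions: 
UTF-8 stands in for the "standard string type", text_between is a hypothetical 
helper, and start/end are byte offsets into the document. When the document is 
already UTF-8 the result is a view into the input buffer; otherwise every 
string has to be transcoded into freshly allocated memory.

```python
def text_between(doc: bytes, encoding: str, start: int, end: int) -> memoryview:
    """Return doc[start:end] as UTF-8 bytes (the assumed standard type)."""
    view = memoryview(doc)[start:end]  # no copy: refers to doc's own buffer
    if encoding == "utf-8":
        return view  # a true slice of the input
    # Different encoding: decode, then re-encode into a new buffer.
    copy = view.tobytes().decode(encoding).encode("utf-8")
    return memoryview(copy)


doc8 = "<a>héllo</a>".encode("utf-8")
doc16 = "<a>héllo</a>".encode("utf-16-le")

sliced = text_between(doc8, "utf-8", 3, 9)        # shares memory with doc8
copied = text_between(doc16, "utf-16-le", 6, 16)  # lives in its own buffer
```

Both calls yield the same UTF-8 text, but only the first one avoids an 
allocation per string.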

Another option is to use a "smart" string type that can accept string 
slices of any encoding.
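
One possible shape for such a type, sketched in Python purely as an 
illustration: the class name EncodedSlice and its interface are hypothetical. 
It keeps a slice of the input together with the encoding that slice is in, and 
transcodes only when the text is actually requested.

```python
class EncodedSlice:
    """Hypothetical 'smart' string: a slice of the input plus its encoding."""

    def __init__(self, buffer: bytes, start: int, end: int, encoding: str):
        # start/end are byte offsets; the view does not copy the text.
        self._view = memoryview(buffer)[start:end]
        self.encoding = encoding

    def __str__(self) -> str:
        # Transcoding is deferred until the text is needed.
        return self._view.tobytes().decode(self.encoding)

    def __eq__(self, other) -> bool:
        # Compare by content, regardless of the underlying encoding.
        return str(self) == str(other)

    def __hash__(self) -> int:
        return hash(str(self))


a = EncodedSlice("<a>héllo</a>".encode("utf-8"), 3, 9, "utf-8")
b = EncodedSlice("<a>héllo</a>".encode("utf-16-le"), 6, 16, "utf-16-le")
```

Here a and b compare equal even though they wrap buffers in different 
encodings; the cost is that every comparison or conversion pays for decoding.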

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/