std.xml should just go

Thu Feb 3 22:20:09 PST 2011

On 2/3/11 11:01 PM, Michel Fortin wrote:
> On 2011-02-03 22:27:08 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> said:
>
>> On 2/3/11 9:11 PM, Walter Bright wrote:
>>> Andrei Alexandrescu wrote:
>>>> Nobody that I know of. If you want to discuss design here while
>>>> working on it, that would be great. I could think of a few high-level
>>>> requirements:
>>>>
>>>> * works with input ranges so we can plug it in with any source
>>>
>>> The difficulty with that is if it's a pure input range, then the output
>>> cannot be slices of the input.
>>
>> In that case it's fair to require sliceable ranges of characters then,
>> or strings outright. It all boils down to stating one's assumptions
>> and choices. Parameterizing on character width would probably be
>> advisable anyway.
>
> The problem with parametrizing on the character width is that whether a
> parser parses a UTF-8 document or a UTF-16 document is determined at
> runtime by inspecting the document. How is the user of the parser
> supposed to decide in advance which to instantiate? And how is the
> application supposed to handle slices of different string types
> coming from those different parser instances?

In that case you'd want to store one specific format and convert to it 
in your I/O routine. Possibly you'd allow the user to choose the 
encoding format that best suits them.
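
As a rough illustration of "convert in your I/O routine", here is a minimal
sketch, written in Python purely for brevity rather than in D. The helper name
read_as_utf8 and the BOM table are hypothetical. It assumes the document is
either UTF-8 or BOM-prefixed UTF-16; it ignores UTF-32 and does not look at the
encoding named in the XML declaration, which a real reader would have to do.

```python
import codecs

# (BOM, codec name) pairs checked in order. Hypothetical helper table:
# only UTF-8 and UTF-16 are covered in this sketch.
_BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def read_as_utf8(raw: bytes) -> bytes:
    """Normalize raw document bytes to BOM-less UTF-8 at the I/O boundary."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            # Strip the BOM, decode from the detected encoding, re-encode.
            return raw[len(bom):].decode(name).encode("utf-8")
    # No BOM: assume UTF-8 (an assumption of this sketch) and validate it.
    raw.decode("utf-8")
    return raw
```

With the input normalized once up front, the parser only ever sees one
encoding, so it can hand out slices of that single buffer no matter how the
file was stored on disk. The price is one full transcoding pass for documents
that are not already in the chosen format.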

> The actual low-level parser could indeed use a different instance
> depending on the text encoding as an optimization, but the end-user API
> should standardize on one string type. Unfortunately, if the XML file is
> not using the same text encoding as your standard string type, then you
> can't use slicing and have to create copies for each and every string...
>
> Another option is to use a "smart" string type that can accept string
> slices of any encoding.

All good points.

Andrei