std.xml2 (collecting features) control character

Alex Vincent via Digitalmars-d digitalmars-d at puremagic.com
Thu Feb 18 10:28:10 PST 2016


On Thursday, 18 February 2016 at 17:26:30 UTC, Adam D. Ruppe 
wrote:
> On Thursday, 18 February 2016 at 16:56:08 UTC, Robert burner 
> Schadek wrote:
>>> unix file says it is a utf8 encoded file, but no BOM is 
>>> present.
>>
>> the hex dump is "3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E"
>
> Gah, I should have read this before replying... well, that does 
> appear to be valid utf-8.... why is it throwing an exception 
> then?
>
> I'm pretty sure that byte stream *is* actually well-formed xml 
> 1.0 and should pass utf validation as well as the XML 
> well-formedness check.

Regarding control characters: if you give me a complete sample 
file, I can run it through Mozilla's UTF stream conversion and/or 
its XML parsing code (via either SAX or DOMParser) and tell you how 
it reacts, as a reference point.  Mozilla supports XML 1.0, but not 
1.1.
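
In the meantime, the quoted hex dump can be checked by hand. The sketch 
below is in Python only because it is quick to run; it is not std.xml2 
or Mozilla code, and the helper name is_xml10_char is my own. It decodes 
the bytes as strict UTF-8 and then tests every code point against the 
Char production of XML 1.0, which allows U+0080 (XML 1.1 would require 
a character reference for it).

```python
# Bytes taken from the hex dump quoted in this thread.
data = bytes.fromhex("3C 66 6F 6F 3E C2 80 3C 2F 66 6F 6F 3E")

# Strict decoding raises UnicodeDecodeError on malformed UTF-8.
text = data.decode("utf-8", errors="strict")


def is_xml10_char(cp: int) -> bool:
    """Return True if code point cp matches the XML 1.0 Char production:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    (hypothetical helper name, written for this illustration only).
    """
    return (
        cp in (0x09, 0x0A, 0x0D)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


# Collect any code points that XML 1.0 would reject as literal characters.
bad = [hex(ord(c)) for c in text if not is_xml10_char(ord(c))]

print(repr(text))
print("characters outside XML 1.0 Char:", bad)
```

If the list of rejected characters comes back empty, the exception is 
more likely to come from the parser's own control-character handling 
than from the input itself.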

