Handling invalid UTF sequences
Nick Sabalausky
SeeWebsiteToContactMe at semitwist.com
Thu Mar 20 15:57:40 PDT 2014
On 3/20/2014 6:39 PM, Walter Bright wrote:
> Currently we do it by throwing a UTFException. This has problems:
>
> 1. about anything that deals with UTF cannot be made nothrow
>
> 2. turns innocuous errors into major problems, such as DOS attack vectors
> http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences
>
> One option to fix this is to treat invalid sequences as:
>
> 1. the .init value (0xFF for UTF8, 0xFFFF for UTF16 and UTF32)
>
> 2. U+FFFD
>
> I kinda like option 1.
>
> What do you think?
I'd have to give it some thought before I have an opinion on the right
solution. However, I do want to say that the current UTFException throwing
is something I've always been unhappy with, so it definitely should get
addressed in some way.
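
To make the two options from the quoted message concrete, here is a minimal sketch of a UTF-8 decoder that substitutes a value instead of throwing, which is what lets it be marked nothrow. The names decodeLenient and BadSeq are hypothetical and are not part of Phobos. The sketch only catches invalid lead bytes and missing continuation bytes; it does not reject overlong encodings or surrogates, which a real implementation would need to do.

```d
import std.stdio : writefln;

// Hypothetical policy switch for illustration only; not a Phobos type.
enum BadSeq { initValue, replacementChar }

/// Decodes one code point from `s` starting at index `i` and advances `i`.
/// Malformed input yields a substitute value instead of a UTFException.
dchar decodeLenient(BadSeq policy)(const(char)[] s, ref size_t i)
    pure nothrow @safe
{
    static if (policy == BadSeq.initValue)
        enum dchar bad = dchar.init;   // option 1: 0xFFFF for dchar
    else
        enum dchar bad = '\uFFFD';     // option 2: U+FFFD

    immutable uint b0 = s[i++];
    if (b0 < 0x80)
        return cast(dchar) b0;         // plain ASCII

    uint cp;
    size_t extra;                      // number of continuation bytes expected
    if ((b0 & 0xE0) == 0xC0)      { cp = b0 & 0x1F; extra = 1; }
    else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; extra = 2; }
    else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; extra = 3; }
    else
        return bad;                    // stray continuation or invalid lead byte

    foreach (_; 0 .. extra)
    {
        // A missing or malformed continuation byte ends the sequence early.
        if (i >= s.length || (s[i] & 0xC0) != 0x80)
            return bad;
        cp = (cp << 6) | (s[i++] & 0x3F);
    }
    return cast(dchar) cp;
}

void main()
{
    // 'a', a stray continuation byte, then 'b'
    const(char)[] input = ['a', '\x80', 'b'];
    size_t i = 0;
    while (i < input.length)
        writefln("U+%04X",
            cast(uint) decodeLenient!(BadSeq.replacementChar)(input, i));
}
```

With BadSeq.replacementChar the stray byte comes back as U+FFFD; with BadSeq.initValue it comes back as dchar.init (0xFFFF). Either way the caller sees a sentinel it can test for rather than an exception, so the decoder and anything built on top of it can be nothrow.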