Handling invalid UTF sequences
Walter Bright
newshound2 at digitalmars.com
Thu Mar 20 15:39:50 PDT 2014
Currently we do it by throwing a UTFException. This has problems:
1. Almost anything that deals with UTF cannot be made nothrow.
2. It turns innocuous errors into major problems, such as DoS attack vectors:
http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences
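To illustrate the current behavior, here is a minimal sketch using std.utf.decode. The helper name throwsOnDecode is made up for this example and is not part of Phobos. Because decode can throw, a caller can only be marked nothrow if it wraps the call in a try/catch like this one.

```d
import std.utf : decode, UTFException;

// Hypothetical helper: reports whether decoding the first code point
// of `s` throws a UTFException.
bool throwsOnDecode(const(char)[] s)
{
    size_t index = 0;
    try
    {
        decode(s, index);
        return false;
    }
    catch (UTFException e)
    {
        return true;
    }
}

void main()
{
    // Well-formed input decodes without throwing.
    assert(!throwsOnDecode("abc"));

    // The byte 0xFF can never appear in well-formed UTF-8.
    const(char)[] bad = [cast(char) 0xFF];
    assert(throwsOnDecode(bad));
}
```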
One way to fix this is to treat invalid sequences as either:
1. the .init value (0xFF for UTF-8, 0xFFFF for UTF-16 and UTF-32)
2. U+FFFD (the Unicode replacement character)
I kinda like option 1.
What do you think?
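A rough sketch of what either option might look like from the caller's side. The function decodeOr is hypothetical and not part of Phobos; it simply wraps std.utf.decode and substitutes a caller-chosen value. Two assumptions are baked in: on an invalid sequence it skips exactly one code unit, and option 1 is modeled with dchar.init because the decoded result is a dchar.

```d
import std.utf : decode;

// The .init values mentioned in option 1.
static assert(char.init == 0xFF);
static assert(wchar.init == 0xFFFF);
static assert(dchar.init == 0xFFFF);

// Hypothetical helper: decodes one code point starting at `index` and
// returns `fallback` instead of throwing when the sequence is invalid.
dchar decodeOr(const(char)[] s, ref size_t index, dchar fallback) nothrow
{
    immutable start = index;
    try
    {
        return decode(s, index);
    }
    catch (Exception e)
    {
        // Assumption: skip a single code unit so the caller makes progress.
        index = start + 1;
        return fallback;
    }
}

void main()
{
    const(char)[] bad = ['a', cast(char) 0xFF, 'b'];

    // Option 1: substitute the .init value.
    size_t i = 0;
    assert(decodeOr(bad, i, dchar.init) == 'a');
    assert(decodeOr(bad, i, dchar.init) == dchar.init);
    assert(decodeOr(bad, i, dchar.init) == 'b');
    assert(i == bad.length);

    // Option 2: substitute U+FFFD.
    size_t j = 1;
    assert(decodeOr(bad, j, '\uFFFD') == '\uFFFD');
    assert(j == 2);
}
```

Either way the function can be marked nothrow, and a caller that cares about invalid input can still detect it by comparing against the substituted value.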