First Impressions!

Steven Schveighoffer schveiguy at yahoo.com
Thu Nov 30 19:37:47 UTC 2017


On 11/30/17 1:20 PM, Patrick Schluter wrote:
> On Thursday, 30 November 2017 at 17:40:08 UTC, Jonathan M Davis wrote:
>> English and thus don't as easily hit the cases where their code is 
>> wrong. For better or worse, UTF-16 hides it better than UTF-8, but the 
>> problem exists in both.
>>
> 
> To give just an example of what can go wrong with UTF-16. Reading a file 
> in UTF-16 and converting it to something else like UTF-8 or UTF-32. 
> Reading block by block and hitting exactly an SMP code point at the buffer 
> limit, high surrogate at the end of the first buffer, low surrogate at 
> the start of the next. If you don't think about it => 2 invalid 
> characters instead of your nice poop 💩 emoji character (emojis are in 
> the SMP and they are more and more frequent).

iopipe handles this: 
http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html
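
To make the buffer-boundary problem concrete, here is a minimal sketch in 
Python (not D, and not iopipe's actual implementation). The function name 
decode_utf16le_chunks is made up for illustration, and the input is 
assumed to be little-endian UTF-16 without a BOM. The idea is simply to 
hold back a trailing high surrogate (or a dangling odd byte) until the 
next block arrives:

```python
def decode_utf16le_chunks(chunks):
    """Yield decoded text from an iterable of UTF-16LE byte blocks.

    Hypothetical helper for illustration only.
    """
    carry = b""  # bytes held back from the previous block
    for chunk in chunks:
        data = carry + chunk
        # Only whole 16-bit code units can be decoded.
        usable = len(data) - (len(data) % 2)
        if usable >= 2:
            last = int.from_bytes(data[usable - 2:usable], "little")
            # A high surrogate (0xD800-0xDBFF) at the end of the block
            # needs its low surrogate, which is in the next block.
            if 0xD800 <= last <= 0xDBFF:
                usable -= 2
        carry = data[usable:]
        if usable:
            yield data[:usable].decode("utf-16-le")
    if carry:
        # Input ended mid-character: emit a replacement character.
        yield carry.decode("utf-16-le", errors="replace")


encoded = "a\U0001F4A9b".encode("utf-16-le")
# Split right between the high and low surrogate of the emoji.
blocks = [encoded[:4], encoded[4:]]

naive = "".join(b.decode("utf-16-le", errors="replace") for b in blocks)
careful = "".join(decode_utf16le_chunks(blocks))
```

Here careful round-trips the original string, while naive does not, 
because each block is decoded in isolation. In real Python code the 
standard library's codecs.getincrementaldecoder("utf-16-le") already 
keeps this kind of state between calls, so the hand-rolled version is 
only meant to show the mechanism.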

-Steve

