First Impressions!

Patrick Schluter Patrick.Schluter at bbox.fr
Fri Dec 1 12:30:56 UTC 2017


On Friday, 1 December 2017 at 12:21:22 UTC, A Guy With a Question 
wrote:
> On Friday, 1 December 2017 at 06:07:07 UTC, Patrick Schluter 
> wrote:
>> On Thursday, 30 November 2017 at 19:37:47 UTC, Steven 
>> Schveighoffer wrote:
>>> On 11/30/17 1:20 PM, Patrick Schluter wrote:
>>>> [...]
>>>
>>> iopipe handles this: 
>>> http://schveiguy.github.io/iopipe/iopipe/textpipe/ensureDecodeable.html
>>>
>>
>> It was only to give an example. With UTF-8 people who 
>> implement the low level code in general think about the 
>> multiple codeunits at the buffer boundary. With UTF-16 it's 
>> often forgotten. In UTF-16 there are also 2 other common 
>> pitfalls, that exist also in UTF-8 but are less consciously 
>> acknowledged, overlong encoding and isolated codepoints. So 
>> UTF-16 has the same issues as UTF-8, plus some more, 
>> endianness and size.
>
> Most problems with UTF16 is applicable to UTF8. The only issue 
> that isn't, is if you are just dealing with ASCII it's a bit of 
> a waste of space.

That's what I said. UTF-16 and UTF-8 have the same issues, but 
UTF-16 has 2 more: endianness and bloat for ASCII. All 3 
encodings (UTF-8, UTF-16 and UTF-32) have their pluses and 
minuses; that's why D supports all 3, with a preference for 
UTF-8.
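
To make the two extra UTF-16 issues concrete, here is a minimal sketch. It is in Python rather than D, purely for illustration, and uses only the standard codecs; the sample string and the variable names are my own assumptions, not anything from iopipe or Phobos.

```python
# Illustration only: the sample text and all names are hypothetical.
text = "hello"  # pure ASCII input

utf8 = text.encode("utf-8")
utf16_le = text.encode("utf-16-le")  # little-endian, no BOM
utf16_be = text.encode("utf-16-be")  # big-endian, no BOM
utf32_le = text.encode("utf-32-le")

# Size: for ASCII, UTF-16 needs twice the bytes of UTF-8, UTF-32 four times.
sizes = {"utf-8": len(utf8), "utf-16": len(utf16_le), "utf-32": len(utf32_le)}

# Endianness: the same text gives different byte sequences, and decoding
# with the wrong byte order silently yields different characters.
same_bytes = utf16_le == utf16_be
misread = utf16_le.decode("utf-16-be")

# Shared issue: a character outside the BMP is a surrogate pair in UTF-16,
# so a buffer cut after the first code unit cannot be decoded.
pair = "\U0001F600".encode("utf-16-le")  # 2 code units, 4 bytes
try:
    pair[:2].decode("utf-16-le")
    split_ok = True
except UnicodeDecodeError:
    split_ok = False

print(sizes, same_bytes, misread != text, split_ok)
```

The last part is the UTF-16 counterpart of a multi-byte UTF-8 sequence split at a buffer boundary, which is the case that tends to be forgotten.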

