First Impressions!

Walter Bright newshound2 at digitalmars.com
Fri Dec 1 23:04:44 UTC 2017


On 11/30/2017 9:23 AM, Kagamin wrote:
> On Tuesday, 28 November 2017 at 03:37:26 UTC, rikki cattermole wrote:
>> Be aware Microsoft is alone in thinking that UTF-16 was awesome. Everybody 
>> else standardized on UTF-8 for Unicode.
> 
> UCS2 was awesome. UTF-16 is used by Java, JavaScript, Objective-C, Swift, Dart 
> and ms tech, which is 28% of tiobe index.

"was" :-) Those are pretty much pre-surrogate-pair designs, or based on them 
(Dart compiles to JavaScript, for example).

UCS2 has serious problems:

1. Most strings are in ASCII, meaning UCS2 doubles memory consumption. Strings 
in the executable file are twice the size.

2. The code doesn't work well with C. C doesn't even have a UCS2 type.

3. There's no reasonable way to audit the code to see if it handles surrogate 
pairs correctly. Surrogate pairs occur only rarely, so the code is never tested 
for it, and the bugs may remain latent for many, many years.
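
Points 1 and 3 can be shown with a small sketch. It is in Python rather than D, 
purely for illustration; the sample strings and the helper name utf16_units are 
made up here and are not from any particular codebase.

```python
def utf16_units(s: str) -> int:
    """Return how many 16-bit code units s occupies when encoded as UTF-16."""
    # "utf-16-le" is used so that no byte-order mark is added to the output.
    return len(s.encode("utf-16-le")) // 2

# Point 1: pure ASCII text needs one byte per character in UTF-8,
# but two bytes per character in a 16-bit encoding.
ascii_text = "hello, world"
utf8_size = len(ascii_text.encode("utf-8"))
utf16_size = len(ascii_text.encode("utf-16-le"))

# Point 3: a code point outside the Basic Multilingual Plane takes two
# 16-bit code units (a surrogate pair), while typical text takes one each.
clef = "\U0001D11E"             # MUSICAL SYMBOL G CLEF, a rarely used character
clef_units = utf16_units(clef)  # 2 code units for 1 code point
bmp_units = utf16_units("é")    # 1 code unit for 1 code point

print(utf8_size, utf16_size, clef_units, bmp_units)
```

Code that assumes "one 16-bit unit is one character" gives the right answer for 
everything except inputs like the clef, which is why such code can pass years 
of ordinary use.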

With UTF-8, multibyte code points are much more common, so bugs are detected much 
earlier.
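
As a rough sketch of that difference (again in Python, with hypothetical helper 
names naive_truncate and is_valid and made-up sample strings), consider a 
deliberately buggy routine that truncates by code units without checking 
character boundaries:

```python
def naive_truncate(data: bytes, max_units: int, unit_size: int) -> bytes:
    """Buggy on purpose: cut after max_units code units, ignoring boundaries."""
    return data[: max_units * unit_size]

def is_valid(data: bytes, encoding: str) -> bool:
    """Report whether data decodes cleanly in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False

everyday = "café au lait"  # ordinary accented text, entirely within the BMP
rare = "a\U0001D11Eb"      # contains one code point outside the BMP

# UTF-8: the cut lands in the middle of the two-byte "é", so the bug shows
# up with everyday input.
utf8_everyday_ok = is_valid(naive_truncate(everyday.encode("utf-8"), 4, 1), "utf-8")

# UTF-16: the same everyday text survives the buggy truncation...
utf16_everyday_ok = is_valid(naive_truncate(everyday.encode("utf-16-le"), 4, 2), "utf-16-le")

# ...and the bug only appears when a surrogate pair happens to be split.
utf16_rare_ok = is_valid(naive_truncate(rare.encode("utf-16-le"), 2, 2), "utf-16-le")

print(utf8_everyday_ok, utf16_everyday_ok, utf16_rare_ok)
```

The same mistake is present in both cases; with UTF-8 it fails on the first 
accented word, while with 16-bit code units it stays hidden until a rare 
character turns up.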

