First Impressions!

Joakim dlang at joakim.fea.st
Thu Nov 30 10:39:19 UTC 2017


On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright 
wrote:
> On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:
>> +- Unicode support is good. Although I think D's string type 
>> should have probably been utf16 by default. Especially 
>> considering the utf module states:
>> 
>> "UTF character support is restricted to '\u0000' <= character 
>> <= '\U0010FFFF'."
>> 
>> Seems like the natural fit for me. Plus for the vast majority 
>> of use cases I am pretty guaranteed a char = codepoint. Not 
>> the biggest issue in the world and maybe I'm just being overly 
>> critical here.
>
> Sooner or later your code will exhibit bugs if it assumes that 
> char==codepoint with UTF16, because of surrogate pairs.
>
> https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java
>
> As far as I can tell, pretty much the only users of UTF16 are 
> Windows programs. Everyone else uses UTF8 or UTF32.
>
> I recommend using UTF8.

Java, .NET, Qt, JavaScript, and a handful of others use UTF-16 
too, some having started out with the earlier UCS-2:

https://en.m.wikipedia.org/wiki/UTF-16#Usage

I'm not saying either is better; each has its flaws. I'm just 
pointing out that UTF-16 is used by more than just Windows.
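
For anyone who wants to see the surrogate-pair issue quoted above in 
concrete terms, here is a minimal sketch. It is in Python rather than 
D only because it is short to run; the helper name utf16_units is 
made up for this illustration and is not part of any library. It 
encodes a string as UTF-16 and lists the 16-bit code units, so you 
can compare that count with the number of code points.

```python
import struct

def utf16_units(s):
    """Return the UTF-16 code units of s as a list of ints.

    Hypothetical helper for illustration only.
    """
    # "utf-16-le" is used so that no byte-order mark is prepended.
    data = s.encode("utf-16-le")
    # Each UTF-16 code unit is one unsigned 16-bit little-endian value.
    return list(struct.unpack("<%dH" % (len(data) // 2), data))

bmp = "\u00e9"         # U+00E9, inside the Basic Multilingual Plane
astral = "\U0001F600"  # U+1F600, outside the BMP

# len() on a Python 3 str counts code points, not code units.
print(len(bmp), [hex(u) for u in utf16_units(bmp)])
# prints: 1 ['0xe9']
print(len(astral), [hex(u) for u in utf16_units(astral)])
# prints: 1 ['0xd83d', '0xde00']

# For comparison, the same code points take 2 and 4 bytes in UTF-8.
print(len(bmp.encode("utf-8")), len(astral.encode("utf-8")))
# prints: 2 4
```

The second string is a single code point but two UTF-16 code units, 
so code that assumes one unit per code point works only until a 
character outside the BMP shows up. UTF-8 is variable-width as well, 
which is why neither encoding gives you char == codepoint in general.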

