First Impressions!
A Guy With a Question
aguywithanquestion at gmail.com
Thu Nov 30 13:18:37 UTC 2017
On Thursday, 30 November 2017 at 10:19:18 UTC, Walter Bright
wrote:
> On 11/27/2017 7:01 PM, A Guy With an Opinion wrote:
>> +- Unicode support is good. Although I think D's string type
>> should have probably been utf16 by default. Especially
>> considering the utf module states:
>>
>> "UTF character support is restricted to '\u0000' <= character
>> <= '\U0010FFFF'."
>>
>> Seems like the natural fit for me. Plus for the vast majority
>> of use cases I am pretty guaranteed a char = codepoint. Not
>> the biggest issue in the world and maybe I'm just being overly
>> critical here.
>
> Sooner or later your code will exhibit bugs if it assumes that
> char==codepoint with UTF16, because of surrogate pairs.
>
> https://stackoverflow.com/questions/5903008/what-is-a-surrogate-pair-in-java
>
> As far as I can tell, pretty much the only users of UTF16 are
> Windows programs. Everyone else uses UTF8 or UTF32.
>
> I recommend using UTF8.
As long as you understand its limitations, I think most bugs can
be avoided. Where UTF16 breaks down (code points above U+FFFF,
which need surrogate pairs) is pretty well defined, and it is
also super rare. I think UTF32 would be great too, but it seems
like a waste of space 99% of the time. UTF8 isn't horrible; I am
not going to stop using D because it uses UTF8 (that would be
silly), especially when wstring also seems baked into the
language. However, it can complicate code, because outside of
ASCII you pretty much always have to assume char != code point.
I can see a reasonable person arguing that being forced to assume
char != code point is actually a good thing, and that is a valid
opinion.
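The trade-offs argued about here (surrogate pairs in UTF-16, multi-unit sequences in UTF-8, the size cost of UTF-32) are easy to check directly. The sketch below is written in Python rather than D purely for illustration; the helper names `code_units`, `encoded_bytes`, `utf16_units` and `is_surrogate` are hypothetical and not part of Phobos or any library. Python's `str` is already indexed by code point, so the code encodes to bytes in order to count code units.

```python
# Illustrative sketch: how many code units (and bytes) a string needs in
# UTF-8, UTF-16 and UTF-32. The "-le" codecs are used so Python does not
# prepend a byte-order mark, which would inflate the counts.

def code_units(s: str) -> dict[str, int]:
    """Number of code units per encoding (1, 2 and 4 bytes wide respectively)."""
    return {
        "utf8": len(s.encode("utf-8")),
        "utf16": len(s.encode("utf-16-le")) // 2,
        "utf32": len(s.encode("utf-32-le")) // 4,
    }


def encoded_bytes(s: str) -> dict[str, int]:
    """Total storage in bytes per encoding."""
    return {
        "utf8": len(s.encode("utf-8")),
        "utf16": len(s.encode("utf-16-le")),
        "utf32": len(s.encode("utf-32-le")),
    }


def utf16_units(s: str) -> list[int]:
    """The raw 16-bit UTF-16 code units of s, in order."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


def is_surrogate(unit: int) -> bool:
    """True for code units in the surrogate range 0xD800..0xDFFF."""
    return 0xD800 <= unit <= 0xDFFF


if __name__ == "__main__":
    # ASCII, Latin-1 supplement, another BMP character, and one outside the BMP.
    for s in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
        print(f"U+{ord(s):04X}", code_units(s))
    # U+1F600 lies above U+FFFF, so UTF-16 has to split it into a surrogate pair.
    print([hex(u) for u in utf16_units("\U0001F600")])
    # Storage cost for plain ASCII text in each encoding.
    print(encoded_bytes("hello"))
```

Running it shows the split both posters describe: every character up to U+20AC fits in one UTF-16 code unit, while U+1F600 needs two (0xd83d, 0xde00) in UTF-16 and four bytes in UTF-8. For plain ASCII such as "hello", UTF-32 takes 20 bytes against 5 for UTF-8, which is the space cost mentioned above.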