Wide characters support in D

Nick Sabalausky a at a.a
Tue Jun 8 01:09:26 PDT 2010


"Ruslan Nikolaev" <nruslan_devel at yahoo.com> wrote in message 
news:mailman.128.1275979841.24349.digitalmars-d at puremagic.com...
>>
>> Secondly, Java and Windows adopted 16-bit encodings back
>> when many people
>> were still under the mistaken impression that would allow
>> them to hold any
>> character in one code-unit. If that had been true, then it
>
> I doubt that it was the only reason. UTF-8 was already available before
> Windows NT was released. It would have been much easier to use UTF-8 in
> place of ANSI than to create a parallel API. Nonetheless, UTF-16 was
> chosen.
>

I didn't say that was the only reason. Also, you've misunderstood my point:

Their reasoning at the time:
    8-bit: Multiple code-units for some characters
    16-bit: One code-unit per character
    Therefore, use 16-bit.

Reality:
    8-bit: Multiple code-units for some characters
    16-bit: Multiple code-units for some characters
    Therefore, old reasoning not necessarily still applicable.
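
To make the comparison concrete, here is a rough Python sketch that counts code units per character in each encoding. The helper name code_units and the sample characters are my own picks for illustration, nothing standard:

```python
def code_units(ch: str) -> dict:
    """Count the code units needed to encode one character in each UTF form."""
    return {
        "utf8": len(ch.encode("utf-8")),            # 1 byte per code unit
        "utf16": len(ch.encode("utf-16-le")) // 2,  # 2 bytes per code unit
        "utf32": len(ch.encode("utf-32-le")) // 4,  # 4 bytes per code unit
    }

# Sample characters: ASCII, Latin-1 range, rest of the BMP, outside the BMP.
samples = ["A", "\u00e9", "\u20ac", "\U0001D11E"]
for ch in samples:
    print(f"U+{ord(ch):04X}", code_units(ch))
```

Anything outside the Basic Multilingual Plane (U+1D11E here) takes a surrogate pair in UTF-16, so "one code-unit per character" only holds for UTF-32.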

> In addition, C# had already been released when UTF-16 became
> variable-length.

Right, like I said, C#/.NET use UTF-16 because that's what MS had already 
standardized on.

> I doubt that conversion overhead (which is small compared to that of the
> VM) was the main reason to preserve UTF-16.

I never said anything about conversion overhead being a reason to preserve 
UTF-16.

>
> Concerning why I say that it's good to have conversion to UTF-32 (you 
> asked somewhere):
>
> I think you did not understand correctly what I meant. It is very common
> practice, and in fact required, to convert from both UTF-8 and UTF-16 to
> UTF-32 when you need to do character analysis (e.g. mbtowc() in C). In
> fact, it is the only place where UTF-32 is commonly used and useful.
>

I'm well aware why UTF-32 is useful. Earlier, you had started out saying 
that there should only be one string type, the OS-native type. Now you're 
changing your tune and saying that we do need multiple types.
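
For reference, the convert-to-UTF-32-for-analysis pattern mentioned in the quote looks roughly like this in Python. It is only a sketch: to_code_points is a made-up helper name, and Python's unicodedata module stands in for whatever character-analysis routine you would really call:

```python
import unicodedata

def to_code_points(data: bytes, encoding: str) -> list:
    """Decode encoded bytes into a list of code points (the UTF-32 view)."""
    return [ord(ch) for ch in data.decode(encoding)]

text = "a\U0001D11E1"  # letter, a character outside the BMP, digit

# The same text arriving as UTF-8 and as UTF-16 decodes to the same code points.
from_utf8 = to_code_points(text.encode("utf-8"), "utf-8")
from_utf16 = to_code_points(text.encode("utf-16-le"), "utf-16-le")
assert from_utf8 == from_utf16

# Character analysis then works on one code point at a time.
categories = [unicodedata.category(chr(cp)) for cp in from_utf8]
print([hex(cp) for cp in from_utf8], categories)
```

Note that this is a conversion done at the point of analysis, on top of whatever type the string is stored in.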



