To Walter, about char[] initialization by FF

Walter Bright newshound at digitalmars.com
Tue Aug 1 11:20:11 PDT 2006


Andrew Fedoniouk wrote:
 > Compiler accepts input stream as either BMP codes or full unicode set
 > encoded using UTF-16.

BMP is a subset of UTF-16.

 > There is no mentioning that String[n] will return you utf-16 code
 > unit. That will be weird.

String.charCodeAt() will give you the utf-16 code unit.
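
A minimal sketch of what that means for a character outside the BMP. The sample character (U+1D11E) and the variable names are illustrative choices, not taken from the discussion:

```javascript
// U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP,
// so UTF-16 encodes it as a surrogate pair of two code units.
var clef = "\uD834\uDD1E";

var unitCount = clef.length;       // counts UTF-16 code units, not characters
var first = clef.charCodeAt(0);    // high surrogate
var second = clef.charCodeAt(1);   // low surrogate

console.log(unitCount, first.toString(16), second.toString(16));
// prints: 2 d834 dd1e
```

One visible character therefore reports a length of 2, and indexing by position yields halves of the pair.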

>> Conversely, the A functions under NT and later translate the characters 
>> to - you guessed it - UTF-16 and then call the corresponding W function. 
>> This is why Phobos under NT does not call the A functions.
> Ok. And how do you call A functions?

Take a look at std.file for an example.


>> Windows, Java, and Javascript have all 
>> had to go back and redo to deal with surrogate pairs.
> Why? JavaScript for example has no such things as char.
> String.charAt() returns guess what? Correct - String object.
> No char - no problem :D

See String.fromCharCode() and String.charCodeAt()
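
A short sketch of how those two functions expose code units even though there is no char type. The sample character (U+10400) and the variable names are assumptions made for illustration:

```javascript
// String.fromCharCode() takes UTF-16 code units, so a character
// outside the BMP has to be supplied as a surrogate pair.
var s = String.fromCharCode(0xD801, 0xDC00); // U+10400

// charAt() returns a String, but that String holds only one code unit:
// here, a lone high surrogate that is not a complete character.
var half = s.charAt(0);
var unit = half.charCodeAt(0);
var isLoneHighSurrogate =
    half.length === 1 && unit >= 0xD800 && unit <= 0xDBFF;

console.log(s.length, isLoneHighSurrogate);
```

So returning a String from charAt() does not hide the surrogate pair; it only wraps one half of it.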

> Again - let people decide of what char is and how to interpret it. And that 
> will be it.

I've already explained the problems C/C++ have with that. They're real 
problems, bad and unfixable enough that there are official proposals to 
add new UTF basic types to C++.

> Phobos can work with utf-8/16 and satisfy you and other UTF-masochists (no 
> offence implied).

C++'s experience with this demonstrates that char* does not work very 
well with UTF-8. It's not just my experience, it's why new types were 
proposed for C++ (and not by me).

> Ordinary people will do their own strings anyway. Just 
> give them opAssign and dtor in structs and you will see explosion of perfect 
> strings. That char#[] (read-only arrays) will also benefit here. oh.....
> 
> Changing char init value to 0 will not harm anybody but will allow to use 
> char for other than utf-8 purposes - it is only one from 40 in active use 
> encodings anyway.
> 
> For persistence purposes (in compiled EXE) utf is the best choice probably. 
> But in runtime - please not on language level.

ubyte[] will enable you to use any encoding you wish - and that's what 
it's there for.


