To Walter, about char[] initialization by FF
Walter Bright
newshound at digitalmars.com
Tue Aug 1 11:20:11 PDT 2006
Andrew Fedoniouk wrote:
> Compiler accepts input stream as either BMP codes or full unicode set
> encoded using UTF-16.
BMP is a subset of UTF-16.
> There is no mention that String[n] will return you a utf-16 code
> unit. That would be weird.
String.charCodeAt() will give you the utf-16 code unit.
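
As a rough illustration of the point about code units, here is a small JavaScript sketch. The sample character (U+1D11E) and the variable names are my own choices for the example, not something from the thread, and it assumes any standard engine such as Node.js.

```javascript
// U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP, so UTF-16 stores it
// as the surrogate pair 0xD834 0xDD1E.
const clef = "\uD834\uDD1E";

// length counts UTF-16 code units, not characters.
const unitCount = clef.length;

// charCodeAt() returns the code unit at each index.
const firstUnit = clef.charCodeAt(0);
const secondUnit = clef.charCodeAt(1);

// Indexing yields a one-unit string that holds a lone surrogate.
const indexed = clef[0];

console.log(unitCount, firstUnit.toString(16), secondUnit.toString(16));
```

So a single character outside the BMP shows up as two indexable units, each of which is only half of the character.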
>> Conversely, the A functions under NT and later translate the characters
>> to - you guessed it - UTF-16 and then call the corresponding W function.
>> This is why Phobos under NT does not call the A functions.
> Ok. And how do you call A functions?
Take a look at std.file for an example.
>> Windows, Java, and Javascript have all
>> had to go back and redo to deal with surrogate pairs.
> Why? JavaScript for example has no such things as char.
> String.charAt() returns guess what? Correct - String object.
> No char - no problem :D
See String.fromCharCode() and String.charCodeAt()
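
A minimal sketch of what dealing with surrogate pairs through these two functions looks like. The helper names decodeSurrogatePair and encodeCodePoint are hypothetical, made up for this example, and the input is assumed to be a valid code point or a well-formed pair.

```javascript
// Combine a high and a low surrogate into one code point.
// (Hypothetical helper; assumes a well-formed pair.)
function decodeSurrogatePair(high, low) {
  return (high - 0xD800) * 0x400 + (low - 0xDC00) + 0x10000;
}

// Turn a code point into a string: one code unit inside the BMP,
// two code units (a surrogate pair) above it.
// (Hypothetical helper; assumes a valid code point.)
function encodeCodePoint(codePoint) {
  if (codePoint <= 0xFFFF) {
    return String.fromCharCode(codePoint);
  }
  const offset = codePoint - 0x10000;
  const high = 0xD800 + (offset >> 10);
  const low = 0xDC00 + (offset & 0x3FF);
  return String.fromCharCode(high, low);
}

// Round trip for U+1D11E, which needs two code units.
const encoded = encodeCodePoint(0x1D11E);
const decoded = decodeSurrogatePair(encoded.charCodeAt(0), encoded.charCodeAt(1));

console.log(encoded.length, decoded.toString(16));
```

The arithmetic is the standard UTF-16 surrogate mapping. The point is only that code working with fromCharCode() and charCodeAt() has to do this by hand for anything outside the BMP.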
> Again - let people decide what char is and how to interpret it. And that
> will be it.
I've already explained the problems C/C++ have with that. They're real
problems, bad and unfixable enough that there are official proposals to
add new UTF basic types to C++.
> Phobos can work with utf-8/16 and satisfy you and other UTF-masochists (no
> offence implied).
C++'s experience with this demonstrates that char* does not work very
well with UTF-8. It's not just my experience, it's why new types were
proposed for C++ (and not by me).
> Ordinary people will do their own strings anyway. Just
> give them opAssign and dtor in structs and you will see explosion of perfect
> strings. That char#[] (read-only arrays) will also benefit here. oh.....
>
> Changing char init value to 0 will not harm anybody but will allow using
> char for purposes other than utf-8 - it is only one of some 40 encodings
> in active use anyway.
>
> For persistence purposes (in compiled EXE) utf is the best choice probably.
> But in runtime - please not on language level.
ubyte[] will enable you to use any encoding you wish - and that's what
it's there for.