What's left for 1.0? - string class
Samuel MV
samuel at jxdesigner.com
Fri Nov 17 04:02:58 PST 2006
This is *very* serious for i18n:
>> char[] foo = "hög";
>> assert(foo.length == 3); // Sorry UTF-8, this is == 4
>> assert(foo[1] == 'ö'); // Not a chance!
char[] should be a real char[], not a sort of byte[] for text. This needs
to be fixed for non-English text.
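
For comparison, here is a rough sketch of what can be done with the
language as it is. It assumes a present-day D compiler and a UTF-8 source
file; the helper names codePointCount and codePointAt are made up for
illustration and are not part of Phobos. A foreach over dchar decodes the
UTF-8 on the fly, so counting and indexing by code point can be built on
top of it, at the price of an O(n) walk per lookup:

```d
/// Counts code points rather than UTF-8 code units.
/// Hypothetical helper, not a library function.
size_t codePointCount(const(char)[] s)
{
    size_t n = 0;
    foreach (dchar c; s) // foreach with dchar decodes UTF-8 as it goes
        ++n;
    return n;
}

/// Returns the code point at the given code-point index (O(n) walk).
/// Hypothetical helper, not a library function.
dchar codePointAt(const(char)[] s, size_t index)
{
    size_t i = 0;
    foreach (dchar c; s)
    {
        if (i == index)
            return c;
        ++i;
    }
    assert(0, "code point index out of range");
}

unittest
{
    const(char)[] foo = "hög";          // 'ö' assumed precomposed (U+00F6)
    assert(foo.length == 4);            // .length counts UTF-8 code units
    assert(codePointCount(foo) == 3);   // three code points
    assert(codePointAt(foo, 1) == 'ö'); // indexing by code point
}
```

This only works at the code point level (a letter written with a
combining accent would still count as two), and it does nothing to stop
a slice from cutting through the middle of a multi-byte sequence, which
is the kind of thing a real string class could take care of.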
Best regards,
Samuel.
Aarti_pl wrote:
> I cannot believe that no one here uses UTF-8 characters in their
> programs or is concerned about the issues with the current D char[]
> implementation, so I am reposting my previous post. Sorry about the
> repost. If no one comments, I will take it as a lesson and conclude that
> maybe this issue is not so important.
>
> But I would prefer to get some comments, even negative ones, about the
> importance of having a built-in string class...
>
> For me, a string class is something that could significantly improve
> the quality of libraries for D.
>
> Best Regards
> Marcin Kuszczak
>
>
> Marcin Kuszczak wrote:
>> Bill Baxter wrote:
>>
>>> So, what's left on everyone's lists for D1.0 must-have features?
>>
>> I think one thing that is missing from Phobos right now is a string
>> class which encapsulates UTF-8/UTF-16/UTF-32 handling and the issues
>> connected with UTF-8 strings, e.g.:
>>
>> char[] foo = "hög";
>> assert(foo.length == 3); // Sorry UTF-8, this is == 4
>> assert(foo[1] == 'ö'); // Not a chance!
>>
>> It's really annoying to write an application in a language other than
>> English (or a multi-language application) using just the currently
>> available language support (please note that I am not saying that it's
>> better in C++ :-) - I am just saying that it could be greatly improved).
>>
>> The problems I see with the current language support are:
>> 1. To write a good API you need three versions of every function, all
>> doing the same thing but taking different char arrays
>> (char[]/wchar[]/dchar[]) as parameters. The best (maybe I should write
>> worst ;-) ) example is the Phobos API.
>> 2. It's quite easy to make wrong assumptions about UTF-8 encoded
>> arrays. See the example above. This can lead to slicing a string in the
>> wrong place, leaving an improperly formed UTF-8 string.
>> 3. What char[] actually is, is probably not so clear to newbies,
>> especially for UTF-8: is a char really one character/code point?
>> 4. A string class should be more than just an array of characters. It
>> should probably have some more methods, e.g. for removing characters
>> from the middle of a string, and much more.
>>
>> Some time ago Chris Miller posted a dstring implementation to the
>> newsgroup which looks quite good to me (link to the site:
>> http://www.dprogramming.com/dstring.php).
>>
>> But if Walter is not happy enough with this implementation for now,
>> maybe an alias should at least be added to object.d:
>> alias char[] string;
>>
>> I personally vote for dstring.
>>