What's left for 1.0? - string class

Samuel MV samuel at jxdesigner.com
Fri Nov 17 04:02:58 PST 2006


This is *very* serious for i18n:

 >>         char[] foo = "hög";
 >>         assert(foo.length == 3); // Sorry UTF-8, this is == 4
 >>         assert(foo[1] == 'ö');   // Not a chance!

char[] should be a real array of characters, not a sort of byte[] for text.
It needs to be fixed for non-English text.
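
To make the distinction concrete, here is a minimal sketch. It assumes a present-day D compiler and Phobos, not the 2006 toolchain, so `string` stands in for `char[]`. The helper names `codePointCount` and `codePointAt` are made up for illustration; `walkLength` (std.range) and `toUTF32` (std.utf) are the library calls assumed. It shows that `.length` counts UTF-8 code units, while a decoded view gives the three characters one expects:

```d
import std.range : walkLength;
import std.utf : toUTF32;

// Hypothetical helper: number of code points (not UTF-8 code units) in s.
size_t codePointCount(string s)
{
    // walkLength decodes narrow strings while iterating,
    // so it counts one element per code point.
    return s.walkLength;
}

// Hypothetical helper: the i-th code point of s.
// Converts the whole string to UTF-32 first, so this is O(n), not O(1).
dchar codePointAt(string s, size_t i)
{
    dstring wide = toUTF32(s);
    return wide[i];
}

unittest
{
    string foo = "hög";
    assert(foo.length == 4);            // code units: 'ö' takes two bytes
    assert(codePointCount(foo) == 3);   // code points
    assert(codePointAt(foo, 1) == 'ö'); // indexing the decoded view
}
```

Note that a code point is still not always what a user perceives as one character (combining marks, for example), so this only addresses the byte-versus-code-point part of the problem.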

Best regards,

               Samuel.


Aarti_pl wrote:
> I cannot believe that no one uses UTF-8 characters in their programs or is 
> concerned about the issues with the current D char[] implementation, so I 
> am reposting my previous post. Sorry about the repost. If no one comments, 
> I will take it as a lesson and conclude that maybe this issue is not so 
> important.
> 
> But I would prefer to get some comments, even negative ones, about the 
> importance of having a built-in string class...
> 
> For me, a string class is something that could significantly improve the 
> quality of libraries for D.
> 
> Best Regards
> Marcin Kuszczak
> 
> 
> Marcin Kuszczak wrote:
>> Bill Baxter wrote:
>>
>>> So, what's left on everyone's lists for D1.0 must-have features?
>>
>> I think that one thing which is missing from Phobos right now is a string
>> class which encapsulates UTF-8/UTF-16/UTF-32 handling and the issues
>> connected with UTF-8 strings, e.g.:
>>
>>         char[] foo = "hög";
>>         assert(foo.length == 3); // Sorry UTF-8, this is == 4
>>         assert(foo[1] == 'ö');   // Not a chance!
>>
>> It's really annoying to write an application in a language other than
>> English (or a multi-language application) using just the currently
>> available language support (please note that I am not saying that it's
>> better in C++ :-) - I am just saying that it could be greatly improved).
>>
>> The problems I see with the current language support are:
>> 1. To write a good API you need three versions of every function, all
>> doing the same thing but taking different character arrays
>> (char[]/wchar[]/dchar[]) as parameters. The best (maybe I should write
>> worst ;-) ) example is the Phobos API.
>> 2. It's quite easy to make wrong assumptions about UTF-8 encoded arrays.
>> See the example above. This can lead to slicing a string in the wrong
>> place, which leaves an improperly formed UTF-8 string.
>> 3. What char[] actually holds is probably not so clear to newbies,
>> especially for UTF-8: is a char really one character/code point?
>> 4. A string class should be more than just an array of characters. It
>> should probably have more methods, e.g. for removing characters from the
>> middle of a string, and much more.
>>
>> Some time ago Chris Miller posted a dstring implementation to the
>> newsgroup which looks quite good to me (link to site:
>> http://www.dprogramming.com/dstring.php).
>>
>> But if Walter is not happy enough with this implementation for now, maybe
>> at least an alias should be added to object.d:
>> alias char[] string;
>>
>> I personally vote for dstring.
>>
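
On point 2 of the quoted list (slicing in the wrong place), the following is a rough sketch of the failure and one way around it. Again this assumes present-day Phobos; `takeCodePoints` is a hypothetical helper, while `stride`, `validate` and `UTFException` are assumed to come from std.utf. A raw slice by byte index can cut 'ö' in half, whereas stepping by `stride` only ever cuts at sequence boundaries:

```d
import std.exception : assertThrown;
import std.utf : stride, validate, UTFException;

// Hypothetical helper: the first n code points of s, returned as a slice
// of s that never ends inside a multi-byte UTF-8 sequence.
string takeCodePoints(string s, size_t n)
{
    size_t i = 0;
    while (n > 0 && i < s.length)
    {
        i += stride(s, i); // code units used by the sequence starting at i
        --n;
    }
    return s[0 .. i];
}

unittest
{
    string foo = "hög";

    // Slicing by byte index keeps only the first byte of 'ö':
    // the result is not well-formed UTF-8.
    assertThrown!UTFException(validate(foo[0 .. 2]));

    // Stepping by stride keeps the whole two-byte sequence.
    assert(takeCodePoints(foo, 2) == "hö");
    assert(takeCodePoints(foo, 10) == foo);
}
```

A string class along the lines discussed above could hide this stepping behind its slicing operation, at the cost of slicing no longer being O(1).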


