toString issue
Hasan Aljudy
hasan.aljudy at gmail.com
Mon Oct 2 06:33:42 PDT 2006
Derek Parnell wrote:
> On Mon, 02 Oct 2006 00:52:44 -0600, Hasan Aljudy wrote:
>
>> Sean Kelly wrote:
>>> How about toUtf8() for classes and structs :-)
>>>
>>> Sean
>> I think there's a fundamental problem with the way D deals with strings.
>> The spec claims that D natively supports strings through char[], and at
>> the same time claims that D fully supports Unicode.
>> The fundamental issue is that UTF-8 is one encoding for Unicode strings,
>> but it's not always the best choice. Phobos mostly deals only with
>> char[], and mixing code that uses wchar[] with code that uses char[]
>> isn't very straightforward.
>>
>> Consider the simple case of reading a text file and detecting "words".
>> To detect a word, you must first recognize letters: not just English
>> letters, but letters of any language, and for that purpose we have the
>> isUniAlpha function. Now, if you encode the string as char[], how are
>> you gonna determine whether the next character is a Unicode alpha?
>>
>> The following definitely shouldn't work:
>> // assuming text is char[]
>> for( int i = 0; i < text.length; i++ )
>> {
>>     bool isLetter = isUniAlpha( text[i] );
>>     ....
>> }
>
> foreach(int i, dchar c; text)
> {
>     bool isLetter = isUniAlpha( c );
>     ...
> }
>
>
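A minimal sketch of the difference between the two loops quoted above, assuming a modern D2 compiler and Phobos, where std.uni.isAlpha stands in for the isUniAlpha function named in the thread. The function names lettersByIndex and lettersByCodePoint are hypothetical, chosen only for illustration. The sample string "a×b" contains U+00D7 (MULTIPLICATION SIGN), which UTF-8 stores as the two code units 0xC3 0x97: indexing sees each byte separately, while foreach with a dchar loop variable decodes the whole code point.

```d
import std.stdio : writeln;
import std.uni : isAlpha;

// Counts letters by indexing: each text[i] is one UTF-8 code unit (char).
// char converts implicitly to dchar, so this compiles, but a lead byte such
// as 0xC3 is then misread as the code point U+00C3 ('Ã').
size_t lettersByIndex(string text)
{
    size_t n = 0;
    for (size_t i = 0; i < text.length; i++)
        if (isAlpha(text[i]))
            ++n;
    return n;
}

// Counts letters with foreach over dchar, which decodes whole code points.
size_t lettersByCodePoint(string text)
{
    size_t n = 0;
    foreach (dchar c; text)
        if (isAlpha(c))
            ++n;
    return n;
}

void main()
{
    // '×' is U+00D7 MULTIPLICATION SIGN, stored in UTF-8 as 0xC3 0x97.
    string text = "a×b";
    writeln(text.length);              // 4: code units, not characters
    writeln(lettersByIndex(text));     // 3: wrong, 0xC3 is counted as 'Ã'
    writeln(lettersByCodePoint(text)); // 2: 'a' and 'b'
}
```

Because char converts implicitly to dchar, the indexed version produces a wrong count rather than a compile error.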
I know, but that's still a work-around. What if you need to iterate back
and forth? You're gonna need to convert it to dchar[] (or wchar[]).
However, that brings up a good point:
Notice how foreach lets you iterate over a string by Unicode characters
(a.k.a. code points)? Shouldn't this kind of iteration be supported
outside of foreach as well?
Sure, I know you can write your own String class and even an iterator,
but that just proves that string support isn't fully built-in.
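For the back-and-forth case, a sketch of stepping through a UTF-8 string by code points in either direction without converting it first, again assuming modern Phobos (std.utf.decode and std.utf.strideBack). forwardCodePoints and backwardCodePoints are hypothetical helper names; collecting the code points into a dchar[] is done only to make the traversal order visible.

```d
import std.stdio : writeln;
import std.utf : decode, strideBack;

// Walks text forward one code point at a time, directly over the UTF-8 code units.
dchar[] forwardCodePoints(string text)
{
    dchar[] result;
    size_t idx = 0;
    while (idx < text.length)
        result ~= decode(text, idx); // decode() advances idx past the code point
    return result;
}

// Walks text backward: strideBack() returns the length, in code units, of the
// code point that ends at idx, so subtracting it lands on that code point's start.
dchar[] backwardCodePoints(string text)
{
    dchar[] result;
    size_t idx = text.length;
    while (idx > 0)
    {
        idx -= strideBack(text, idx);
        size_t tmp = idx;              // decode() needs a mutable index it can advance
        result ~= decode(text, tmp);
    }
    return result;
}

void main()
{
    string text = "aé€"; // 1-, 2- and 3-byte UTF-8 sequences
    writeln(forwardCodePoints(text));
    writeln(backwardCodePoints(text));
}
```

This is still hand-written index arithmetic over library functions rather than a built-in bidirectional iterator.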
More information about the Digitalmars-d
mailing list