First Impressions

Fri Sep 29 19:52:49 PDT 2006

Chad J wrote:
> But this is what I'm talking about... you can't slice them or index 
> them.  I might actually index a character out of an array from time to 
> time.  If I don't know about UTF, and I do just keep on coding, and I do 
> something like this:
> 
> char[] str = "some string in nonenglish text";
> for ( int i = 0; i < str.length; i++ )
> {
>   str[i] = doSomething( str[i] );
> }
> 
> and this will fail right?
> 
> If it does fail, then everything is not alright.  You do have to worry 
> about UTF.  Someone has to tell you to use a foreach there.

Yes, you do have to be aware of it being UTF, just like in C you have to 
be aware that strings are 0 terminated. But once aware of it, there is 
plenty of support for it in the core language and in std.utf.
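As a rough sketch of what that support looks like in present-day D: asking foreach for a dchar makes the loop decode the UTF-8 for you, and std.utf.encode writes the result back out. The doSomething below is only a hypothetical stand-in for the one in the quoted loop, and transform is a name made up for this example.

```d
import std.stdio : writeln;
import std.utf : encode;

// Hypothetical stand-in for the doSomething in the quoted post:
// it maps one whole code point to another (here, U+00E9 to plain 'e').
dchar doSomething(dchar c)
{
    return c == '\u00E9' ? 'e' : c;
}

// Illustrative helper, not a library function.
string transform(string s)
{
    char[] result;
    // Declaring the loop variable as dchar makes foreach decode each
    // UTF-8 sequence, so c is a full code point, never a lone byte.
    foreach (dchar c; s)
    {
        // encode appends the code point to result as UTF-8.
        encode(result, doSomething(c));
    }
    return result.idup;
}

void main()
{
    writeln(transform("caf\u00E9 au lait"));
}
```

The indexed loop from the quoted post would instead hand doSomething the two bytes of U+00E9 one at a time, which is where it goes wrong.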

You can also simply use dchar[], which has a one to one mapping between 
characters and indices, if you prefer.
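A small sketch of the dchar[] route, assuming the text starts out as UTF-8: convert with std.utf.toUTF32, index freely, then convert back with toUTF8. Again doSomething is a hypothetical stand-in, and transformIndexed is a name invented for the example.

```d
import std.stdio : writeln;
import std.utf : toUTF32, toUTF8;

// Hypothetical stand-in: replaces U+00E9 with 'E', leaves the rest alone.
dchar doSomething(dchar c)
{
    return c == '\u00E9' ? 'E' : c;
}

// Illustrative helper, not a library function.
string transformIndexed(string s)
{
    // toUTF32 returns an immutable dstring; dup gives a mutable dchar[].
    dchar[] wide = toUTF32(s).dup;
    // Each index is exactly one code point, so the plain for loop is safe.
    for (size_t i = 0; i < wide.length; i++)
    {
        wide[i] = doSomething(wide[i]);
    }
    return toUTF8(wide);
}

void main()
{
    string s = "caf\u00E9";
    writeln(s.length);            // number of UTF-8 code units
    writeln(toUTF32(s).length);   // number of code points
    writeln(transformIndexed(s));
}
```

The price is four bytes per character and a conversion at each boundary where UTF-8 is expected.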

Contrast that with C++, which has no usable or portable support for 
UTF-8, UTF-16, or any Unicode. All your carefully coded use of 
std::string needs to be totally scrapped and redone with your own custom 
classes, should you decide your app needs to support unicode.

You can also wrap char[] inside a class that provides a view of the data 
as if it were dchar's. But I don't think the performance of such a 
class would be competitive. Interestingly, it turns out that most string 
operations do not need to be concerned with the number of char's in a 
character (like "find this substring"), and forcing them to care just 
makes for inefficiency.
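To illustrate the substring case, here is a deliberately naive search that compares raw UTF-8 code units and never decodes anything. It is a sketch, not how any library implements it, and findSub is a made-up name. It works because UTF-8 is self-synchronizing: a valid needle can only match a valid haystack at a character boundary.

```d
import std.stdio : writeln;

// Illustrative helper: returns the code-unit index of the first match,
// or -1 if needle does not occur in hay. No UTF-8 decoding is done.
ptrdiff_t findSub(const(char)[] hay, const(char)[] needle)
{
    if (needle.length > hay.length)
        return -1;
    foreach (i; 0 .. hay.length - needle.length + 1)
    {
        // Array equality compares the code units directly.
        if (hay[i .. i + needle.length] == needle)
            return cast(ptrdiff_t) i;
    }
    return -1;
}

void main()
{
    string hay = "na\u00EFve caf\u00E9";
    string needle = "caf\u00E9";
    auto idx = findSub(hay, needle);
    writeln(idx);
    // The returned index is in code units, so it can be used to slice.
    writeln(hay[idx .. idx + needle.length]);
}
```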