First Impressions
Walter Bright
newshound at digitalmars.com
Fri Sep 29 19:52:49 PDT 2006
Chad J wrote:
> But this is what I'm talking about... you can't slice them or index
> them. I might actually index a character out of an array from time to
> time. If I don't know about UTF, and I do just keep on coding, and I do
> something like this:
>
> char[] str = "some string in nonenglish text";
> for ( int i = 0; i < str.length; i++ )
> {
>     str[i] = doSomething( str[i] );
> }
>
> and this will fail right?
>
> If it does fail, then everything is not alright. You do have to worry
> about UTF. Someone has to tell you to use a foreach there.

Yes, you do have to be aware of it being UTF, just like in C you have to
be aware that strings are 0 terminated. But once aware of it, there is
plenty of support for it in the core language and in std.utf.
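
As an illustration of that support, here is a minimal sketch of the quoted loop rewritten to step by character instead of by index. It assumes a current D compiler, where string literals are immutable, so it builds a new string rather than writing in place. The body of doSomething and the helper names mapChars and countCodePoints are made up for the example; foreach with a dchar loop variable, std.utf.decode and std.utf.encode are the real pieces.

```d
import std.uni : toUpper;
import std.utf : decode, encode;

// Hypothetical stand-in for the doSomething() in the quoted loop.
dchar doSomething(dchar c)
{
    return toUpper(c);
}

// Applies doSomething to each code point of s, not to each UTF-8 code unit.
string mapChars(string s)
{
    char[] result;
    foreach (dchar c; s)    // a dchar loop variable makes foreach decode UTF-8
    {
        char[4] buf;
        size_t n = encode(buf, doSomething(c));  // re-encode as 1 to 4 code units
        result ~= buf[0 .. n];
    }
    return result.idup;
}

// The same traversal written by hand with std.utf.decode.
size_t countCodePoints(string s)
{
    size_t i = 0, count = 0;
    while (i < s.length)
    {
        decode(s, i);   // returns the code point at i and advances i past it
        ++count;
    }
    return count;
}

void main()
{
    string s = "h\u00E9llo";            // U+00E9 takes two UTF-8 code units
    assert(s.length == 6);              // .length counts code units
    assert(countCodePoints(s) == 5);    // five code points
    assert(mapChars(s) == "H\u00C9LLO");
}
```

Indexing str[i] directly, as in the quoted loop, would hand doSomething the two halves of the encoded U+00E9 separately.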

You can also simply use dchar[], which has a one to one mapping between
characters and indices, if you prefer.
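
A small sketch of the dchar[] route, again assuming a current D compiler. The helper name replaceAt is hypothetical; std.utf.toUTF32 and std.utf.toUTF8 do the conversions. The cost is four bytes per code point.

```d
import std.utf : toUTF32, toUTF8;

// Replaces the code point at charIndex by converting to UTF-32 and back.
string replaceAt(string s, size_t charIndex, dchar replacement)
{
    dchar[] wide = toUTF32(s).dup;  // one array element per code point
    wide[charIndex] = replacement;  // plain indexing is safe here
    return toUTF8(wide);
}

void main()
{
    dstring wide = toUTF32("h\u00E9llo");
    assert(wide.length == 5);       // one index per code point
    assert(wide[1] == '\u00E9');
    assert(replaceAt("h\u00E9llo", 1, 'e') == "hello");
}
```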

Contrast that with C++, which has no usable or portable support for
UTF-8, UTF-16, or any Unicode. All your carefully coded use of
std::string needs to be totally scrapped and redone with your own custom
classes, should you decide your app needs to support Unicode.

You can also wrap char[] inside a class that provides a view of the data
as if it were dchars. But I don't think the performance of such a
class would be competitive. Interestingly, it turns out that most string
operations do not need to be concerned with the number of chars in a
character (like "find this substring"), and forcing them to care just
makes for inefficiency.
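
To illustrate the substring case, here is a sketch of a search that compares raw UTF-8 code units and never decodes anything. The name findBytes is hypothetical, and it is a naive loop written for clarity; in practice std.string.indexOf would be the usual choice. It works because a valid UTF-8 needle can only match a valid UTF-8 haystack on character boundaries.

```d
// Returns the code-unit index of the first match of needle, or -1.
ptrdiff_t findBytes(const(char)[] haystack, const(char)[] needle)
{
    if (needle.length > haystack.length)
        return -1;
    foreach (i; 0 .. haystack.length - needle.length + 1)
    {
        // Array comparison works on code units; no decoding takes place.
        if (haystack[i .. i + needle.length] == needle)
            return i;
    }
    return -1;
}

void main()
{
    string text = "na\u00EFve caf\u00E9 menu";
    string needle = "caf\u00E9";
    auto at = findBytes(text, needle);
    assert(at == 7);                                    // index in code units
    assert(text[at .. at + needle.length] == needle);   // slice is valid UTF-8
    assert(findBytes(text, "tea") == -1);
}
```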