String implementations

Fri Jan 18 14:08:28 PST 2008

On 1/16/08, Jarrod <qwerty at ytre.wq> wrote:
> On Tue, 15 Jan 2008 21:23:31 -0500, bearophile wrote:
>
> So if this is the case, then why can't the language itself manage multi-
> byte characters for us? It would make things a hell of a lot easier and
> more efficient than having to convert /potentially/ foreign strings to
> utf-32 for a simple manipulation operation, then converting them back.
> The only reason I can think of for char arrays being treated as fixed
> length is for faster indexing, which is hardly useful in most cases since
> a lot of the time we don't even know if we're dealing with multi-byte
> characters when handling strings, so we have to convert and traverse the
> strings anyway.

Because, think about this:

    char[] a = new char[8];

If a char array were indexed by character instead of by code unit, as
you suggest, how many bytes would the compiler need to allocate? It
can't know in advance, since a UTF-8 character takes anywhere from 1 to
4 bytes. Also:

    char[] a = "abcd";
    char[] b = "\u20AC";
    a[0] = b[0];

would cause big problems. (Would a[1] and a[2] get overwritten? Would a
have to be resized and everything shifted up two bytes to make room for
the three-byte euro sign?)
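
To make the byte counts concrete, here is a minimal sketch of what
code-unit indexing does with that assignment. It assumes a present-day D
compiler (D2 with Phobos) rather than the D1-style snippets above, and
isValidUtf8 is a hypothetical helper written for this example, not a
library function:

```d
import std.stdio : writeln;
import std.utf : codeLength, validate, UTFException;

// Hypothetical helper: true if s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s); // throws UTFException on malformed input
        return true;
    }
    catch (UTFException e)
    {
        return false;
    }
}

void main()
{
    // Number of UTF-8 code units needed to encode one character.
    writeln(codeLength!char('a'));      // 1
    writeln(codeLength!char('\u20AC')); // 3

    char[] a = "abcd".dup;   // mutable copy of the literal
    string b = "\u20AC";     // euro sign
    writeln(b.length);       // 3: .length counts code units, not characters

    a[0] = b[0];             // copies only the first code unit (0xE2)
    writeln(a.length);       // 4: nothing is resized or shifted
    writeln(isValidUtf8(a)); // false: 0xE2 is not followed by continuation bytes
}
```

So with code-unit indexing the assignment stays cheap and predictable,
at the price of leaving a with a truncated sequence that is no longer
valid UTF-8.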

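I think D has got it right. Use dchar when you need character-based
indexing (wchar also works, as long as you stay within the Basic
Multilingual Plane and never run into surrogate pairs).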
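
As a rough sketch of the dchar route, again assuming D2 with Phobos: the
string is converted to UTF-32 with std.utf.toUTF32, indexed by
character, and converted back with std.utf.toUTF8. replaceChar is a
hypothetical helper name used only for illustration:

```d
import std.stdio : writeln;
import std.utf : toUTF32, toUTF8;

// Hypothetical helper: replace the character at index i by going
// through UTF-32, where each array element is one code point.
string replaceChar(string s, size_t i, dchar c)
{
    dchar[] wide = toUTF32(s).dup; // mutable UTF-32 copy
    wide[i] = c;                   // plain element assignment, no shifting
    return toUTF8(wide);           // re-encode; byte length may change
}

void main()
{
    string r = replaceChar("abcd", 0, '\u20AC');
    writeln(r.length);          // 6: three bytes for the euro sign plus "bcd"
    writeln(toUTF32(r).length); // 4: still four characters
}
```

The conversion is a full copy in each direction, which is the cost the
quoted message complains about, but it keeps the resizing explicit
instead of hiding it behind an index expression.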