String implementations
Janice Caron
caron800 at googlemail.com
Fri Jan 18 14:08:28 PST 2008
On 1/16/08, Jarrod <qwerty at ytre.wq> wrote:
> On Tue, 15 Jan 2008 21:23:31 -0500, bearophile wrote:
>
> So if this is the case, then why can't the language itself manage multi-
> byte characters for us? It would make things a hell of a lot easier and
> more efficient than having to convert /potentially/ foreign strings to
> utf-32 for a simple manipulation operation, then converting them back.
> The only reason I can think of for char arrays being treated as fixed
> length is for faster indexing, which is hardly useful in most cases since
> a lot of the time we don't even know if we're dealing with multi-byte
> characters when handling strings, so we have to convert and traverse the
> strings anyway.
Because, think about this:

    char[] a = new char[8];

If a char array were indexed by character instead of code unit, as you
suggest, how many bytes would the compiler need to allocate? It can't
know in advance. Also:

    char[] a = "abcd";
    char[] b = "\u20AC";
    a[0] = b[0];

would cause big problems. (U+20AC takes three bytes in UTF-8, so would
a[1] and a[2] get overwritten? Would a have to be resized and everything
shifted up two bytes?)
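
To make the problem concrete, here is a minimal sketch of what that
assignment does when char[] is indexed by code unit, as it is today. It
assumes a current D compiler with Phobos (hence the .dup, which the
snippet above did not need at the time); isValidUtf8 and
overwriteFirstUnit are hypothetical helper names, not library functions.

```d
import std.stdio : writeln;
import std.utf : validate, UTFException;

// Hypothetical helper: true if s is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

// Hypothetical helper: repeats the assignment a[0] = b[0] on a mutable copy.
char[] overwriteFirstUnit(string dst, string src)
{
    char[] a = dst.dup;
    a[0] = src[0];   // copies one code unit, not one character
    return a;
}

void main()
{
    writeln("abcd".length);     // 4 code units
    writeln("\u20AC".length);   // 3 code units for a single character

    // Only the lead byte 0xE2 of the euro sign lands in a[0].
    char[] a = overwriteFirstUnit("abcd", "\u20AC");
    writeln(isValidUtf8(a));    // false
}
```

Only the first of the three code units is copied, so the result is not
even valid UTF-8. Doing the assignment per character would mean growing
the array behind the programmer's back.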
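I think D has got it right. Use dchar when you need character-based
indexing (wchar also works, as long as every character fits in a single
UTF-16 code unit, i.e. stays within the Basic Multilingual Plane).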
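
As a rough sketch of that advice, assuming a current D compiler with
Phobos: convert to dchar[] for the indexed operation, then convert back
if UTF-8 is needed. replaceCharAt and countCodePoints are hypothetical
helper names. Note that one dchar is one code point, so combining
sequences still span several elements.

```d
import std.stdio : writeln;
import std.utf : toUTF32, toUTF8;

// Hypothetical helper: replace the code point at index i.
string replaceCharAt(string s, size_t i, dchar c)
{
    dchar[] d = toUTF32(s).dup;   // one element per code point
    d[i] = c;                     // fixed-width, so nothing has to shift
    return toUTF8(d);             // re-encode; the byte length may change
}

// Hypothetical helper: count code points by decoding during iteration.
size_t countCodePoints(string s)
{
    size_t n = 0;
    foreach (dchar c; s)          // foreach decodes UTF-8 into dchar
        ++n;
    return n;
}

void main()
{
    string t = replaceCharAt("abcd", 0, '\u20AC');
    writeln(t.length);            // 6 code units
    writeln(countCodePoints(t));  // 4 code points
}
```

When only traversal is needed, the foreach form avoids the conversion
entirely, since it decodes the char[] in place.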