String implementations
Janice Caron
caron800 at googlemail.com
Sat Jan 19 12:21:49 PST 2008
On 1/19/08, James Dennett <jdennett at acm.org> wrote:
> >>> char[] a = "abcd";
> >>> char[] b = "\u20AC";
> >>> a[0] = b[0];
>
> I think we are *disagreeing* here. I claim that this causes
> problems _with the current design of D_, which would be
> resolved if char[] (or however we denote mutable UTF8 strings)
> were really a UTF8 type.
So you're saying that in your new design, after that assignment, a
would equal "\u20ACbcd". The problem is that "\u20AC" takes three bytes
in UTF-8 where 'a' takes one, so the compiler would have to allocate
two extra bytes and then memmove the rest of the string up to make
room. That strikes me as kinda slow for a single element assignment,
which is not something I'd want in a char array.
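
To make the cost concrete, here is a rough sketch of the work such an assignment would imply if written by hand. The helper name replaceFirstChar is hypothetical (it is not part of D or Phobos), and it assumes the array is non-empty and its first character is a single-byte one. Note the .dup, which current D compilers need in order to get a mutable copy of a literal.

```d
// Hypothetical helper, for illustration only: overwrite the first
// (single-byte) character of `a` with the UTF-8 sequence `seq`.
// Assumes a.length >= 1 and seq.length >= 1.
char[] replaceFirstChar(char[] a, const(char)[] seq)
{
    // Extra code units needed beyond the one being replaced.
    size_t extra = seq.length - 1;

    // Grow the array; this may reallocate and copy everything.
    a.length = a.length + extra;

    // Shift the tail up, back to front because the ranges overlap.
    for (size_t i = a.length - 1; i >= seq.length; --i)
        a[i] = a[i - extra];

    // Write the new sequence over the front of the array.
    a[0 .. seq.length] = seq[];
    return a;
}

void main()
{
    char[] a = "abcd".dup;
    a = replaceFirstChar(a, "\u20AC"); // E2 82 AC, three code units
    assert(a == "\u20ACbcd");
    assert(a.length == 6);             // four bytes became six
}
```

Every step here is O(length of the string), whereas the plain a[0] = b[0] is a single byte store.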
> That's the problem. char[] can hold non-UTF8 strings.
Yes, that is possible. But only in buggy code, of course. That really
raises the question: is it the compiler's job, or the programmer's, to
ensure that the contract is maintained? I don't really have any
problem taking responsibility for maintaining UTF-8 correctness. (It's
not hard.)
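
As a sketch of what taking that responsibility can look like, the check below uses std.utf.validate, which throws a UTFException on malformed input. The wrapper isValidUtf8 is a hypothetical helper written for this example, not a Phobos function.

```d
import std.utf : UTFException, validate;

// Hypothetical helper: true if `s` is well-formed UTF-8.
bool isValidUtf8(const(char)[] s)
{
    try
    {
        validate(s);   // throws UTFException on malformed input
        return true;
    }
    catch (UTFException)
    {
        return false;
    }
}

void main()
{
    char[] a = "abcd".dup;
    string b = "\u20AC";

    assert(isValidUtf8(a));

    // The quoted assignment copies only the lead byte 0xE2 of the
    // euro sign, leaving it without its two continuation bytes.
    a[0] = b[0];
    assert(!isValidUtf8(a));
}
```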
But if you want to be completely protected from those kinds of errors,
I still don't see the problem with using dchar[], where every element
is a whole code point.
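
For comparison, a minimal sketch of the same assignment on a dchar array (the d suffix makes the literals UTF-32). Each element is four bytes, which is the price paid for the simple indexing.

```d
void main()
{
    dchar[] a = "abcd"d.dup;
    dchar[] b = "\u20AC"d.dup;

    // One element is one whole code point, so nothing has to be
    // resized or shifted.
    a[0] = b[0];

    assert(a == "\u20ACbcd"d);
    assert(a.length == 4);
}
```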
> No, it does not. It's precisely the difference that is why
> D's char[] is a poor man's UTF8 string.
I suppose a library class could be written whose interface behaved
like a dchar array, but whose implementation was UTF-8. But when would
you ever use it?
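
A minimal sketch of what such a type might look like follows. The name Utf8String and its members are hypothetical, invented for this illustration. It relies on std.utf's stride, decode and encode, and assumes the stored data is always valid UTF-8.

```d
import std.utf : decode, encode, stride;

// Hypothetical type, for illustration only: indexed by code point
// like a dchar array, but stored as UTF-8.
struct Utf8String
{
    private char[] data;   // UTF-8 storage

    this(string s) { data = s.dup; }

    // Byte offset of the n-th code point. This is an O(n) scan.
    private size_t offsetOf(size_t n)
    {
        size_t off = 0;
        foreach (i; 0 .. n)
            off += stride(data, off);
        return off;
    }

    // Read the n-th code point.
    dchar opIndex(size_t n)
    {
        size_t off = offsetOf(n);
        return decode(data, off);
    }

    // Replace the n-th code point, rebuilding the storage so that a
    // sequence of a different width still fits.
    void opIndexAssign(dchar c, size_t n)
    {
        size_t off = offsetOf(n);
        size_t oldLen = stride(data, off);
        char[4] buf;
        size_t newLen = encode(buf, c);
        data = data[0 .. off] ~ buf[0 .. newLen] ~ data[off + oldLen .. $];
    }

    const(char)[] toUtf8() { return data; }
}

void main()
{
    auto a = Utf8String("abcd");
    a[0] = '\u20AC';
    assert(a[0] == '\u20AC');
    assert(a[1] == 'b');
    assert(a.toUtf8() == "\u20ACbcd");
}
```

Indexing is a linear scan and assignment allocates, so this trades the speed of a plain array for compact storage.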