String implementations

Sat Jan 19 23:07:55 PST 2008

On 1/20/08, Jarrod <qwerty at ytre.wq> wrote:
> If you were writing something that took a text input much like this very
> window I'm typing in right now, and the user hit back a few times and
> input a multi-byte character, how would you deal with it?

I'd write a class, of course.

It is straightforward (though not trivial) to step through the bytes of
UTF-8. Bytes in the range 00 to 7F are ASCII; bytes in the range 80 to BF
are tail bytes; bytes in the range C0 to F7 are head bytes; and bytes in
the range F8 to FF are illegal. Identifying multi-byte sequences is
therefore easy: a character starts at any ASCII or head byte, so stepping
back over one means skipping tail bytes until you reach its start.
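
To make the byte ranges concrete, here is a minimal sketch in C of how a
text input might move its cursor one character left or right. The names
(utf8_classify, utf8_prev, utf8_next) are made up for illustration and are
not from Phobos or any other library. The ranges are the ones given above;
a strict decoder following RFC 3629 would additionally reject C0, C1 and
F5 to F7 as head bytes, which this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>

/* Byte classes, using the ranges from the post (hypothetical names). */
enum utf8_class { UTF8_ASCII, UTF8_TAIL, UTF8_HEAD, UTF8_ILLEGAL };

enum utf8_class utf8_classify(unsigned char b)
{
    if (b <= 0x7F) return UTF8_ASCII;    /* 00..7F */
    if (b <= 0xBF) return UTF8_TAIL;     /* 80..BF */
    if (b <= 0xF7) return UTF8_HEAD;     /* C0..F7 */
    return UTF8_ILLEGAL;                 /* F8..FF */
}

/* Cursor left: step back one byte, then keep going while we are
   sitting on a tail byte. Returns the new byte offset. */
size_t utf8_prev(const unsigned char *s, size_t pos)
{
    assert(s != NULL);
    while (pos > 0) {
        --pos;
        if (utf8_classify(s[pos]) != UTF8_TAIL)
            break;                       /* reached ASCII or a head byte */
    }
    return pos;
}

/* Cursor right: step over the current byte, then over any tail bytes
   that belong to it. Returns the new byte offset. */
size_t utf8_next(const unsigned char *s, size_t len, size_t pos)
{
    assert(s != NULL && pos <= len);
    if (pos < len)
        ++pos;
    while (pos < len && utf8_classify(s[pos]) == UTF8_TAIL)
        ++pos;
    return pos;
}
```

The sketch assumes the buffer is already well-formed UTF-8. Neither
function validates that a head byte is followed by the right number of
tail bytes, so an editor built on it would want a validation pass when
the text first comes in.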

You can make an argument that functions and/or classes to do this sort
of thing should perhaps pre-exist in Phobos, but to say it should be
built into /the language itself/ ... that's going a bit too far, I
feel.

> Allow it to
> overlap? No. dchars? That's a lot of wasted memory, and it basically
> makes me wonder why utf-8 even exists if it needs to be dropped for
> simple text manipulation. May as well stick with utf-32 and ascii.
>
> No sir, I don't like it.
>