Should this work?

Thu Jan 9 12:51:36 PST 2014

Marco Leise <Marco.Leise at gmx.de> writes:

> On Thu, 09 Jan 2014 15:20:13 +0000,
> "John Colvin" <john.loughran.colvin at gmail.com> wrote:
>

> The point about graphemes is good. D's functions still stop
> mid-way. From UTF-8 you can iterate UTF-32 code points, but
> grapheme clusters are the new characters. I.e. the basic need
> to iterate Unicode _characters_ is not supported!
> I cannot even come up with use cases for working with code
> points and think they are a conceptual black hole. Something
> carried over from a time when grapheme clusters didn't exist.

Actually, you can do tons of NLP without grapheme clusters.  If you're
paranoid, you standardize on a specific Unicode normalization first.

You can probably get slightly better results by paying attention to
clusters, but I suspect the improvement will be marginal.
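
To make the distinction concrete, here is a minimal sketch of how the
three levels differ for one string, and what normalizing first buys you.
It assumes a Phobos recent enough to ship std.uni.byGrapheme and
std.uni.normalize; the sample string is made up for illustration.

```d
import std.range : walkLength;
import std.uni : byGrapheme, normalize, NFC;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT (decomposed form)
    string s = "e\u0301";

    assert(s.length == 3);                // UTF-8 code units
    assert(s.walkLength == 2);            // code points (string ranges decode to dchar)
    assert(s.byGrapheme.walkLength == 1); // grapheme clusters

    // Normalizing to NFC composes the pair into the single code point U+00E9,
    // so code-point-level processing sees one element here.
    string n = normalize!NFC(s);
    assert(n == "\u00E9");
    assert(n.walkLength == 1);
}
```

Normalization only helps where a precomposed code point exists, so it
narrows the gap between code points and clusters rather than closing it.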

That said, I do agree with the OP that the string API is currently more
complex to understand than I'd like.  However, it's significantly easier
to use than what's in standard C++ for anything beyond ASCII.

Jerry