How to detect start of Unicode symbol and count amount of graphemes

Uranuz via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Oct 5 05:09:33 PDT 2014


> You can use std.uni.byGrapheme to iterate by graphemes:
> http://dlang.org/phobos/std_uni.html#.byGrapheme
>
> AFAIK, graphemes are not "self synchronizing", but codepoints 
> are. You can pop code units until you reach the beginning of a 
> new codepoint. From there, you can iterate by graphemes, though 
> your first grapheme might be off.

Is there a way to detect the first code unit of a grapheme 
without the overhead of using the Grapheme struct? I tried 
checking if ch < 128 (for UTF-8), but this doesn't work. How can 
I check whether a byte is a continuation of a multi-byte code 
point or the start of a new sequence?
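
For reference, here is a rough sketch of the byte-level check being asked about. In UTF-8, every continuation byte has the bit pattern 10xxxxxx, so (c & 0xC0) == 0x80 identifies it; any other byte (ASCII or a lead byte) starts a code point. The test ch < 128 only matches ASCII and misses lead bytes, which are 0xC0 or above. The helper names isContinuation, skipToCodePointStart and countCodePoints are made up for this illustration, and the sketch assumes the input is valid UTF-8. Note that the start of a code point is not necessarily the start of a grapheme (a combining mark begins a code point but continues a grapheme), so the grapheme count at the end still goes through std.uni.byGrapheme.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

/// Hypothetical helper: true if `c` is a UTF-8 continuation byte (10xxxxxx).
bool isContinuation(char c) pure nothrow @nogc @safe
{
    return (c & 0xC0) == 0x80;
}

/// Hypothetical helper: drops leading continuation bytes so the slice
/// begins at the start of a code point. Assumes valid UTF-8.
inout(char)[] skipToCodePointStart(inout(char)[] s) pure nothrow @nogc @safe
{
    while (s.length && isContinuation(s[0]))
        s = s[1 .. $];
    return s;
}

/// Hypothetical helper: counts code points by counting the bytes that
/// are not continuation bytes. Assumes valid UTF-8.
size_t countCodePoints(const(char)[] s) pure nothrow @nogc @safe
{
    size_t n = 0;
    foreach (char c; s)        // iterate code units, no decoding
        if (!isContinuation(c))
            ++n;
    return n;
}

void main()
{
    string s = "яб";                        // two 2-byte code points
    assert(skipToCodePointStart(s[1 .. $]) == "б");
    assert(countCodePoints(s) == 2);

    string e = "e\u0301";                   // 'e' + combining acute accent
    assert(e.length == 3);                  // 3 code units
    assert(countCodePoints(e) == 2);        // 2 code points
    assert(e.byGrapheme.walkLength == 1);   // 1 grapheme
    writeln("all checks passed");
}
```

The byte test is cheap and needs no decoding, but it only answers the code point question. Finding grapheme boundaries requires the Unicode segmentation rules, which is what byGrapheme implements.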


