How to detect start of Unicode symbol and count amount of graphemes

Jacob Carlborg via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Oct 5 12:52:39 PDT 2014


On 2014-10-05 14:09, Uranuz wrote:

> Maybe there is some way to detect just the first code unit of a grapheme
> without the overhead of using the Grapheme struct? I tried checking
> whether ch < 128 (for UTF-8), but this doesn't work. How do I check whether
> a byte continues the encoding of a code point or starts a new sequence?

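Have a look here [1]. For example, a code point between U+0080 and
U+07FF is encoded as two bytes. The high bits of each byte tell you its
role: a byte of the form 10xxxxxx is a continuation byte, and any other
byte starts a new code point. Note that this gives you code point
boundaries, not grapheme boundaries, since one grapheme can consist of
several code points.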
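
A minimal sketch in D of testing for a continuation byte (bit pattern
10xxxxxx), assuming the input is valid UTF-8. The names isSequenceStart
and countCodePoints are made up for illustration and are not part of
Phobos; byGrapheme and walkLength are the existing std.uni and std.range
functions, shown only for comparison.

```d
/// Returns true if `c` starts a UTF-8 sequence, i.e. it is NOT a
/// continuation byte of the form 10xxxxxx. (Hypothetical helper.)
bool isSequenceStart(char c) pure nothrow @nogc @safe
{
    return (c & 0xC0) != 0x80;
}

/// Counts code points by counting sequence-start bytes, without decoding.
/// Assumes `s` is valid UTF-8. (Hypothetical helper.)
size_t countCodePoints(const(char)[] s) pure nothrow @nogc @safe
{
    size_t n = 0;
    foreach (char c; s) // iterates code units, not code points
    {
        if (isSequenceStart(c))
            ++n;
    }
    return n;
}

unittest
{
    import std.range : walkLength;
    import std.uni : byGrapheme;

    string s = "e\u0301a"; // 'e' + combining acute accent + 'a'

    assert(s.length == 4);                // UTF-8 code units
    assert(countCodePoints(s) == 3);      // code points
    assert(s.byGrapheme.walkLength == 2); // graphemes: "é" and "a"
}
```

The bit test is cheap, but it only finds code point boundaries. As the
last assertion shows, counting graphemes still needs something like
byGrapheme, because the combining accent is its own code point.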

[1] http://en.wikipedia.org/wiki/UTF-8#Description

-- 
/Jacob Carlborg

