How to detect start of Unicode symbol and count amount of graphemes

Nicolas F. via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Mon Oct 6 05:07:19 PDT 2014


Unicode is hard to deal with properly, as how you deal with it is
very context dependent.

One grapheme is a visible character and consists of one or more
code points. One code point is a number that Unicode assigns a
meaning to; in an encoded string (UTF-8 for D's string type) it is
stored as one or more bytes.
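
To make the three levels concrete, here is a minimal sketch that
measures the same string in bytes, code points and graphemes. The
names TextSizes and measure are made up for this illustration; the
sketch relies on string.length counting UTF-8 code units, on
walkLength iterating a string by code point, and on std.uni's
byGrapheme.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

/// Sizes of one string at the three levels (hypothetical helper type).
struct TextSizes
{
    size_t bytes;      // UTF-8 code units
    size_t codepoints; // decoded code points
    size_t graphemes;  // user-visible characters
}

/// Measures s at each level (hypothetical helper, not part of Phobos).
TextSizes measure(string s)
{
    return TextSizes(
        s.length,                 // string is immutable(char)[], so this counts bytes
        s.walkLength,             // as a range, a string is walked code point by code point
        s.byGrapheme.walkLength); // std.uni groups code points into graphemes
}

unittest
{
    // "noël" written as 'e' followed by U+0308 COMBINING DIAERESIS
    auto sizes = measure("noe\u0308l");
    assert(sizes.bytes == 6);
    assert(sizes.codepoints == 5);
    assert(sizes.graphemes == 4);
}
```

Only the last number matches what a reader would call the length
of the word.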

You do not want to handle this yourself, as knowing which
code points form a grapheme is hard. Thankfully, std.uni exists.
Specifically, look at decodeGrapheme: it pops one grapheme off the
front of an input range and returns it.
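
As a rough sketch of how that can be used for counting, the loop
below pops graphemes until the input is empty. countGraphemes is a
name invented for this example; std.uni also offers byGrapheme,
which gives the same result as a lazy range if you prefer that
style.

```d
import std.uni : decodeGrapheme;

/// Counts graphemes by popping them one at a time (hypothetical helper).
size_t countGraphemes(string s)
{
    size_t n = 0;
    while (s.length != 0)
    {
        decodeGrapheme(s); // takes s by ref and advances it past one grapheme
        ++n;
    }
    return n;
}

unittest
{
    string s = "e\u0301a";      // 'e' + U+0301 COMBINING ACUTE ACCENT, then 'a'
    auto g = decodeGrapheme(s);
    assert(g.length == 2);      // the first grapheme consists of two code points
    assert(s == "a");           // the input was advanced past that grapheme

    assert(countGraphemes("noe\u0308l") == 4);
}
```

Since s is a local copy of the slice inside countGraphemes, the
caller's string is left untouched.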

Never write code that deals with Unicode at the byte level. It will
always be wrong.

