Unicode handling comparison

Dmitry Olshansky dmitry.olsh at gmail.com
Wed Nov 27 12:28:44 PST 2013


On 27-Nov-2013 20:22, Wyatt wrote:
> On Wednesday, 27 November 2013 at 16:18:34 UTC, Wyatt wrote:
>>
>> trouble following all that (e.g. Isn't "noe\u0308l" a grapheme
>>
> Whoops, overzealous pasting.  That is, "e\u0308", which composes to
> "ë".  A grapheme cluster seems to represent one printed character: "...a
> horizontally segmentable unit of text, consisting of some grapheme base
> (which may consist of a Korean syllable) together with any number of
> nonspacing marks applied to it."
>
> Is that about right?

As far as the standard defines it, yes. (Actually, the standard talks about boundaries,
and a grapheme is whatever happens to lie between two of them.)
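To make the code point vs. grapheme distinction concrete, here is a small sketch using only Python's standard unicodedata module. This is not D's std.uni: Python's stdlib has no grapheme segmentation, so the sketch only counts the same word in UTF-8 bytes, code points, and code points after NFC normalization. It shows that "e\u0308" does compose to "ë", but also that normalization cannot replace segmentation in general. The describe helper is hypothetical, written just for illustration.

```python
import unicodedata


def describe(s: str) -> dict:
    """Hypothetical helper: several different 'lengths' of one string."""
    nfc = unicodedata.normalize("NFC", s)
    return {
        "utf8_bytes": len(s.encode("utf-8")),  # code units in UTF-8
        "code_points": len(s),                 # Python str length = code points
        "nfc_code_points": len(nfc),           # code points after composition
    }


word = "noe\u0308l"
print(describe(word))  # {'utf8_bytes': 6, 'code_points': 5, 'nfc_code_points': 4}

# "e" + COMBINING DIAERESIS composes to the precomposed code point U+00EB ("ë").
assert unicodedata.normalize("NFC", "e\u0308") == "\u00eb"

# Not every grapheme has a precomposed form: "g" + U+0308 stays two code points
# after NFC, although a reader still sees a single character.
assert len(unicodedata.normalize("NFC", "g\u0308")) == 2
```

Counting user-perceived characters reliably therefore needs the boundary rules of UAX #29 rather than normalization, which is the approach std.uni takes with extended grapheme clusters.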


More specifically, D's std.uni follows the notion of the extended
grapheme cluster. There is no need to stick with the ugly legacy (non-extended) grapheme clusters.

See also
http://www.unicode.org/reports/tr29/
>
> -Wyatt


-- 
Dmitry Olshansky


More information about the Digitalmars-d mailing list