Accented Characters and Counting Syllables

H. S. Teoh via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Dec 7 07:45:39 PST 2014


On Sun, Dec 07, 2014 at 02:30:13PM +0000, "Nordlöw" via Digitalmars-d-learn wrote:
> On Saturday, 6 December 2014 at 23:11:49 UTC, H. S. Teoh via
> Digitalmars-d-learn wrote:
> >This is a Unicode issue. What you want is neither byCodeUnit nor
> >byCodePoint, but byGrapheme. A grapheme is the Unicode equivalent of
> >what lay people would call a "character". A Unicode character (or
> >more precisely, a "code point") is not necessarily a complete
> >grapheme, as your example above shows; it's just a numerical value
> >that uniquely identifies an entry in the Unicode character database.
> >
> >
> >T
> 
> Ok, thanks.
> 
> I just noticed that byGrapheme() lacks bidirectional access. It also
> lacks a graphemeStrideBack() to complement graphemeStride(), analogous
> to stride() and strideBack(). Is this difficult to implement?

Not sure, but I wouldn't be surprised if it is. Unicode algorithms are
generally non-trivial.
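
To make the code unit / code point / grapheme distinction concrete, here
is a minimal sketch. It assumes a Phobos recent enough to have
std.utf.byCodeUnit alongside std.uni.byGrapheme; the sample string is
only an illustration, not taken from the original question.

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme, graphemeStride;
import std.utf : byCodeUnit;

void main()
{
    // "noël" written as a plain 'e' followed by U+0308 (combining
    // diaeresis), so the accented letter takes two code points.
    string s = "noe\u0308l";

    // UTF-8 code units: n, o, e, two bytes for U+0308, l.
    writeln(s.byCodeUnit.walkLength);   // 6

    // Code points: a string used as a range auto-decodes to dchar.
    writeln(s.walkLength);              // 5

    // Graphemes: what a reader would call "characters".
    writeln(s.byGrapheme.walkLength);   // 4

    // Forward stride, in code units, of the grapheme starting at
    // index 2 ('e' plus the combining mark).
    writeln(graphemeStride(s, 2));      // 3
}
```

Note that graphemeStride only walks forward from a known grapheme
boundary; a backward counterpart would have to find where the cluster
begins, which is the part I'd expect to be fiddly.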


T

-- 
Who told you to swim in Crocodile Lake without life insurance??

