How to detect the start of a Unicode symbol and count the number of graphemes

Uranuz via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Oct 5 01:27:55 PDT 2014


I have a struct StringStream that I use to walk through and parse 
an input string. The string can be of type string, wstring or 
dstring. I implement a function popChar that reads one code unit 
from the stream. I want the parser to have a *debug* mode (enabled 
by a compile-time switch) in which I can get information about 
lineIndex, codeUnitIndex and graphemeIndex. I don't want to use the 
*front* primitive, because it autodecodes everywhere, but in debug 
mode I do want the index of the current *user-perceived character* 
(so decoding is needed there).
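For reference, the three ways of counting involved here can be compared directly. This is a minimal sketch, assuming a Phobos recent enough to have std.utf.byCodeUnit: byCodeUnit iterates without any decoding, plain walkLength on a narrow string goes through the autodecoding front and so counts code points, and std.uni.byGrapheme counts user-perceived characters. The helper name counts is hypothetical.

```d
import std.range : walkLength;
import std.uni : byGrapheme;
import std.utf : byCodeUnit;

/// Hypothetical helper: [code units, code points, graphemes] in `str`.
size_t[3] counts(S)(S str)
{
    return [
        str.byCodeUnit.walkLength, // code units: no decoding at all
        str.walkLength,            // code points: what autodecoding `front` yields
        str.byGrapheme.walkLength  // graphemes: user-perceived characters
    ];
}

void main()
{
    // "e" + U+0301 COMBINING ACUTE ACCENT + "x": two user-perceived characters.
    assert(counts("e\u0301x")  == [4, 3, 2]); // UTF-8: U+0301 takes 2 bytes
    assert(counts("e\u0301x"w) == [3, 3, 2]); // UTF-16: one unit per code point here
    assert(counts("e\u0301x"d) == [3, 3, 2]); // UTF-32: always one unit per code point
}
```

The same string therefore has three different "lengths", and only the last one matches what a user would call a character.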

The question is: how do I detect that I have moved from one Unicode 
grapheme to the next when iterating over a string, wstring or 
dstring code unit by code unit? Is this simple, or would it mean 
reimplementing a big piece of existing standard library code?
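The part that really is simple is finding where a code point starts: each code unit can be classified on its own, assuming well-formed UTF input, as in the sketch below (startsCodePoint and countCodePoints are hypothetical helper names). This is not yet grapheme detection, though. A grapheme such as "e" followed by U+0301 COMBINING ACUTE ACCENT spans several code points, and its boundaries follow the Unicode grapheme-cluster rules (UAX #29), which need the property tables that std.uni already provides.

```d
import std.traits : Unqual;

/// True if code unit `c` begins a new code point. Assumes well-formed UTF;
/// this says nothing about grapheme boundaries.
bool startsCodePoint(C)(C c)
{
    alias U = Unqual!C;
    static if (is(U == char))
        return (c & 0xC0) != 0x80;        // UTF-8 continuation bytes are 10xxxxxx
    else static if (is(U == wchar))
        return c < 0xDC00 || c > 0xDFFF;  // 0xDC00..0xDFFF are trailing surrogates
    else static if (is(U == dchar))
        return true;                      // UTF-32: every unit is a whole code point
    else
        static assert(0, "not a code unit type: " ~ C.stringof);
}

/// Counts code points by looking at each code unit once, without decoding.
size_t countCodePoints(S)(S str)
{
    size_t n;
    foreach (i; 0 .. str.length)  // index by code unit: no autodecoding
        if (startsCodePoint(str[i]))
            ++n;
    return n;
}

void main()
{
    assert(countCodePoints("e\u0301x") == 3);    // 4 code units, 3 code points, 2 graphemes
    assert(countCodePoints("\U0001F600"w) == 1); // one surrogate pair in UTF-16
    assert(countCodePoints("e\u0301x"d) == 3);
}
```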

When that happens, I just need to increment the internal graphemeIndex.
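Incrementing graphemeIndex comes down to knowing the code-unit offsets at which grapheme clusters begin. One way to get them without hand-written tables, assuming std.uni.graphemeStride (which returns the length, in code units, of the grapheme that starts at a given index), is to step from one boundary to the next. graphemeStarts is a hypothetical helper; the same code should work for string, wstring and dstring because graphemeStride is templated on the code-unit type.

```d
import std.uni : graphemeStride;

/// Hypothetical helper: code-unit offsets at which each grapheme of `str` begins.
size_t[] graphemeStarts(S)(S str)
{
    size_t[] starts;
    // Each step jumps over one whole grapheme cluster.
    for (size_t i = 0; i < str.length; i += graphemeStride(str, i))
        starts ~= i;
    return starts;
}

void main()
{
    assert(graphemeStarts("e\u0301x")  == [0, 3]); // "e" + U+0301 is 3 UTF-8 code units
    assert(graphemeStarts("e\u0301x"w) == [0, 2]);
    assert(graphemeStarts("e\u0301x"d) == [0, 2]);
}
```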

A short version of the implementation I have in mind follows:

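struct StringStream(String)
{
    String str;
    size_t index;          // offset of the next code unit to read
    size_t graphemeIndex;  // index of the current user-perceived character

    auto popChar()
    {
        if( ??? ) // How to detect that str[index] starts a new grapheme?
        {
            graphemeIndex++;
        }
        // Read, then advance: incrementing first would skip str[0].
        return str[index++];
    }

}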
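One way to fill in the ??? is to remember where the next grapheme begins and compute that position with std.uni.graphemeStride, which returns the length in code units of the grapheme starting at a given index. This is a minimal sketch under stated assumptions, not a drop-in implementation: the bool template parameter trackGraphemes is a hypothetical stand-in for the compile-time debug switch, graphemeIndex is taken to mean the grapheme containing the code unit just returned, and graphemeStride is called once per grapheme, when its first code unit is reached.

```d
import std.uni : graphemeStride;

// Hypothetical variant of StringStream: `trackGraphemes` stands in for the
// compile-time debug switch; with it off, no grapheme work is compiled in.
struct StringStream(String, bool trackGraphemes = true)
{
    String str;
    size_t index;  // offset of the next code unit to read

    static if (trackGraphemes)
    {
        // Grapheme containing the code unit most recently returned by popChar.
        size_t graphemeIndex;
        // Code-unit offset at which the next grapheme cluster begins.
        private size_t nextGraphemeStart;
    }

    bool empty() const { return index >= str.length; }

    auto popChar()
    {
        static if (trackGraphemes)
        {
            if (index == nextGraphemeStart)
            {
                // The code unit about to be read starts a new grapheme.
                if (index != 0)
                    graphemeIndex++;
                nextGraphemeStart = index + graphemeStride(str, index);
            }
        }
        return str[index++];  // read first, then advance
    }
}

void main()
{
    auto s = StringStream!string("e\u0301x");
    size_t[] seen;
    while (!s.empty)
    {
        s.popChar();
        seen ~= s.graphemeIndex;
    }
    assert(seen == [0, 0, 0, 1]); // three UTF-8 units for "e" + U+0301, one for "x"
}
```

With StringStream!(string, false) neither the grapheme fields nor the graphemeStride call are compiled in. D's built-in debug { ... } blocks, enabled by the -debug compiler flag, would be an alternative switch if a separate template parameter is unwanted.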

Sorry for the very simple question; I just have a mess in my head 
about Unicode and D strings.

