How to detect start of Unicode symbol and count amount of graphemes
    monarch_dodra via Digitalmars-d-learn 
    digitalmars-d-learn at puremagic.com
       
    Sun Oct  5 04:18:05 PDT 2014
    
    
  
On Sunday, 5 October 2014 at 08:27:58 UTC, Uranuz wrote:
> I have a struct StringStream that I use to walk through and parse 
> an input string. The input can be of type string, wstring or 
> dstring. I implemented a function popChar that reads one code unit 
> from the stream. I want to have a *debug* mode for the parser (via 
> a compile-time switch) where I can get the lineIndex, 
> codeUnitIndex and graphemeIndex. I don't want to use the *front* 
> primitive because it autodecodes everywhere, but I do want the 
> index of the *user-perceived character* in debug mode (so 
> decoding is needed there).
>
> The question is how to detect that I have moved from one Unicode 
> grapheme to the next while iterating over a string, wstring or 
> dstring by code unit. Is it simple, or is it an attempt to 
> reimplement a big piece of existing standard library code?

You can use std.uni.byGrapheme to iterate by graphemes:
http://dlang.org/phobos/std_uni.html#.byGrapheme
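
As a minimal sketch of counting graphemes with byGrapheme (the sample string is only an illustration, not something from the original question), the three different "lengths" of a string can be compared like this:

```d
import std.range : walkLength;
import std.stdio : writeln;
import std.uni : byGrapheme;

void main()
{
    // "noël" spelled with 'e' followed by U+0308 COMBINING DIAERESIS.
    string s = "noe\u0308l";

    writeln(s.length);                 // 6: UTF-8 code units
    writeln(s.walkLength);             // 5: code points (autodecoded)
    writeln(s.byGrapheme.walkLength);  // 4: graphemes (user-perceived characters)
}
```

The same calls should work for wstring and dstring; only the code unit count changes.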
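
AFAIK, graphemes are not "self synchronizing", but code points 
are: looking at a single code unit is enough to tell whether it 
starts a code point, whereas a grapheme boundary depends on the 
surrounding code points. So you can pop code units until you reach 
the beginning of a new code point, and iterate by graphemes from 
there. If you happened to start in the middle of a grapheme (say, 
on a combining mark), the first grapheme you get might be off.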
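
A rough sketch of that resynchronization, under the assumption that the input is valid UTF-8, UTF-16 or UTF-32. The helper names isCodePointStart and resync are made up for this example and are not part of Phobos:

```d
import std.range : walkLength;
import std.traits : Unqual, isSomeString;
import std.uni : byGrapheme;

// Hypothetical helper: true if this code unit begins a code point.
bool isCodePointStart(C)(C unit)
{
    static if (is(Unqual!C == char))
        return (unit & 0xC0) != 0x80;      // UTF-8: 10xxxxxx is a continuation byte
    else static if (is(Unqual!C == wchar))
        return (unit & 0xFC00) != 0xDC00;  // UTF-16: DC00..DFFF is a trail surrogate
    else
        return true;                       // UTF-32: every unit is a code point
}

// Hypothetical helper: drop code units until the slice starts on a
// code point boundary.
S resync(S)(S s) if (isSomeString!S)
{
    while (s.length && !isCodePointStart(s[0]))
        s = s[1 .. $];
    return s;
}

void main()
{
    string s = "noe\u0308l";   // bytes: n o e 0xCC 0x88 l
    auto tail = s[4 .. $];     // starts on 0x88, a continuation byte

    assert(!isCodePointStart(tail[0]));
    assert(resync(tail) == "l");
    assert(resync(tail).byGrapheme.walkLength == 1);
}
```

Note that s[3 .. $] would already start on a code point boundary (the combining diaeresis), so resync leaves it alone, yet it still begins in the middle of a grapheme. byGrapheme cannot know about the preceding 'e', which is the case where the first grapheme is off.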
    
    