How to check i
    Ali Çehreli via Digitalmars-d-learn 
    digitalmars-d-learn at puremagic.com
       
    Thu Oct 16 14:53:08 PDT 2014
    
    
  
On 10/16/2014 12:43 PM, spir via Digitalmars-d-learn wrote:
> denis
spir is back! :)
On 10/16/2014 11:46 AM, Uranuz wrote:
 > I have some string *str* of unicode characters. The question is how to
 > check if I have valid unicode code point starting at code unit *index*?
It is easy if I understand the question as skipping over UTF-8 
continuation bytes until the start of the next code point:
import std.stdio;
ubyte upperTwoBits(ubyte b)
{
     return b & 0b1100_0000;
}
bool isUtf8ContinuationByte(char c)
{
     enum utf8ContinuationPrefix = 0b1000_0000;
     return upperTwoBits(c) == utf8ContinuationPrefix;
}
void moveToValid(ref inout(char)[] s)
{
     /* Skip over UTF-8 continuation bytes. */
     while (s.length && isUtf8ContinuationByte(s[0])) {
         s = s[1..$];
     }
     /*
      * The wchar[] overload is too complicated for Ali at this time. :)
      *
      * Please see the following function template in phobos/std/utf.d:
      *
      * private dchar decodeImpl(bool canIndex, S)(...)
      *     if (is(S : const wchar[]) ...
      */
}
unittest
{
     auto s = "çde";
     moveToValid(s);
     assert(s == "çde");
     s = s[1 .. $];
     moveToValid(s);
     assert(s == "de", s);
}
void moveToValid(ref const(dchar)[] s)
{
     /* Every code unit is valid; nothing to do. */
}
void main()
{}
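If the question is instead read literally (is there a valid code point starting at code unit *index*?), one possible sketch is to let std.utf.decode try to decode at that position and treat a UTFException as "no". The helper name isValidCodePointAt is made up for this illustration and is not part of Phobos; only decode and UTFException come from std.utf.

```d
import std.utf : decode, UTFException;

/* Hypothetical helper, not a Phobos function: reports whether a
 * complete, valid UTF-8 sequence starts at s[index]. */
bool isValidCodePointAt(const(char)[] s, size_t index)
{
    /* decode requires an in-range index, so reject anything else here. */
    if (index >= s.length) {
        return false;
    }

    try {
        size_t i = index;    /* decode advances the index it is given */
        decode(s, i);
        return true;

    } catch (UTFException) {
        /* A continuation byte, a truncated sequence, or other bad data. */
        return false;
    }
}

void main()
{
    auto s = "çde";                     /* 'ç' takes two UTF-8 code units */
    assert( isValidCodePointAt(s, 0));  /* start of 'ç' */
    assert(!isValidCodePointAt(s, 1));  /* continuation byte of 'ç' */
    assert( isValidCodePointAt(s, 2));  /* 'd' */
    assert(!isValidCodePointAt(s, 5));  /* past the end */
}
```

Unlike moveToValid above, this does not modify the slice; it only answers yes or no for one position, at the cost of a try/catch.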
Ali
    
    