How to check i

Ali Çehreli via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Thu Oct 16 14:53:08 PDT 2014


On 10/16/2014 12:43 PM, spir via Digitalmars-d-learn wrote:

> denis

spir is back! :)

On 10/16/2014 11:46 AM, Uranuz wrote:

 > I have some string *str* of unicode characters. The question is how to
 > check if I have valid unicode code point starting at code unit *index*?

It is easy if I understand the question as skipping over UTF-8
continuation bytes until the slice starts at a code point boundary:

import std.stdio;

ubyte upperTwoBits(ubyte b)
{
     return b & 0b1100_0000;
}

bool isUtf8ContinuationByte(char c)
{
     enum utf8ContinuationPrefix = 0b1000_0000;
     return upperTwoBits(c) == utf8ContinuationPrefix;
}

void moveToValid(ref inout(char)[] s)
{
     /* Skip over UTF-8 continuation bytes. */
     while (s.length && isUtf8ContinuationByte(s[0])) {
         s = s[1..$];
     }

     /*
      * The wchar[] overload is too complicated for Ali at this time. :)
      *
      * Please see the following function template in phobos/std/utf.d:
      *
      * private dchar decodeImpl(bool canIndex, S)(...)
      *     if (is(S : const wchar[]) ...
      */
}

unittest
{
     auto s = "çde";
     moveToValid(s);
     assert(s == "çde");

     s = s[1 .. $];
     moveToValid(s);
     assert(s == "de", s);
}

void moveToValid(ref const(dchar)[] s)
{
     /* Every UTF-32 code unit is a whole code point; nothing to skip. */
}

void main()
{}
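
If the question is instead whether the bytes starting at *index* decode to
a valid code point, the following is a minimal sketch, not a definitive
answer. It assumes std.utf.decode, which throws UTFException on an invalid
or truncated sequence. The name isValidCodePointAt is made up for this
example and is not part of Phobos.

```d
import std.utf : decode, UTFException;

/* Hypothetical helper: reports whether a well-formed UTF-8 sequence
 * starts at code unit 'index' of 's'. */
bool isValidCodePointAt(const(char)[] s, size_t index)
{
    /* decode() requires index < s.length, so check that first. */
    if (index >= s.length) {
        return false;
    }

    try {
        /* 'index' is a by-value parameter, so the caller's copy is not
         * advanced when decode() moves it past the sequence. */
        decode(s, index);
        return true;

    } catch (UTFException e) {
        /* Stray continuation byte, truncated sequence, etc. */
        return false;
    }
}

unittest
{
    auto s = "çde";    /* ç is two code units: 0xC3 0xA7 */

    assert(isValidCodePointAt(s, 0));     /* lead byte of ç */
    assert(!isValidCodePointAt(s, 1));    /* continuation byte of ç */
    assert(isValidCodePointAt(s, 2));     /* 'd' */
    assert(!isValidCodePointAt(s, 4));    /* past the end */
    assert(!isValidCodePointAt(s[0 .. 1], 0));    /* truncated ç */
}
```

The unittest can be run on its own with something like
rdmd -unittest -main on the file. Unlike moveToValid above, this only
answers yes or no and leaves the slice untouched.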

Ali


