isAsciiString in Phobos?

monarch_dodra monarchdodra at gmail.com
Mon Oct 7 13:14:05 PDT 2013


On Monday, 7 October 2013 at 16:23:12 UTC, Andrej Mitrovic wrote:
> On 10/7/13, monarch_dodra <monarchdodra at gmail.com> wrote:
>> If we want even more efficiency, we could iterate on the string,
>> interpreting it as a size_t[]. We mask each of its elements with
>> 0x80808080/0x80808080_80808080, and if one of the resulting
>> masked elements is not null, then the string isn't ASCII.
>
> Clever! So I think we should definitely try and push it to the library.

I wrote this (only lightly tested):

//--------
bool isASCII(const(char[]) str)
{
     static if (size_t.sizeof == 8)
     {
         enum size = 8;
         enum size_t mask  = 0x80808080_80808080;
         enum size_t alignMask = ~cast(size_t)0b111;
     }
     else
     {
         enum size = 4;
         enum size_t mask = 0x80808080;
         enum size_t alignMask = ~cast(size_t)0b11;
     }

     if (str.length < size)
     {
         foreach (c; str)
             if (c & 0x80)
                 return false;
         return true;
     }

     immutable start = (cast(size_t)str.ptr & alignMask) + size;
     immutable end = cast(size_t)(str.ptr + str.length) & alignMask;

     //We start with the aligned blocks, because that part is faster,
     //and chances are the start is aligned anyway (so we check it later).
     for ( auto p = cast(size_t*)start ; p != cast(size_t*)end ; ++p )
         if (*p & mask)
             return false;

     //Then the trailing chars.
     for ( auto p = cast(char*)end ; p != str.ptr + str.length ; ++p )
         if (*p & 0x80)
             return false;

     //Finally, the first chars.
     for ( auto p = str.ptr ; p != cast(char*)start ; ++p )
         if (*p & 0x80)
             return false;

     return true;
}
//--------
     assert( "hello".isASCII());
     assert( "heellohelloellohelloellohelloellohellollohello".isASCII());
     assert( "hellellohelloellohelloo"[3 .. $].isASCII());
     assert(!"heéppellohelloellohelloellohelloellohelloellohellollo".isASCII());
     assert(!"heppellohelloellohelloellohéelloellohelloellohellollo".isASCII());
     assert(!"heppellohelloellohelloellohelloellohelloellohellolléo".isASCII());
//--------
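
To show the masking step on its own, here is a minimal sketch that tests a single 64-bit word. The helper name hasHighBit is made up for this example (it is not a Phobos function), and the word values are just hand-picked byte patterns.

```d
// Hypothetical helper, for illustration only (not part of Phobos).
// Every ASCII byte has its top bit clear, so ANDing a whole word with
// 0x80 repeated in each byte is non-zero exactly when at least one
// byte in the word is >= 0x80.
bool hasHighBit(ulong word)
{
    enum ulong mask = 0x80808080_80808080;
    return (word & mask) != 0;
}

unittest
{
    // Bytes 68 65 6C 6C 6F 21 21 21 ("hello!!!"): all below 0x80.
    assert(!hasHighBit(0x68656C6C_6F212121));
    // Bytes C3 A9 (the UTF-8 encoding of 'é') have their top bit set.
    assert( hasHighBit(0x6865C3A9_6C6C6F21));
}
```

This is the same check the block loop in isASCII does, just applied to one word at a time instead of one char at a time.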

What do you think? I have some doubts though:
1. Does x64 require qword alignment for size_t, or is dword enough?
2. Isn't there some built-in that'll give me the wanted alignment,
instead of doing it by hand?
3. Are those casts 100% correct?
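
For question 2, the closest thing I know of is the .alignof property, which at least avoids hard-coding 0b11/0b111. A rough sketch of what I mean follows. The alignDown helper is made up for this example, and it assumes the alignment is a power of two. Whether size_t.alignof is really the alignment needed here is exactly what I'm unsure about.

```d
// Hypothetical helper (not from Phobos): round an address down to the
// nearest multiple of T.alignof. Assumes T.alignof is a power of two.
size_t alignDown(T)(size_t addr)
{
    enum size_t alignMask = ~(cast(size_t) T.alignof - 1);
    return addr & alignMask;
}

unittest
{
    enum a = size_t.alignof;
    assert(alignDown!size_t(a * 3)     == a * 3); // already aligned
    assert(alignDown!size_t(a * 3 + 1) == a * 3); // rounded down
    assert(alignDown!ubyte(13) == 13);            // ubyte.alignof is 1
}
```

With that, start would be alignDown!size_t(cast(size_t)str.ptr) + size_t.sizeof, and the static if would only be needed for the 0x80 mask.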

