isAsciiString in Phobos?

monarch_dodra monarchdodra at gmail.com
Mon Oct 7 09:07:22 PDT 2013


On Monday, 7 October 2013 at 15:57:15 UTC, Andrej Mitrovic wrote:
> On 10/7/13, Adam D. Ruppe <destructionator at gmail.com> wrote:
>> If you want strict ASCII, it should be <= 127 rather than 255
>> because the high bit can be all kinds of different encodings (the
>> first 255 of unicode codepoints I think match latin-1
>> numerically, but that's different than windows-1252 or various
>> non-English extended asciis.)
>>
>> You could also convert utf-8 to ascii.... sort of... by just
>> stripping out any byte > 127 since bytes higher than that are
>> multibyte sequences in utf8.
>
> Thanks. I got some useful info from Jakob from IRC, and ended 
> up with this:
>
> bool isAsciiString(string input)
> {
>     auto data = cast(const(ubyte)[])input;
>     return data.all!(a => a <= 0x7F);
> }
>
> The cast is needed to avoid decoding by the "all" function. Also
> there's isASCII that works on a dchar in std.ascii, but I was looking
> for something that works on entire strings at once. So the above
> function does the work for me.

You can use std.string.representation to do the cast for you, and 
you might as well just use isASCII anyways.

return input.representation.all!isASCII;
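
Spelled out as a complete function, that might look like the sketch 
below. It reuses the isAsciiString name and signature from the quoted 
code; the imports and the unittest block are additions for 
illustration. Note that representation is called on the string itself, 
not on an already-cast ubyte array.

```d
import std.algorithm : all;
import std.ascii : isASCII;
import std.string : representation;

/// Returns true if every code unit of `input` is plain ASCII (<= 0x7F).
bool isAsciiString(string input)
{
    // representation() views the string as immutable(ubyte)[], so all()
    // walks raw bytes and does not auto-decode UTF-8 into dchars.
    return input.representation.all!isASCII;
}

unittest
{
    assert(isAsciiString("hello, world"));
    assert(isAsciiString(""));        // an empty string is vacuously ASCII
    assert(!isAsciiString("héllo"));  // 'é' is encoded as bytes >= 0x80
}
```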

If we want even more efficiency, we could iterate over the string, 
interpreting it as a size_t[]. We mask each of its elements with 
0x80808080 (32-bit) or 0x80808080_80808080 (64-bit), and if any of 
the masked results is nonzero, then the string isn't ASCII.
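
A rough sketch of that word-at-a-time idea follows. The name 
isAsciiStringFast is hypothetical, and the handling of unaligned 
leading bytes and of the leftover tail bytes is my own assumption about 
how to make the size_t[] reinterpretation safe; the description above 
doesn't cover those details. I have not benchmarked it.

```d
/// Sketch: look for non-ASCII bytes one machine word at a time.
/// (Hypothetical helper, not part of Phobos.)
bool isAsciiStringFast(string input)
{
    import std.string : representation;

    // Word-sized mask with the high bit of every byte set.
    static if (size_t.sizeof == 8)
        enum size_t highBits = 0x80808080_80808080;
    else
        enum size_t highBits = 0x80808080;

    auto bytes = input.representation; // immutable(ubyte)[]

    // Assumption: check leading bytes one by one until the data
    // pointer is word-aligned, so the word reads below are aligned.
    while (bytes.length && (cast(size_t) bytes.ptr) % size_t.sizeof != 0)
    {
        if (bytes[0] & 0x80) return false;
        bytes = bytes[1 .. $];
    }

    // Reinterpret the aligned middle part as words and mask each one.
    immutable wordCount = bytes.length / size_t.sizeof;
    auto words = cast(const(size_t)[]) bytes[0 .. wordCount * size_t.sizeof];
    foreach (w; words)
        if (w & highBits) return false;

    // Whatever does not fill a whole word is checked byte by byte.
    foreach (b; bytes[wordCount * size_t.sizeof .. $])
        if (b & 0x80) return false;

    return true;
}
```

For short strings the byte-wise version is probably just as good; the 
word-wise loop should only start to pay off on longer inputs.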


More information about the Digitalmars-d-learn mailing list