isAsciiString in Phobos?
monarch_dodra
monarchdodra at gmail.com
Mon Oct 7 09:07:22 PDT 2013
On Monday, 7 October 2013 at 15:57:15 UTC, Andrej Mitrovic wrote:
> On 10/7/13, Adam D. Ruppe <destructionator at gmail.com> wrote:
>> If you want strict ASCII, it should be <= 127 rather than 255
>> because the high bit can be all kinds of different encodings
>> (the first 255 of unicode codepoints I think match latin-1
>> numerically, but that's different than windows-1252 or various
>> non-English extended asciis.)
>>
>> You could also convert utf-8 to ascii.... sort of... by just
>> stripping out any byte > 127 since bytes higher than that are
>> multibyte sequences in utf8.
>
> Thanks. I got some useful info from Jakob from IRC, and ended
> up with this:
>
> bool isAsciiString(string input)
> {
>     auto data = cast(const(ubyte)[])input;
>     return data.all!(a => a <= 0x7F);
> }
>
> The cast is needed to avoid decoding by the "all" function. Also
> there's isASCII that works on a dchar in std.ascii, but I was
> looking for something that works on entire strings at once. So
> the above function does the work for me.

You can use std.string.representation to do the cast for you, and
you might as well just use std.ascii.isASCII as the predicate:

    return input.representation.all!isASCII;
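
Spelled out as a complete function, that could look roughly like the sketch below. The function name and the `string` parameter called `input` are taken from the quoted code above; the selective imports are my assumption about which modules are needed.

```d
import std.algorithm : all;
import std.ascii : isASCII;
import std.string : representation;

/// Returns true if every code unit of `input` is 7-bit ASCII.
bool isAsciiString(string input)
{
    // representation yields immutable(ubyte)[], so `all` walks the raw
    // bytes instead of decoding UTF-8 into dchars.
    return input.representation.all!isASCII;
}
```

An empty string counts as ASCII here, since `all` is true for an empty range.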
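
If we want even more efficiency, we could iterate over the string
one machine word at a time, interpreting it as a size_t[]. We mask
each element with 0x80808080 (32-bit) or 0x80808080_80808080
(64-bit), and if any masked element is non-zero, then the string
isn't ASCII. The bytes before the first word-aligned address and
after the last full word still need to be checked one at a time.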
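
A minimal, unbenchmarked sketch of that word-at-a-time idea follows. The name isAsciiStringFast is hypothetical (nothing like it is in Phobos as far as I know), and the byte-wise handling of the unaligned head and the leftover tail is my own assumption about how to make the reinterpreting cast safe.

```d
import std.string : representation;

/// Hypothetical helper, not part of Phobos.
bool isAsciiStringFast(string input)
{
    // A mask with the top bit of every byte in a machine word set.
    static if (size_t.sizeof == 8)
        enum size_t highBits = 0x80808080_80808080;
    else
        enum size_t highBits = 0x80808080;

    auto bytes = input.representation; // immutable(ubyte)[], no decoding

    // Check leading bytes one by one until the pointer is word-aligned.
    while (bytes.length && (cast(size_t) bytes.ptr) % size_t.sizeof != 0)
    {
        if (bytes[0] & 0x80)
            return false;
        bytes = bytes[1 .. $];
    }

    // Reinterpret the aligned middle part as whole words and mask each.
    immutable wordCount = bytes.length / size_t.sizeof;
    auto words = (cast(const(size_t)*) bytes.ptr)[0 .. wordCount];
    foreach (w; words)
    {
        if (w & highBits)
            return false;
    }

    // Check the bytes left over after the last full word.
    foreach (b; bytes[wordCount * size_t.sizeof .. $])
    {
        if (b & 0x80)
            return false;
    }
    return true;
}
```

Whether this actually beats the simple all!isASCII version depends on the compiler and on typical string lengths, so it is worth measuring before adopting it.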
More information about the Digitalmars-d-learn
mailing list