non-utf8-decoding regex (for speed)?

Jonathan M Davis via Digitalmars-d digitalmars-d at puremagic.com
Tue Apr 5 18:59:45 PDT 2016


On Tuesday, April 05, 2016 15:00:36 Timothee Cour via Digitalmars-d wrote:
> Is there a way to avoid decoding (as utf8) when calling regex' apis?
> or a plan to do so?
>
> use case: speed (no decoding) and avoiding throwing on invalid utf8
> sequences
>
> ideally this should allow:
>
> ---
> auto s = cast(ubyte[])  "abcd"; //potentially not valid utf8 sequence
> auto r = cast(ubyte[])  `^\d`;
> auto m=match(s, r.regex); // right now: regex cannot deduce function
> // from argument types !()(ubyte[])
> ---

As a side note, you can use std.string.representation to convert a string to
an array of ubyte with the correct constness.
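
For illustration, here is a minimal sketch of what that looks like. The
helper countDigits is hypothetical (it is not part of Phobos) and only shows
that the resulting bytes can be inspected without any UTF-8 decoding:

```d
import std.string : representation;

// Hypothetical helper, not part of Phobos: counts ASCII digits by looking
// at raw bytes, so no UTF-8 decoding (and no UTFException) is involved.
size_t countDigits(const(ubyte)[] bytes)
{
    size_t n;
    foreach (b; bytes)
        if (b >= '0' && b <= '9')
            ++n;
    return n;
}

void main()
{
    string s = "ab12cd";

    // For a string (immutable(char)[]), representation yields
    // immutable(ubyte)[] over the same memory; no cast and no copy.
    immutable(ubyte)[] bytes = s.representation;

    assert(bytes.length == s.length);
    assert(countDigits(bytes) == 2);
}
```

Unlike a hand-written cast(ubyte[]), this keeps the immutability of the
original string, so the type system still prevents accidental writes.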

There's also std.utf.byCodeUnit.
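
As a rough sketch of the difference it makes (the helper codeUnitCount is
hypothetical and only there for illustration): iterating a string directly
as a range decodes to dchar, whereas byCodeUnit presents the individual
UTF-8 code units without decoding them.

```d
import std.range : walkLength;
import std.range.primitives : ElementType;
import std.utf : byCodeUnit;

// Hypothetical helper, not part of Phobos: number of UTF-8 code units,
// obtained through byCodeUnit rather than by decoding.
size_t codeUnitCount(string s)
{
    return s.byCodeUnit.length;
}

void main()
{
    string s = "h\u00E9llo"; // U+00E9 occupies two UTF-8 code units

    // As a range, a string has dchar elements (auto-decoding) ...
    static assert(is(ElementType!string == dchar));
    // ... while byCodeUnit exposes the char code units as they are.
    static assert(is(ElementType!(typeof(s.byCodeUnit())) == immutable(char)));

    assert(codeUnitCount(s) == 6); // code units, no decoding
    assert(s.walkLength == 5);     // code points, decoded
}
```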

But it wouldn't surprise me at all if std.regex didn't really support either
of those approaches at this point. I'm not very familiar with std.regex
though.

- Jonathan M Davis


