non-utf8-decoding regex (for speed)?

Dmitry Olshansky via Digitalmars-d digitalmars-d at puremagic.com
Wed Apr 6 12:32:56 PDT 2016


On 06-Apr-2016 01:00, Timothee Cour via Digitalmars-d wrote:
> Is there a way to avoid decoding (as utf8) when calling regex' apis?
> or a plan to do so?

Custom alphabets - yes, including ASCII.
>
> use case: speed (no decoding) and avoiding throwing on invalid utf8 sequences

The speed gain of an ASCII-only engine over Unicode with an ASCII special 
case would be around 0.5% (the time spent on decoding), as my extensive 
profiling shows. The latest pull request for std.regex did exactly that: 
it added a special path for ASCII.

>
> ideally this should allow:
>
> ---
> auto s = cast(ubyte[])  "abcd"; //potentially not valid utf8 sequence
> auto r = cast(ubyte[])  `^\d`;
> auto m = match(s, r.regex); // right now: regex cannot deduce function
>                             // from argument types !()(ubyte[])
> ---
>
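
Until something like that is supported, one possible workaround is to widen each byte to a dchar (a Latin-1 style reading) and match on the resulting dstring, so no UTF-8 decoding takes place and nothing can throw on an invalid sequence. This is only a minimal sketch, not how std.regex works internally and not the custom-alphabet support mentioned above. The helper name firstHitLatin1 is hypothetical, and the approach assumes that one byte per code point is acceptable for the pattern at hand.

```d
import std.algorithm : map;
import std.array : array;
import std.conv : to;
import std.regex : matchFirst, regex;

/// Hypothetical helper (not part of std.regex): returns the first match of
/// `pattern` in `raw`, or null if there is none. Every byte is treated as
/// one code point in the range 0 .. 255.
dstring firstHitLatin1(const(ubyte)[] raw, string pattern)
{
    // Widen byte by byte. Values 0 .. 255 are all valid code points,
    // so there is no UTF-8 decoding step that could fail.
    dstring text = raw.map!(b => cast(dchar) b).array.idup;

    // The pattern itself is ordinary, valid UTF-8 written by the programmer;
    // convert it so the regex character type matches the input (dchar).
    auto m = matchFirst(text, regex(pattern.to!dstring));
    return m.empty ? null : m.hit;
}

unittest
{
    // 0xFF and 0xFE can never appear in valid UTF-8; 0x34 0x32 is "42".
    ubyte[] raw = [0xFF, 0x34, 0x32, 0xFE];
    assert(firstHitLatin1(raw, `\d+`) == "42"d);
}
```

The price is an allocation of four bytes per input byte, so this buys robustness against invalid input rather than speed.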


-- 
Dmitry Olshansky

