Do we need faster regex?

Dmitry Olshansky dmitry.olsh at gmail.com
Mon Dec 18 18:34:51 UTC 2023


On Monday, 18 December 2023 at 17:16:40 UTC, H. S. Teoh wrote:
> On Sun, Dec 17, 2023 at 03:43:22PM +0000, Dmitry Olshansky via 
> Digitalmars-d wrote:
>> So I’ve been working on rewind-regex trying to correct all of 
>> the decisions in the original engine that slowed it down, 
>> dropping some features that I knew I cannot implement 
>> efficiently (backreferences have to go).
>> 
>> So while I’m obsessed with simplicity and speed I thought I’d 
>> ask people if it was an issue and what they really want from 
>> gen2 regex library.
> [...]
>
> What I really want:
>
> - Reduce compile-time cost of `import std.regex;` to zero, or at least
>   close enough it's no longer noticeable.
>
> - Automatic caching of fixed-string regexes, i.e., the equivalent of:
>
> 	struct Re(string ctKnownRe) {
> 		Regex!char re;
> 		shared static this() {
> 			re = regex(ctKnownRe);
> 		}
> 		Regex!char Re() {
> 			return re;
> 		}
> 	}

A runtime cache should work. By the way, std.regex already caches 
regexes (at least those passed as strings to the match* family of 
functions).
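
A minimal sketch of what relying on that looks like, assuming the 
string overload of `matchFirst`; `countHits` is a made-up helper for 
illustration, and how many patterns the cache keeps is an internal 
detail not shown here:

```d
import std.regex : matchFirst;
import std.stdio : writeln;

// countHits is a hypothetical helper, named here only for illustration.
size_t countHits(string[] lines)
{
    size_t hits;
    foreach (line; lines)
    {
        // The pattern is passed as a plain string, so std.regex can reuse
        // an already compiled engine instead of rebuilding it on each call.
        if (!line.matchFirst(`some\+pattern`).empty)
            ++hits;
    }
    return hits;
}

void main()
{
    writeln(countHits(["some+pattern", "nothing here", "xx some+pattern"]));
}
```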

>
> 	void main() {
> 		string s;
> 		if (s.matchFirst(Re!`some\+pattern`)) {
> 			...
> 		}
>
> 		// This should reuse the Regex instance from before:
> 		if (s.matchFirst(Re!`some\+pattern`)) {
> 			...
> 		}
> 	}

I'm wondering whether it's worth interning patterns like that.
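
For reference, one way the interning idea could be written so that it 
compiles today. This is only a sketch under my own assumptions, not 
what the new library does: `Re` here is a function template with a 
lazily initialized thread-local static, and it is called with explicit 
parentheses.

```d
import std.regex : Regex, regex, matchFirst;
import std.stdio : writeln;

// Hypothetical helper: one Regex per distinct compile-time pattern string.
Regex!char Re(string pattern)()
{
    // Function-local statics are thread-local in D, so each thread
    // compiles the pattern once, on first use.
    static Regex!char re;
    static bool ready;
    if (!ready)
    {
        re = regex(pattern);
        ready = true;
    }
    return re;
}

void main()
{
    string s = "some+pattern";

    // The first call compiles the pattern.
    if (!s.matchFirst(Re!`some\+pattern`()).empty)
        writeln("first match");

    // The second call reuses the instance built above.
    if (!s.matchFirst(Re!`some\+pattern`()).empty)
        writeln("second match");
}
```

The pattern is still parsed at run time, so nothing here adds to 
compile time beyond one small template instance per pattern.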

> - Reasonably fast runtime performance. I don't really care if it's the
>   top-of-the-line superfast regex matcher, even though that would be
>   really nice.  The primary pain points are the cost of import, and the
>   need to manually write code for automatic caching of fixed runtime
>   regexen.


> - Get rid of ctRegex -- it adds a huge compile-time cost with
>   questionable runtime benefit. Unless there's a way to do this at
>   compile-time that *doesn't* add like 5 seconds per regex to compile
>   times.

Yup, it's dropped, to be replaced eventually by a JIT, which is both 
better for compile times and much more flexible at run time.

---
Dmitry Olshansky
CEO @ Glowlabs
https://olshansky.me





