ctRegex vs. Regex vs. plain string

Chris wendlec at tcd.ie
Thu Dec 6 08:38:37 PST 2012


On Thursday, 6 December 2012 at 16:00:11 UTC, Dmitry Olshansky 
wrote:
> 12/6/2012 7:21 PM, Chris wrote:
>> I have updated my code (finally!) to 2.060.
>
> Congrats!
>
>> As my project deals a lot with text processing including loads of
>> special characters (á, ú etc.), I make extensive use of the
>> std.regex module (and I really appreciate the use of the Thompson
>> NFA). To optimize my program I have experimented with ctRegex /
>> StaticRegex and Regex. However, there are still compile time
>> problems with Regex and StaticRegex which is why I am using plain
>> strings at the moment, which work fine with the same regular
>> expressions.
>
> At first I was confused by "make extensive use of the 
> std.regex" and "using plain strings". But then I recalled the 
> problematic "bug" in how the compiler treats globals.
>
> So if your code goes like this:
>
> //globals or statics
> auto re1 = regex(...);
> auto re2 = regex(...);
> //...
> auto reK = regex(...);
>
> //and e.g. in main:
> void main(){
>  ... use reX etc. ...
> }
>
> Then the long compilations are caused by the compiler doing 
> constant-folding on the re1-reK variables. This forces it to 
> parse & compile these patterns at compile time.
>
> While it's cute and looks like a minor optimization, it can 
> make compile times monstrous, especially as it just produces 
> the same normal pattern that the run-time (R-T) regex uses. The 
> way out is to keep compiled patterns on the stack or initialize 
> them inside of static this.
>
> As for using strings as patterns: the library compiles them 
> internally and caches the last 8 of them. In other words, going 
> with plain strings should be fine for scripts and programs that 
> use only a few patterns. It doesn't slow things down 
> considerably even in a tight loop.
>
> But once you have about 10+ commonly used patterns, 
> precompiling them is the better option.
>
>> Are there any precautions I have to take when using compile
>> time regular expressions?
>
> One precaution is to use ctRegex only when things are well 
> tested and you are ready to go for that extra speed. It 
> typically takes a lot of time and RAM to get it to compile.
>
> Then again, testing that the results do match is recommended. 
> Because of the pressure it puts on the compiler, ctRegex is 
> not that well tested (it goes through only a couple of tests 
> in the Phobos unittests), unlike the regular one.
>
>> Does anyone have any experience as regards
>> performance enhancement?
>>
>
> You tell me ;) As a matter of fact I collect problematic or 
> frequent patterns; I guess I need to advertise that somewhere.
>
> Seriously, it depends on the patterns and the data. I'd expect 
> ctRegex to be about 20-50% faster. But there are even cases 
> where it may slow things down (the compile-time (C-T) backend 
> is not as sophisticated as the primary R-T one... something to 
> improve with time).

Thanks a lot. That's very useful information. I will follow the 
rules Roberto Ierusalimschy mentions:

"In Lua, as in any other programming language, we should always 
follow the two maxims of program optimization:

Rule #1: Don’t do it.
Rule #2: Don’t do it yet. (for experts only)"

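For the record, here is how I understand the "initialize them 
inside of static this" advice. This is only a rough sketch: the 
names numberRe and firstNumber and the pattern itself are made up 
for illustration, and it assumes a Phobos recent enough to have 
matchFirst (on older releases match plays the same role).

```d
import std.regex;
import std.stdio;

// Declared at module scope but NOT initialized here, so the compiler
// has no initializer to evaluate at compile time.
// (numberRe is a hypothetical name used only for this sketch.)
Regex!char numberRe;

static this()
{
    // Module constructor: runs once per thread at program start,
    // so the pattern is compiled at run time, not during the build.
    numberRe = regex(`[0-9]+`);
}

// Returns the first run of digits in s, or an empty string if none.
string firstNumber(string s)
{
    auto m = matchFirst(s, numberRe);
    return m.empty ? "" : m.hit;
}

void main()
{
    writeln(firstNumber("order 42 shipped"));
}
```

The other option mentioned above, keeping the compiled pattern on 
the stack, would simply mean calling regex(...) inside the function 
that uses it.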

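Regarding the advice to test that ctRegex results match the 
run-time ones, a minimal check might look like the following. 
Again a sketch under assumptions: the pattern and the helper names 
hits and sameResults are hypothetical, and matchAll assumes a 
reasonably recent Phobos.

```d
import std.regex;

// Hypothetical pattern, for illustration only.
enum pattern = `[a-z]+[0-9]*`;

// Collects the text of every match of re in input.
string[] hits(R)(string input, R re)
{
    string[] result;
    foreach (m; matchAll(input, re))
        result ~= m.hit;
    return result;
}

// True if the run-time and compile-time engines agree on input.
bool sameResults(string input)
{
    auto rt = regex(pattern);    // pattern compiled at run time
    auto ct = ctRegex!pattern;   // pattern compiled at compile time
    return hits(input, rt) == hits(input, ct);
}

void main()
{
    assert(sameResults("abc12 def ghi7"));
    assert(sameResults("1234 !!"));
}
```

Running a check like this over real project data before switching 
to ctRegex would follow the "only when things are well tested" 
precaution.
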