Going on std.regex & std.uni bug-fixing hunt

Sun Sep 10 11:54:21 PDT 2017

On Sunday, 10 September 2017 at 11:47:02 UTC, Dmitry Olshansky 
wrote:
> On Sunday, 10 September 2017 at 00:16:10 UTC, Chad Joan wrote:
>> I was working on std.regex a bit myself, so I created this bug 
>> report to capture some of the findings/progress:
>> https://issues.dlang.org/show_bug.cgi?id=17820
>>
>> It seems like something you might be interested in, or might 
>> even have a small chance of fixing in the course of other 
>> things.
>
> Yeah, well-known problem. The solution is to keep a bit of memory 
> cached, e.g. in a TLS variable.
>

Indeed.

Is there another issue I can mark it as a duplicate of?
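
For readers unfamiliar with the suggestion, here is a minimal sketch of what "a bit of memory cached in a TLS variable" could look like in D. It is not code from std.regex; the names `scratch` and `acquireScratch` are hypothetical and exist only for illustration.

```d
// Sketch only: `scratch` and `acquireScratch` are hypothetical names,
// not part of Phobos. Try it with: dmd -main -unittest -run sketch.d

// Module-level variables in D are thread-local unless marked `shared`
// or `__gshared`, so every thread gets its own cached buffer.
private ubyte[] scratch;

/// Returns a slice of exactly `n` bytes, reusing the cached buffer
/// whenever it is already large enough.
ubyte[] acquireScratch(size_t n)
{
    if (scratch.length < n)
        scratch.length = n; // grows (and may reallocate) only when too small
    return scratch[0 .. n];
}

unittest
{
    auto a = acquireScratch(64);
    auto b = acquireScratch(16); // smaller request: no new allocation
    assert(a.ptr is b.ptr);
}
```

The point of the design is that repeated calls on the same thread stop hitting the allocator once the buffer has reached its working size, and no locking is needed because the buffer is never shared between threads.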

>>
>> [...]
>> -- The Captures struct does not specify what value is returned 
>> for submatches that were in the branch of an alternation that 
>> wasn't taken or in a repetition that matched 0 or more than 1 
>> times.
>
> As in every engine out there, the value is "", the empty string.
>

I usually don't refer to other libraries while using a library.  
If an API doesn't define something, then it is, by definition, 
undefined behavior, and thus quite undesirable to rely upon.

This one seems pretty easy to fix though.  I will probably make a 
documentation PR at some point.
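
As a starting point for such a documentation PR, the following sketch pins down the behavior described in the reply above: a group that did not participate in the match reads as the empty string. The helper `groupText` is a hypothetical name, and the expectations encode the quoted claim rather than anything the current documentation guarantees.

```d
// Sketch only: `groupText` is a hypothetical helper, not a Phobos API.
// Try it with: dmd -main -unittest -run sketch.d
import std.regex : matchFirst, regex;

/// Returns the text of capture group `n` for the first match of
/// `pattern` in `input`, or null if there is no match at all.
string groupText(string input, string pattern, size_t n)
{
    auto c = matchFirst(input, regex(pattern));
    return c.empty ? null : c[n];
}

unittest
{
    // Group 1 sits in the alternation branch that is not taken.
    assert(groupText("b", `(a)|(b)`, 1) == "");
    assert(groupText("b", `(a)|(b)`, 2) == "b");

    // Group 1 sits in a repetition that runs zero times.
    assert(groupText("xyz", `(a)*xyz`, 1) == "");
}
```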

>>
>> -- The Captures struct does not seem to have a way to access 
>> all of the strings matched by a submatch in repetition 
>> context, not to mention nested repetition contexts.
>>
>
> Just like any other regex library.
>

Counterexample: 
https://msdn.microsoft.com/en-us/library/system.text.regularexpressions.group.captures(v=vs.110).aspx#code-snippet-3
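
To make the gap concrete, here is a small sketch of what std.regex offers today versus what the .NET `Group.Captures` collection provides. The helpers `lastItem` and `allItems` are hypothetical names. The workaround in `allItems` re-scans the overall match with the inner pattern, which only works when the inner pattern can be matched on its own; it is not a general substitute for per-iteration captures.

```d
// Sketch only: `lastItem` and `allItems` are hypothetical helpers.
// Try it with: dmd -main -unittest -run sketch.d
import std.algorithm : map;
import std.array : array;
import std.regex : matchAll, matchFirst, regex;

/// What Captures reports for a group inside a repetition: one string,
/// holding the text of the last iteration.
string lastItem(string input)
{
    auto c = matchFirst(input, regex(`(?:(\d+);)+`));
    return c.empty ? null : c[1];
}

/// Workaround: run the inner pattern over the text of the overall
/// match to recover every iteration.
string[] allItems(string input)
{
    auto c = matchFirst(input, regex(`(?:(\d+);)+`));
    if (c.empty)
        return null;
    return matchAll(c.hit, regex(`\d+`)).map!(m => m.hit).array;
}

unittest
{
    assert(lastItem("12;34;56;") == "56");
    assert(allItems("12;34;56;") == ["12", "34", "56"]);
}
```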

I actually have a strong interest in this.  And not because I 
need to write regular expressions that extract lists of patterns 
all the time (well, it might've happened).  More importantly: 
this would make it easier to integrate Phobos' regex engine into 
a parser generator framework.  Current plans involve regular 
expression + parsing expression grammars.  I'm pretty sure it is 
possible to mechanically convert a subset of PEGs into Regexes 
and gain some useful optimizations, but this requires granular 
control over regular expression captures to be able to extract 
the text matched by the original PEG symbols.
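
A hand-converted toy example may help illustrate the idea. The grammar below and the helper `parsePair` are made up for illustration; the sketch assumes each PEG symbol is mapped to a named group, using the `(?P<name>...)` syntax that std.regex supports.

```d
// Sketch only: the grammar and `parsePair` are hypothetical.
// Try it with: dmd -main -unittest -run sketch.d
//
// Toy PEG:
//   Pair  <- Key '=' Value
//   Key   <- [a-z]+
//   Value <- [0-9]+
import std.regex : matchFirst, regex;

/// Matches a Pair at the start of `input` and returns the text of the
/// Key and Value symbols, or null if the input does not match.
string[2] parsePair(string input, out bool ok)
{
    // One named group per PEG symbol.
    auto c = matchFirst(input, regex(`^(?P<Key>[a-z]+)=(?P<Value>[0-9]+)`));
    ok = !c.empty;
    if (!ok)
        return [null, null];
    return [c["Key"], c["Value"]];
}

unittest
{
    bool ok;
    auto kv = parsePair("width=42", ok);
    assert(ok);
    assert(kv[0] == "width");
    assert(kv[1] == "42");
}
```

This works as long as each symbol matches at most once. As soon as the grammar says something like `Pair*`, a single string per named group is no longer enough to recover what every occurrence of `Key` and `Value` matched, which is where per-iteration captures would be needed.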

>>
>> I'm not sure how much those mentions help without proper bug 
>> reports, but at least I got it off my chest (for the time 
>> being) without having to spend my whole weekend writing bug 
>> reports ;)
>>
>
> Well, they are warmly welcome should you get to it.
>

Thanks!

>> ...
>>
>> Dmitry, I appreciate your working towards making the regex 
>> module easier to work on.  Thanks.
>>
>> ...
>>
>> I'm curious what you're thinking about when you mention 
>> something ambitious like writing a new GC :)
>> (like this https://imgur.com/cWa4evD)
>>
>> I can't help but fantasize about cheap ways to get GC 
>> allocations to parallelize well and not end up writing an 
>> entire generational collector!
>
> ThreadCache can go a long way to help that.
>

Google didn't help me with this one.  Any chance I could get a 
link?

>> But I doubt I'll ever have the opportunity to work on such 
>> things.  I hope your GC attempt works out!
>
> Me too. It won't be a trivial effort though.

Good luck!