Going on std.regex & std.uni bug-fixing hunt

Dmitry Olshansky via Digitalmars-d digitalmars-d at puremagic.com
Sun Sep 10 04:47:02 PDT 2017


On Sunday, 10 September 2017 at 00:16:10 UTC, Chad Joan wrote:
> On Tuesday, 5 September 2017 at 10:50:46 UTC, Dmitry Olshansky 
> wrote:
>> My burndown list for std.regex:
>>
>> https://issues.dlang.org/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=VERIFIED&component=phobos&list_id=216638&product=D&query_format=advanced&resolution=---&short_desc=regex&short_desc_type=allwordssubstr
>> ...
>
> I was working on std.regex a bit myself, so I created this bug 
> report to capture some of the findings/progress:
> https://issues.dlang.org/show_bug.cgi?id=17820
>
> It seems like something you might be interested in, or might 
> even have a small chance of fixing in the course of other 
> things.

Yeah, a well-known problem. The solution is to keep a bit of memory 
cached, e.g. in a TLS variable.
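
A minimal sketch of that idea, not what std.regex actually does: the 
names acquireScratch and releaseScratch are hypothetical, and it assumes 
the engine's scratch memory comes from malloc. Module-level variables in 
D are thread-local by default, so the cache needs no locking.

```d
import core.exception : onOutOfMemoryError;
import core.stdc.stdlib : free, malloc;

// Thread-local by default in D: every thread caches at most one block.
private ubyte[] cached;

/// Returns at least `size` bytes, reusing the cached block when it fits.
/// Hypothetical helper for illustration only.
ubyte[] acquireScratch(size_t size)
{
    assert(size > 0, "size must be non-zero");
    if (cached.length >= size)
    {
        auto block = cached;
        cached = null;          // the caller owns the block now
        return block;
    }
    auto p = cast(ubyte*) malloc(size);
    if (p is null)
        onOutOfMemoryError();
    return p[0 .. size];
}

/// Hands a block back; keeps the larger of the two blocks, frees the other.
void releaseScratch(ubyte[] block)
{
    if (block.length > cached.length)
    {
        free(cached.ptr);       // free(null) is a no-op
        cached = block;
    }
    else
        free(block.ptr);
}

// Thread-local module destructor: drop the cached block at thread exit.
static ~this()
{
    free(cached.ptr);
    cached = null;
}
```

With something like this, repeated matches on one thread would pay for 
malloc only when the needed size grows.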

>
> ...
>
> There are other regex improvements I might be interested in, 
> but I'm not sure I have time to make bug reports for them right 
> now.  I might be convinced to fast track them if someone wants 
> to make legitimate effort towards fixing them, otherwise I'll 
> eventually get around to writing the reports and/or making PRs 
> someday.
>
> Examples:
>
> -- Calls to malloc in the CTFE path cause some regexes to fail 
> at compile time.  I suspect this happens due to the Captures
> (n > smallString) condition when the number of possible captures
> is greater than 3, but I haven't tested it (time consuming...).
>

Shouldn't be a problem, but please report an example.

> -- I remember being unable to iterate over named captures.  But 
> I'm not confident that I'm remembering this correctly, and I'm 
> not sure if it's still true.
>

Would be a nice and simple enhancement.

> -- The Captures struct does not specify what value is returned 
> for submatches that were in the branch of an alternation that 
> wasn't taken or in a repetition that matched 0 or more than 1 
> times.

As in every engine out there, the value is "", the empty string.
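
A small sketch of the alternation case, assuming the behaviour described 
above. The function name alternationGroups is made up for illustration; 
matchFirst, regex and indexing Captures by group number are the regular 
std.regex API.

```d
import std.regex : matchFirst, regex;

/// Returns the text of groups 1 and 2 for the pattern `(a)|(b)`.
/// Illustrative helper; only call it with input that matches.
string[2] alternationGroups(string input)
{
    auto c = matchFirst(input, regex(`(a)|(b)`));
    assert(!c.empty, "input must match the pattern");
    // Exactly one branch of the alternation takes part in the match;
    // the group in the other branch is expected to be an empty string.
    return [c[1], c[2]];
}
```

For the input "b" one would expect group 1 to compare equal to "" and 
group 2 to be "b". An empty result alone does not tell you whether a 
group was skipped or matched an empty string.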

>
> -- The Captures struct does not seem to have a way to access 
> all of the strings matched by a submatch in repetition context, 
> not to mention nested repetition contexts.
>

Just like any other regex library.

>
> I'm not sure how much those mentions help without proper bug 
> reports, but at least I got it off my chest (for the time 
> being) without having to spend my whole weekend writing bug 
> reports ;)
>

Well, they are warmly welcome should you get to it.

> ...
>
> Dmitry, I appreciate your working towards making the regex 
> module easier to work on.  Thanks.
>
> ...
>
> I'm curious what you're thinking about when you mention 
> something ambitious like writing a new GC :)
> (like this https://imgur.com/cWa4evD)
>
> I can't help but fantasize about cheap ways to get GC 
> allocations to parallelize well and not end up writing an 
> entire generational collector!

A ThreadCache can go a long way to help with that.

> But I doubt I'll ever have the opportunity to work on such 
> things.  I hope your GC attempt works out!

Me too. It won't be a trivial effort though.



