Going on std.regex & std.uni bug-fixing hunt
Dmitry Olshansky via Digitalmars-d
digitalmars-d at puremagic.com
Sun Sep 10 04:47:02 PDT 2017
On Sunday, 10 September 2017 at 00:16:10 UTC, Chad Joan wrote:
> On Tuesday, 5 September 2017 at 10:50:46 UTC, Dmitry Olshansky
> wrote:
>> My burndown list for std.regex:
>>
>> https://issues.dlang.org/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=VERIFIED&component=phobos&list_id=216638&product=D&query_format=advanced&resolution=---&short_desc=regex&short_desc_type=allwordssubstr
>> ...
>
> I was working on std.regex a bit myself, so I created this bug
> report to capture some of the findings/progress:
> https://issues.dlang.org/show_bug.cgi?id=17820
>
> It seems like something you might be interested in, or might
> even have a small chance of fixing in the course of other
> things.
Yeah, it's a well-known problem. The solution is to keep a bit of memory
cached, e.g. in a TLS variable.
>
> ...
>
> There are other regex improvements I might be interested in,
> but I'm not sure I have time to make bug reports for them right
> now. I might be convinced to fast track them if someone wants
> to make legitimate effort towards fixing them, otherwise I'll
> eventually get around to writing the reports and/or making PRs
> someday.
>
> Examples:
>
> -- Calls to malloc in the CTFE path cause some regexes to fail
> at compile time. I suspect this happens due to the Captures (n
> > smallString) condition when the number of possible captures
> is greater than 3, but I haven't tested it (time consuming...).
>
Shouldn't be a problem, but please report an example.
> -- I remember being unable to iterate over named captures. But
> I'm not confident that I'm remembering this correctly, and I'm
> not sure if it's still true.
>
Would be a nice and simple enhancement.
> -- The Captures struct does not specify what value is returned
> for submatches that were in the branch of an alternation that
> wasn't taken or in a repetition that matched 0 or more than 1
> times.
As in every engine out there, the value is "", the empty string.
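
A minimal sketch of that behaviour in std.regex, for illustration only. The helper name groupText and the sample patterns are made up for this example; only matchFirst, regex and Captures indexing come from Phobos. It also shows that a group inside a repetition keeps just its last iteration.

```d
import std.regex : matchFirst, regex;

/// Hypothetical helper (not part of Phobos): text of capture group `group`
/// for the first match of `pattern` in `input`, or "" if nothing matched.
string groupText(string input, string pattern, size_t group)
{
    auto m = matchFirst(input, regex(pattern));
    return m.empty ? "" : m[group];
}

void main()
{
    // Group 1 is in the alternation branch that was not taken.
    assert(groupText("b", "(a)|(b)", 1) == "");
    assert(groupText("b", "(a)|(b)", 2) == "b");

    // Group 1 is in a repetition that ran zero times.
    assert(groupText("b", "(a)*b", 1) == "");

    // Group 1 is in a repetition that ran three times: only the last
    // iteration is kept, the earlier ones are not accessible.
    assert(groupText("abc", "(\\w)*", 1) == "c");
}
```

So an empty capture by itself does not tell you whether the group was skipped or matched an empty string.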
>
> -- The Captures struct does not seem to have a way to access
> all of the strings matched by a submatch in repetition context,
> not to mention nested repetition contexts.
>
Just like any other regex library.
>
> I'm not sure how much those mentions help without proper bug
> reports, but at least I got it off my chest (for the time
> being) without having to spend my whole weekend writing bug
> reports ;)
>
Well, they are warmly welcome should you get to it.
> ...
>
> Dmitry, I appreciate your working towards making the regex
> module easier to work on. Thanks.
>
> ...
>
> I'm curious what you're thinking about when you mention
> something ambitious like writing a new GC :)
> (like this https://imgur.com/cWa4evD)
>
> I can't help but fantasize about cheap ways to get GC
> allocations to parallelize well and not end up writing an
> entire generational collector!
A ThreadCache can go a long way to help with that.
> But I doubt I'll ever have the opportunity to work on such
> things. I hope your GC attempt works out!
Me too. It won't be a trivial effort though.