Going on std.regex & std.uni bug-fixing hunt

Chad Joan via Digitalmars-d digitalmars-d at puremagic.com
Sat Sep 9 17:16:10 PDT 2017

On Tuesday, 5 September 2017 at 10:50:46 UTC, Dmitry Olshansky wrote:
> It's been tough time on D front for me, going down from about 
> ~1 week of activity during July to ~2-3 days in August.
> One thing I realised is that doing a new GC is going to be a 
> long battle.
> Before I'm too deep down this rabbit hole I decided to first 
> address the long-standing backlog of issues of std.regex and 
> std.uni.
> My burndown list for std.regex:
> https://issues.dlang.org/buglist.cgi?bug_status=UNCONFIRMED&bug_status=NEW&bug_status=ASSIGNED&bug_status=REOPENED&bug_status=VERIFIED&component=phobos&list_id=216638&product=D&query_format=advanced&resolution=---&short_desc=regex&short_desc_type=allwordssubstr
> ...

I was working on std.regex a bit myself, so I created this bug 
report to capture some of the findings/progress:

It seems like something you might be interested in, or might even 
get a chance to fix in the course of other work.


There are other regex improvements I might be interested in, but 
I'm not sure I have time to make bug reports for them right now. 
I might be convinced to fast-track them if someone wants to make 
a legitimate effort towards fixing them; otherwise I'll eventually 
get around to writing the reports and/or making PRs someday.


-- Calls to malloc in the CTFE path cause some regexes to fail at 
compile time.  I suspect this happens due to the Captures (n > 
smallString) condition when the number of possible captures is 
greater than 3, but I haven't tested it (time consuming...).

-- I remember being unable to iterate over named captures.  But 
I'm not confident that I'm remembering this correctly, and I'm 
not sure if it's still true.

-- The Captures struct does not specify what value is returned 
for submatches that were in the branch of an alternation that 
wasn't taken or in a repetition that matched 0 or more than 1 
times.

-- The Captures struct does not seem to have a way to access all 
of the strings matched by a submatch in a repetition context, not 
to mention nested repetition contexts.
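
To make the last two items concrete, here is a minimal sketch (an 
illustration only, not taken from the bug report; the patterns, 
inputs, and helper names skippedBranch/repeatedGroup are made up). 
It relies only on regex, matchFirst, and Captures indexing and .hit 
from std.regex, and it prints the questionable values instead of 
asserting them, since what they should be is exactly what the docs 
leave open.

```d
import std.regex;
import std.stdio;

// Hypothetical helper: for input "b", group 1 sits in the branch
// of the alternation that is not taken.
auto skippedBranch(string input)
{
    return matchFirst(input, regex(`(a)|(b)`));
}

// Hypothetical helper: group 1 is inside a repetition, so it can
// participate in the match several times.
auto repeatedGroup(string input)
{
    return matchFirst(input, regex(`(?:([a-z])[0-9])+`));
}

void main()
{
    auto alt = skippedBranch("b");
    assert(alt[2] == "b"); // the branch that was taken
    // The value of alt[1] is the unspecified part, so just show it.
    writefln("skipped group: '%s' (length %s)", alt[1], alt[1].length);

    auto rep = repeatedGroup("a1b2c3");
    assert(rep.hit == "a1b2c3"); // whole match spans all three iterations
    // Group 1 participated three times, but Captures exposes one value.
    writefln("repeated group: '%s'", rep[1]);
}
```

Whatever the first writefln prints is observed behavior rather than 
a documented guarantee, and in the second case there is a single 
slot for group 1 even though that group participated three times.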

I'm not sure how much those mentions help without proper bug 
reports, but at least I got it off my chest (for the time being) 
without having to spend my whole weekend writing bug reports ;)


Dmitry, I appreciate your working towards making the regex module 
easier to work on.  Thanks.


I'm curious what you're thinking about when you mention something 
ambitious like writing a new GC :)
(like this https://imgur.com/cWa4evD)

I can't help but fantasize about cheap ways to get GC allocations 
to parallelize well without ending up writing an entire generational 
collector!  But I doubt I'll ever have the opportunity to work on 
such things.  I hope your GC attempt works out!
