Formal Review of std.regex (FReD)

Dmitry Olshansky dmitry.olsh at gmail.com
Sun Oct 23 08:46:50 PDT 2011


On 23.10.2011 11:28, Rainer Schuetze wrote:
>
>
> On 22.10.2011 21:05, Dmitry Olshansky wrote:
>> On 22.10.2011 20:56, Rainer Schuetze wrote:
>>> I haven't followed the discussion closely, and I cannot really comment
>>> on the core regex functionality, but I did actually use FReD as a
>>> replacement of a buggy std.regex once.
>>>
>>> In that case I wanted to have a lazily created static regex, but I did
>>> not find an official way to test whether a Regex has been initialized:
>>>
>>> static Regex!char re;
>>> if(!isInitializedRE(re))
>>> re = regex(r"^(.*)\(([0-9]+)\):(.*)$");
>>>
>>> So I implemented isInitializedRE() as "re.ir !is null" for std.regex and
>>> "re.captures() > 0" for fred, but that fails for being a "drop-in
>>> replacement".
>>
>> Coincidentally, you still can access re.ir property in this way.
>> Wow, I wonder how far with backwards compatibility I can go :)
>>
>> In both cases this relies on undocumented features.
>> Even now I can suggest a more portable and entirely generic way:
>>
>> if(re == Regex!(char).init)
>> {
>> //create re
>> }
>>
>> Though that risks doing more work than needed.
>>
>>>
>>> I think, both versions use implementation specifics, maybe there should
>>> be a documented way to test for being initialized.
>>>
>>
>> Definitely. How about adding an empty property + opCast to bool, with
>> that you'd get:
>> if(!re)
>> {
>> //create re
>> }
>>
>> and a bit more verbose:
>> if(re.empty)
>> {
>> //create re
>> }
>
> I think this might be confused with normal usage, like "is this regex
> the empty string?" (Is "" a valid regex?). Maybe a more explicit
> "valid()" predicate would be fine.

"" is a valid regex that matches anywhere; with the global flag it will
match before every codepoint, plus once at the end.
I'm not sure 'valid' is a good name: it may mislead users into checking
it all over the place, e.g.:
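
A minimal sketch of the empty-pattern behaviour, written against the later std.regex API (matchFirst and matchAll are assumptions here, they are not part of the module under review). It only asserts that the first match is a zero-width one at the start of the input; the total number of global matches is printed rather than asserted:

```d
import std.range : walkLength;
import std.regex : matchAll, matchFirst, regex;
import std.stdio : writeln;

void main()
{
    auto re = regex("");              // the empty pattern compiles without error

    auto first = matchFirst("abc", re);
    assert(!first.empty);             // a match was found...
    assert(first.hit == "");          // ...and it consumed no input

    // Global matching: count the zero-width matches over the whole input.
    writeln(matchAll("abc", re).walkLength);
}
```
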
auto r = regex("blah");
if(r.valid())
...
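
For comparison, here is a rough sketch of the lazily created static regex from the quoted message, assuming the proposed empty property ends up on Regex (later std.regex releases do provide one). The helper name diagnosticRegex and the sample input are made up for illustration, and matchFirst is again from the later API:

```d
import std.regex : Regex, matchFirst, regex;

// Hypothetical helper: compiles the pattern on first use only.
ref Regex!char diagnosticRegex()
{
    static Regex!char re;             // thread-local, starts out as Regex!char.init
    if (re.empty)                     // true only while no pattern has been compiled
        re = regex(r"^(.*)\(([0-9]+)\):(.*)$");
    return re;
}

void main()
{
    auto m = matchFirst("foo.d(42): undefined identifier", diagnosticRegex());
    assert(!m.empty);
    assert(m[1] == "foo.d");          // file name
    assert(m[2] == "42");             // line number
    assert(!diagnosticRegex().empty); // the static regex is now initialized
}
```

The check lives in exactly one place, so user code never needs to test a regex it has just created.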


>
>>
>>> I also noticed that "auto match(R, RegEx)(R input, RegEx re);" appears
>>> twice in the documentation, same for "bmatch". I guess they should not
>>> appear together with the string versions.
>>>
>>
>> I gather that happens because there is another overload specifically for
>> C-T regexes. Its docs state just that, but without the template
>> constraint the signatures are the same, so it can indeed cause some
>> confusion.
>> Maybe it would be better to just combine the docs, and leave one
>> overload undocumented.
>>
>
> As RegEx is a template argument here, it can stand for both Regex and
> StaticRegex, and that should be mentioned. Whether it has two different
> implementations is an implementation detail that does not need to bother
> the user.

OK, will do.

>
> If you want to keep the second entries, I'd recommend renaming the
> argument to StaticRegEx.


-- 
Dmitry Olshansky