Is str ~ regex the root of all evil, or the leaf of all good?

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Feb 19 06:45:58 PST 2009


Michel Fortin wrote:
> On 2009-02-19 00:35:20 -0500, Andrei Alexandrescu 
> <SeeWebsiteForEmail at erdani.org> said:
> 
>> auto s = sub("abracazoo", regex("a([b-e])", "g"), "A$1");
> 
> I don't like `sub`, I mean the name. Makes me think of substring more 
> than substitute. My choice would be to reuse what we have in std.string 
> and augment it to work with regular expressions:
> 
>     auto s = replace("abracazoo", regex("a([b-e])", "g"), subex("A$1"));

Ok. Probably subex is a bit of a killer, but I see your point (subex is 
not an arbitrary string).

> This way it works consistently whether you're using a string or a 
> regular expression: just replace any pattern string with regex(...) and 
> any replacement string with subex(...) -- "substitution-expression" -- 
> when you want them to be parsed as such. Omitting subex in the above 
> would make it a plain string replacement, for instance (this way it's 
> easy to use a variable there).

Indeed, that was part of the impetus for making regex a distinct type 
that participates in larger functions. The only problem is that regex 
does not work with std.algorithm in an obvious way, e.g. find() works 
very differently for strings and regexes. At one point I considered 
integrating them, but decided not to spend that effort right now.
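
To make the distinction concrete, here is a minimal sketch. It assumes a 
Phobos in which std.array provides replace for plain strings and std.regex 
provides replaceAll for patterns; those names are an assumption for 
illustration and are not the sub/replace/subex API being discussed in this 
thread. The point is only that the type of the pattern argument (string 
versus regex) decides how the pattern and the replacement are interpreted.

```d
import std.array : replace;            // literal substring replacement
import std.regex : regex, replaceAll;  // pattern-based replacement

void main()
{
    // Plain string pattern: "ab" is matched literally.
    assert(replace("abracazoo", "ab", "AB") == "ABracazoo");

    // Regex pattern: "$1" in the replacement refers to the first capture,
    // so every 'a' followed by a letter in b..e is upper-cased.
    assert(replaceAll("abracazoo", regex("a([b-e])"), "A$1") == "AbrAcazoo");
}
```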

> These functions should allow easy substitution of any string or regex 
> pattern with another algorithm for matching the pattern.
> 
> And there's no way to get a range of matches using std.string, but 
> there should be, and it should follow the same rule as above: supporting 
> strings and regex consistently. (Using the `in` operator as suggested by 
> Bill Baxter seems a good fit for this function.)

I defined the following in std.algorithm (signatures simplified):

// Split a range by a 1-element separator
Splitter!(...) splitter(Range, Element)(Range input, Element separator);
// Split a range by a subrange separator
Splitter!(...) splitter(Range)(Range input, Range separator);

I then defined this in std.regex:

// Split a range by a regex separator
Splitter!(...) splitter(Range)(Range input, Regex separator);

Now this is very nice because you get to switch from one to another very 
easily.

foreach (e; splitter(input, ',')) { ... }
foreach (e; splitter(input, ", ")) { ... }
foreach (e; splitter(input, regex(", *"))) { ... }

The speed/flexibility tradeoff is self-evident and under the control of 
the programmer without much fuss as it's very easy to switch from one 
form to another.
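
As a runnable sketch of the three forms side by side: this assumes 
std.algorithm and std.regex both export a splitter with the simplified 
signatures above, and that the regex overload is picked whenever the 
separator is a regex. The input strings are made up for illustration.

```d
import std.algorithm; // splitter for element and subrange separators, equal
import std.regex;     // regex, and splitter for a regex separator

void main()
{
    // 1-element separator: cheapest, least flexible.
    assert(equal(splitter("a,b,c", ','), ["a", "b", "c"]));

    // Subrange separator: the exact sequence ", " must appear.
    assert(equal(splitter("a, b, c", ", "), ["a", "b", "c"]));

    // Regex separator: a comma followed by any number of spaces.
    assert(equal(splitter("a,b,  c", regex(", *")), ["a", "b", "c"]));
}
```

Only the second argument changes between the three calls, which is what 
makes the tradeoff easy to tune.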

> And if any of you complains about the extra verbosity, here's what I 
> suggest:
> 
>     auto s = replace("abracazoo", re"a([b-e])"g, se"A$1");
> 
> Yes, syntactic sugar for declaring regular expressions.
> 
> 
>> Two other syntactic options are available:
>>
>> "abracazoo".match(regex("a[b-e]", "g"))
>> "abracazoo".match("a[b-e]", "g")
> 
> I despise the second one, because if you omit regex(...) it makes me 
> think you're checking for string matches, not expression matches. 
> There's nothing in the name of the function telling you that you're 
> dealing with a regular expression, so it could easily get confusing.

This is yet another proof that discussion of syntax, notation, and 
naming will never go out of fashion. I was half convinced by the others 
that we're in good shape with input.match(regex).


Andrei


