Is str ~ regex the root of all evil, or the leaf of all good?
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Thu Feb 19 06:45:58 PST 2009
Michel Fortin wrote:
> On 2009-02-19 00:35:20 -0500, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> said:
>
>> auto s = sub("abracazoo", regex("a([b-e])", "g"), "A$1");
>
> I don't like `sub`, I mean the name. Makes me think of substring more
> than substitute. My choice would be to reuse what we have in std.string
> and augment it to work with regular expressions:
>
> auto s = replace("abracazoo", regex("a([b-e])", "g"), subex("A$1"));
Ok. Probably subex is a bit of a killer, but I see your point (subex is
not an arbitrary string).
> This way it works consistently whether you're using a string or a
> regular expression: just replace any pattern string with regex(...) and
> any replacement string with subex(...) -- "substitution-expression" --
> when you want them to be parsed as such. Omitting subex in the above
> would make it a plain string replacement for instance (this way it's
> easy to use a variable there).
Indeed, that was part of the impetus for making regex a distinct type
that participates in larger functions. The only problem is that regex
does not work with std.algorithm in an obvious way, e.g. find() works
very differently for strings and regexes. At one point I considered
trying to integrate them, but decided not to spend that effort right now.
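
A minimal sketch of the string/regex symmetry discussed above, assuming a present-day Phobos rather than the API proposed in this thread: std.array.replace covers the plain-string case and std.regex.replaceAll the regex case. There is no subex type in this sketch; replaceAll interprets $1 in its format string itself, and no "g" flag is needed because replaceAll already replaces every match.

```d
import std.array : replace;            // plain substring replacement
import std.regex : regex, replaceAll;  // regex replacement (present-day Phobos, assumed)
import std.stdio : writeln;

void main()
{
    // Plain string pattern: every literal "ab" becomes "Ab".
    auto plain = replace("abracazoo", "ab", "Ab");
    assert(plain == "Abracazoo");

    // Regex pattern: 'a' followed by one of b..e.
    // $1 in the format string refers to the captured letter.
    auto viaRegex = replaceAll("abracazoo", regex("a([b-e])"), "A$1");
    assert(viaRegex == "AbrAcazoo");

    writeln(plain, " ", viaRegex);
}
```

Swapping the pattern from a string to regex(...) here also means swapping the function name, which is exactly the inconsistency the proposal above tries to avoid.
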
> These functions should allow easy substitution of any string or regex
> pattern with another algorithm for matching the pattern.
>
> And there's no way to get a range of matches using std.string, but
> there should be, and it should follow the same rule as above: supporting
> strings and regex consistently. (Using the `in` operator as suggested by
> Bill Baxter seems a good fit for this function.)

I defined the following in std.algorithm (signatures simplified):

// Split a range by a 1-element separator
Splitter!(...) splitter(Range, Element)(Range input, Element separator);
// Split a range by a subrange separator
Splitter!(...) splitter(Range)(Range input, Range separator);

I then defined this in std.regex:

// Split a range by a regex separator
Splitter!(...) splitter(Range)(Range input, Regex separator);

Now this is very nice because you get to switch from one to another very
easily.

foreach (e; splitter(input, ',')) { ... }
foreach (e; splitter(input, ", ")) { ... }
foreach (e; splitter(input, regex(", *"))) { ... }

The speed/flexibility tradeoff is self-evident and under the control of
the programmer without much fuss as it's very easy to switch from one
form to another.
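
The three forms can be tried side by side. This is a sketch under the assumption of a present-day Phobos, where std.algorithm.iteration.splitter and std.regex.splitter both exist; the renamed import regexSplitter is only an illustrative alias to make it explicit which module each call comes from, and the sample input is made up.

```d
import std.algorithm.comparison : equal;
import std.algorithm.iteration : splitter;          // element and subrange separators
import std.regex : regex, regexSplitter = splitter; // regex separator (alias is illustrative)

void main()
{
    auto input = "a, b,c";

    // 1-element separator: splits on every ',' and keeps the space.
    assert(equal(splitter(input, ','), ["a", " b", "c"]));

    // Subrange separator: splits only where the exact text ", " occurs.
    assert(equal(splitter(input, ", "), ["a", "b,c"]));

    // Regex separator: a comma followed by any number of spaces.
    assert(equal(regexSplitter(input, regex(", *")), ["a", "b", "c"]));
}
```

Each step down the list accepts more kinds of separators and does more work per match, which is the tradeoff described above.
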
> And if any of you complains about the extra verbosity, here's what I
> suggest:
>
> auto s = replace("abracazoo", re"a([b-e])"g, se"A$1");
>
> Yes, syntactic sugar for declaring regular expressions.
>
>
>> Two other syntactic options are available:
>>
>> "abracazoo".match(regex("a[b-e]", "g"))
>> "abracazoo".match("a[b-e]", "g")
>
> I despise the second one, because if you omit regex(...) it makes me
> think you're checking for string matches, not expression matches.
> There's nothing in the name of the function telling you you're dealing
> with a regular expression, so it could easily get confusing.
This is yet another proof that discussion of syntax, notation, and
naming will never go out of fashion. I was half convinced by the others
that we're in good shape with input.match(regex).
Andrei