Is str ~ regex the root of all evil, or the leaf of all good?

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Feb 19 07:46:47 PST 2009


Derek Parnell wrote:
> On Thu, 19 Feb 2009 07:01:56 -0800, Andrei Alexandrescu wrote:
> 
>> These all put the regex before the string, something many people would 
>> find unsavory.
> 
> I don't. To me the regex is what you are looking for so it's like saying
> "find this pattern in that string". 

Yah, but to most others it's "match this string against that pattern".
Again, regexes have a long history behind them. So probably we need to
have both "find" and "match", with different orders of arguments.

Anyway, std.algorithm defines find() like this:

find(haystack, needle)

In the least structured case, the haystack is a range and the needle is
either an element or another range. But then we can say, hey, we can
come up with more efficient finds by using a more structured haystack
and/or a more structured needle. So then:

string a = "conoco", b = "co";
// linear find
auto r1 = find(a, b[0]);
// quadratic find
auto r2 = find(a, b);
// organize a in a Boyer-Moore structure; sublinear find
auto r3 = find(boyerMoore(a), b);
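
As a side note for anyone trying the three calls above today: the following is a minimal sketch against the std.algorithm.searching module in later Phobos releases, not the boyerMoore(a) design proposed here. It assumes the boyerMooreFinder helper, which wraps the needle rather than the haystack, since Boyer-Moore precomputes its tables from the pattern. The needles "oc" and "noc" are chosen only for illustration.

```d
import std.algorithm.searching : boyerMooreFinder, find;
import std.stdio : writeln;

void main()
{
    string a = "conoco", b = "co";

    // Element needle: linear scan for the first 'c'.
    auto r1 = find(a, b[0]);
    assert(r1 == "conoco");

    // Range needle: plain substring search for "oc".
    auto r2 = find(a, "oc");
    assert(r2 == "oco");

    // Structured needle: the Boyer-Moore tables are built from the pattern "noc".
    auto r3 = find(a, boyerMooreFinder("noc"));
    assert(r3 == "noco");

    writeln(r1, " ", r2, " ", r3);
}
```

In each call the haystack stays in first position and only the needle gains structure.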

I'll actually implement the above, it's pretty nice. Now the question 
is, what's the haystack and what's the needle in a regex find?

auto r3 = find("conoco", regex("c[a-z]"));

or

auto r3 = find(regex("c[a-z]"), "conoco");

?

The argument could go both ways:

"Organize the set of 2-char strings starting with 'c' and ending with 
'a' to 'z' into a structured haystack, then look for substrings of 
"conoco" in that haystack."

versus

"Given the unstructured haystack conoco, look for a structured needle in 
it that is any 2-char string starting with 'c' and ending with 'a' to 'z'."

What is the most natural way?
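
For a concrete data point on the first ordering, here is a minimal sketch using the std.regex module from later Phobos releases. That module postdates this message, so it illustrates the "unstructured haystack, structured needle" reading and is not part of the proposal here. matchFirst takes the input string first and the compiled pattern second.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

void main()
{
    // The structured needle: any 2-char string starting with 'c'
    // and ending with 'a' to 'z'.
    auto re = regex("c[a-z]");

    // Haystack first, needle second.
    auto m = matchFirst("conoco", re);
    assert(!m.empty);
    assert(m.hit == "co");    // leftmost match, at the start of the input
    assert(m.post == "noco"); // the rest of the haystack after the match

    writeln(m.hit, " | ", m.post);
}
```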


Andrei


