Is str ~ regex the root of all evil, or the leaf of all good?

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Feb 19 06:53:35 PST 2009


Michel Fortin wrote:
> On 2009-02-19 00:50:06 -0500, Bill Baxter <wbaxter at gmail.com> said:
> 
>> On Thu, Feb 19, 2009 at 2:35 PM, Andrei Alexandrescu
>>> In general I'm wary of unwitting operator overloading, but I think this
>>> case is more justified than others. Thoughts?
>>
>> No.  ~ means matching in Perl.  In D it means concatenation.  This
>> special case is not special enough to warrant breaking D's convention,
>> in my opinion.  It also breaks D's convention that operators have an
>> inherent meaning which shouldn't be subverted to do unrelated things.
> 
> Indeed. That's why I don't like seeing `~` here.
> 
> 
>> What about turning it around and using 'in' though?
>>
>>    foreach(e; regex("a[b-e]", "g") in "abracazoo")
>>       writeln(e);
>>
>> The charter for "in" isn't quite as focused as that for ~, and anyway
>> you could view this as finding instances of the regular expression
>> "in" the string.
> 
> That seems reasonable, although if we support it, it shouldn't be
> limited to regular expressions, for consistency's sake. For instance:
> 
>     foreach(e; "co" in "conoco")
>         writeln(e);
> 
> should work too. If we can't make that work in the simplest case,
> then I'd say it shouldn't work in the more complicated ones either.

Well, I'm a bit unhappy about that one. At least in current D and to
yours truly, "in" means "fast membership lookup", as with associative
arrays. The use above is a linear search. I'm not saying that's bad,
but I prefer the undiluted semantics. For linear search, there's
always find().
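
To make the distinction concrete, here is a minimal sketch that contrasts the two lookups. It assumes the module layout of current Phobos (std.algorithm.searching); hasKey and tailFrom are hypothetical helper names made up for this illustration, not anything from the library.

```d
import std.algorithm.searching : find;
import std.stdio : writeln;

/// Hashed membership test (hypothetical helper): `in` on an associative
/// array yields a pointer to the value, or null when the key is absent.
bool hasKey(const int[string] table, string key)
{
    return (key in table) !is null;
}

/// Linear search (hypothetical helper): find() walks the haystack and
/// returns the part starting at the first occurrence of the needle,
/// or an empty string when there is none.
string tailFrom(string haystack, string needle)
{
    return find(haystack, needle);
}

void main()
{
    int[string] counts = ["co": 2, "no": 1];
    writeln(hasKey(counts, "co"));      // true
    writeln(tailFrom("conoco", "no"));  // noco
}
```

The first call costs a hash lookup regardless of table size; the second scans the string, which is the cost that "in" would hide if it were extended to substrings or regexes.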

> By the way, regular expressions should work everywhere where we can 
> search for a string. For instance (from std.string):
> 
>     auto firstMatchIndex = find("conoco", "co");
> 
> should work with a regex too:
> 
>     auto firstMatchIndex = find("abracazoo", regex("a[b-e]", "g"));

If you mean typeof(firstMatchIndex) to be size_t, that's unlikely to be 
enough. When looking for a regular expression, you need more than just 
an index - you need captures, "pre" and "post" substrings, the works. 
That's why matching a string against a regex must return a richer 
structure that can't be easily integrated with std.algorithm.
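
As an illustration of what such a richer result carries, here is a sketch written against the std.regex that ships with present-day Phobos (matchFirst and its Captures result), which is an assumption on my part and not the API under discussion in this thread. MatchInfo and firstMatch are hypothetical names used only for this example.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

/// Hypothetical bundle of what a regex match yields beyond an index.
struct MatchInfo
{
    bool found;    // whether the pattern matched at all
    size_t index;  // offset of the match, recovered from pre.length
    string pre;    // text before the match
    string hit;    // the matched text itself
    string post;   // text after the match
    string group;  // first capture group, or "" if the pattern has none
}

/// Runs the first match and copies the interesting pieces out of the
/// Captures object returned by std.regex.matchFirst.
MatchInfo firstMatch(string input, string pattern)
{
    auto m = matchFirst(input, regex(pattern));
    if (m.empty)
        return MatchInfo.init;
    string group = m.length > 1 ? m[1] : "";
    return MatchInfo(true, m.pre.length, m.pre, m.hit, m.post, group);
}

void main()
{
    auto r = firstMatch("abracazoo", "a([b-e])");
    writeln(r.index, " '", r.pre, "' '", r.hit, "' '", r.post, "' '", r.group, "'");
}
```

The index that a find()-style call would return is still there (it is just pre.length), but it is only one of several pieces a caller typically wants from a match.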


Andrei


