Is str ~ regex the root of all evil, or the leaf of all good?
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Thu Feb 19 06:53:35 PST 2009
Michel Fortin wrote:
> On 2009-02-19 00:50:06 -0500, Bill Baxter <wbaxter at gmail.com> said:
>
>> On Thu, Feb 19, 2009 at 2:35 PM, Andrei Alexandrescu
>>> In general I'm wary of unwitting operator overloading, but I think this
>>> case is more justified than others. Thoughts?
>>
>> No. ~ means matching in Perl. In D it means concatenation. This
>> special case is not special enough to warrant breaking D's convention,
>> in my opinion. It also breaks D's convention that operators have an
>> inherent meaning which shouldn't be subverted to do unrelated things.
>
> Indeed. That's why I don't like seeing `~` here.
>
>
>> What about turning it around and using 'in' though?
>>
>>     foreach(e; regex("a[b-e]", "g") in "abracazoo")
>>         writeln(e);
>>
>> The charter for "in" isn't quite as focused as that for ~, and anyway
>> you could view this as finding instances of the regular expression
>> "in" the string.
>
> That seems reasonable, although if we support it, it shouldn't be limited
> to regular expressions, for coherency reasons. For instance:
>
>     foreach(e; "co" in "conoco")
>         writeln(e);
>
> should work too. If we can't make that work in the simplest case,
> then I'd say it shouldn't work with the more complicated ones either.
Well, I'm a bit unhappy about that one. At least in current D, and to
yours truly, "in" means "fast membership lookup". The use above is a
linear lookup. I'm not saying that's bad, but I prefer the undiluted
semantics. For linear search, there's always find().
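
To make the distinction concrete, here is a minimal sketch, assuming a
present-day Phobos (which postdates this message). It contrasts "in" on an
associative array, which is a hashed lookup, with find() from std.algorithm,
which scans the input linearly. The names counts and rest are illustrative
only.

```d
import std.algorithm.searching : find;
import std.stdio : writeln;

void main()
{
    // Hashed membership test: `in` on an associative array yields a
    // pointer to the stored value, or null when the key is absent.
    int[string] counts = ["co": 2, "no": 1];
    if (auto p = "co" in counts)
        writeln("key present, value = ", *p);

    // Linear search: find() walks the haystack and returns the part of
    // it that starts at the first occurrence of the needle (or an empty
    // range when there is no occurrence).
    auto rest = find("conoco", "no");
    writeln(rest);
}
```

The two calls look alike at the call site, but the first runs in roughly
constant time while the second is proportional to the length of the string,
which is the dilution of "in" objected to above.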
> By the way, regular expressions should work everywhere where we can
> search for a string. For instance (from std.string):
>
> auto firstMatchIndex = find("conoco", "co");
>
> should work with a regex too:
>
> auto firstMatchIndex = find("abracazoo", regex("a[b-e]", "g"));
If you mean typeof(firstMatchIndex) to be size_t, that's unlikely to be
enough. When looking for a regular expression, you need more than just
an index - you need captures, "pre" and "post" substrings, the works.
That's why matching a string against a regex must return a richer
structure that can't be easily integrated with std.algorithm.
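
As a rough illustration of what "more than just an index" means, the sketch
below uses matchFirst and the Captures members pre, hit and post from
today's std.regex. That is an assumption about a later library version; the
API under discussion at the time of this message may have looked different.

```d
import std.regex : matchFirst, regex;
import std.stdio : writeln;

void main()
{
    // matchFirst returns a Captures object rather than a bare index.
    auto m = matchFirst("abracazoo", regex("a[b-e]"));
    if (!m.empty)
    {
        writeln("pre:  ", m.pre);   // input before the match
        writeln("hit:  ", m.hit);   // the matched slice itself
        writeln("post: ", m.post);  // input after the match
    }
}
```

A plain size_t could recover hit only if the match length were also known,
and it carries no sub-captures at all, hence the richer return type.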
Andrei