Is str ~ regex the root of all evil, or the leaf of all good?

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Thu Feb 19 08:00:41 PST 2009


bearophile wrote:
> Andrei Alexandrescu:
> 
>> I think the "g", "i", and "m" flags are popular enough if you've done any amount of regex programming.
> 
> I think I don't like the "g".
> 
> -----------------------
> 
> To test an API it's often good to try to use it, or to compare it against similar practical and common operations done with another language or library. So here I show two examples in Python. You can try to translate these two operations with the std.re of D2 to see how they turn out :-)
> 
> 
> The first example shows the usage of a callable for re.sub() (in D it may be called replace()).
> 
> Here replacer() is a user-defined function given to re.sub() (or to the sub() method of a compiled pattern), which calls it on each match.
> 
> Note that in Python functions are objects, so I have dynamically added to the replacer() function an attribute named "counter". In D (and Python) you can do the same thing by creating a small class with a counter attribute.
> 
> 
> import re
> 
> def replacer(mobj):
>     replacer.counter += 1
>     return "REPL%02d" % replacer.counter
> replacer.counter = 0
> 
> s1 = ".......TAG............TAG................TAG..........TAG....."
> 
> result = ".......REPL01............REPL02................REPL03..........REPL04....."
> 
> r = re.sub("TAG", replacer, s1)
> assert r == result
> 
> ----------

Excellent idea. Let's see:

uint counter;
// Pre-increment so the first replacement is REPL01, as in the Python version.
string replacer(string) { return format("REPL%02d", ++counter); }
auto s1 = ".......TAG............TAG................TAG..........TAG.....";
auto result = ".......REPL01............REPL02................REPL03..........REPL04.....";
auto r = replace!(replacer)(s1, "TAG");
assert(r == result);
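
For comparison, the class-based variant mentioned in the quoted text might look like this in Python. This is only a minimal sketch using the standard re module; the class name CountingReplacer is made up for illustration and is not part of the original example.

```python
import re

class CountingReplacer:
    """Callable that numbers each match (hypothetical name, illustration only)."""

    def __init__(self):
        self.counter = 0

    def __call__(self, mobj):
        # Increment first so the first replacement is REPL01.
        self.counter += 1
        return "REPL%02d" % self.counter

s1 = ".......TAG............TAG................TAG..........TAG....."
replacer = CountingReplacer()
# re.sub() calls the replacer once per match, left to right.
r = re.sub("TAG", replacer, s1)
```

Keeping the counter in an object rather than in a function attribute or a global means each replacer instance starts counting from zero.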

> This is a little example of managing groups in Python:
> 
>>>> import re
>>>> data = ">hello1 how are5 you?<"
>>>> patt = re.compile(r".*?(hello\d).*?(are\d).*")
>>>> patt.match(data).groups()
> ('hello1', 'are5')

auto data = ">hello1 how are5 you?<";
auto iter = match(data, regex(r".*?(hello\d).*?(are\d).*"));
foreach (i; 0 .. iter.engine.captures)
     writeln(iter.capture[i]);
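
As a reference point for the Python side, here is a rough sketch that contrasts the eager groups() call with lazy iteration through finditer(). The second pattern, word_patt, is an assumption added purely for illustration; it does not appear in the original example.

```python
import re

data = ">hello1 how are5 you?<"
patt = re.compile(r".*?(hello\d).*?(are\d).*")

# Eager: match() returns one match object, and groups() returns
# every captured group at once as a tuple.
groups = patt.match(data).groups()

# Lazy: finditer() yields one match object per non-overlapping match,
# so matches are produced only as the loop asks for them.
word_patt = re.compile(r"[a-z]+\d")  # hypothetical pattern: letters followed by a digit
lazy = [m.group(0) for m in word_patt.finditer(data)]
```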

> (Note that here all groups are found eagerly. If you want lazy matching in Python you have to use re.finditer() or the finditer() method of a compiled pattern.)
> 
> I may like a syntax similar to this, where opIndex() allows finding the matched group:
> 
>>>> patt.match(data)[0]
> 'hello1'
>>>> patt.match(data)[1]
> 'are5'

No go: opIndex() here would invite confusion with random-access ranges.


Andrei


