Is str ~ regex the root of all evil, or the leaf of all good?

bearophile bearophileHUGS at lycos.com
Thu Feb 19 07:38:42 PST 2009


Andrei Alexandrescu:

>I think the "g", "i", and "m" flags are popular enough if you've done any amount of regex
programming.<

I think I don't like the "g".

-----------------------

To test an API, it is often useful to try it out, or to compare it against similar practical, common operations done in another language or library. So here I show two examples in Python. You can try to translate these two operations to the std.re of D2 to see how they turn out :-)


The first example shows the usage of a callable for re.sub() (in D it may be called replace()).

Here replacer() is a user-defined function passed to re.sub() (or to the sub() method of a compiled pattern), which calls it on each match.

Note that in Python functions are objects, so I have dynamically added an attribute named "counter" to the replacer() function. In D (and Python) you can do the same thing by creating a small class with a counter attribute.


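import re

def replacer(mobj):
    # Called once per match; mobj is the match object (unused here).
    replacer.counter += 1
    return "REPL%02d" % replacer.counter
replacer.counter = 0  # function attribute used as a running match count

s1 = ".......TAG............TAG................TAG..........TAG....."

# Expected output: every TAG becomes a numbered label, the dots are unchanged.
result = ".......REPL01............REPL02................REPL03..........REPL04....."

r = re.sub("TAG", replacer, s1)
assert r == result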
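
The class-based alternative mentioned earlier could look roughly like the sketch below. The name CountingReplacer and its prefix parameter are made up for this illustration; the only thing taken from Python's re module is that re.sub() accepts any callable as the replacement.

```python
import re

class CountingReplacer:
    """Hypothetical callable that numbers each match it replaces."""

    def __init__(self, prefix="REPL"):
        self.prefix = prefix
        self.counter = 0  # per-instance state instead of a function attribute

    def __call__(self, mobj):
        # re.sub() calls this once per match, passing the match object.
        self.counter += 1
        return "%s%02d" % (self.prefix, self.counter)

replacer = CountingReplacer()
out = re.sub("TAG", replacer, "a TAG b TAG c")
assert out == "a REPL01 b REPL02 c"
assert replacer.counter == 2
```

Compared with the function attribute, each instance keeps its own counter, so two substitutions can be numbered independently.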

----------

This is a little example of managing groups in Python:

>>> import re
>>> data = ">hello1 how are5 you?<"
>>> patt = re.compile(r".*?(hello\d).*?(are\d).*")
>>> patt.match(data).groups()
('hello1', 'are5')


(Note that here all groups are found eagerly. If you want lazy matching in Python you have to use re.finditer() or the finditer() method of a compiled pattern.)
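
As a small illustration of the lazy variant, the sketch below walks over the same data with finditer(). The pattern used here (a lowercase word followed by one digit) is an assumption chosen for the example, not the pattern from the session above.

```python
import re

data = ">hello1 how are5 you?<"
# Assumed pattern for this sketch: a lowercase word followed by one digit.
patt = re.compile(r"[a-z]+\d")

# finditer() returns an iterator; match objects are produced on demand.
matches = patt.finditer(data)
first = next(matches)
assert first.group() == "hello1"

# The remaining matches are only searched for as the iterator is consumed.
rest = [m.group() for m in matches]
assert rest == ["are5"]
```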

I would like a syntax similar to the following (hypothetical, not what Python's re does), where opIndex() returns the matched group:

>>> patt.match(data)[0]
'hello1'
>>> patt.match(data)[1]
'are5'
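
The indexing shown above can be emulated in Python with a thin wrapper. This is only a sketch: GroupView is a hypothetical helper, not part of the re module, and it counts captured groups from 0 as in the session above. (In recent Python versions a match object can be indexed directly, but there index 0 is the whole match, as with group(0).)

```python
import re

class GroupView:
    """Hypothetical wrapper: index 0 is the first captured group."""

    def __init__(self, mobj):
        self._groups = mobj.groups()  # tuple holding only the captured groups

    def __getitem__(self, index):
        return self._groups[index]

    def __len__(self):
        return len(self._groups)

data = ">hello1 how are5 you?<"
patt = re.compile(r".*?(hello\d).*?(are\d).*")
view = GroupView(patt.match(data))
assert view[0] == "hello1"
assert view[1] == "are5"
```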

Bye,
bearophile


