Is str ~ regex the root of all evil, or the leaf of all good?

bearophile bearophileHUGS at lycos.com
Thu Feb 19 03:47:57 PST 2009


Andrei Alexandrescu:

>but most regex code I've seen mentions the string first and the regex second. So I dropped that idea.<

I like the following syntaxes (the one with .match() too):

import std.re: regex;
import std.stdio: writeln;

foreach (e; regex("a[b-e]", "g") in "abracazoo")
     writeln(e);

foreach (e; regex("a[b-e]", "g").match("abracazoo"))
     writeln(e);

auto re1 = regex("a[b-e]", "g");
foreach (e; re1.match("abracazoo"))
     writeln(e);

auto re2 = regex("a[b-e]", "g");
foreach (e; re2 in "abracazoo")
     writeln(e);
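
For comparison, here is a rough Python analogue of the loops above. It is only a sketch: it uses Python's re module, not the proposed std.re, and it assumes the "g" flag means "iterate over every non-overlapping match", which is what finditer does in Python.

```python
import re

# Python analogue of regex("a[b-e]", "g").match("abracazoo"):
# finditer yields every non-overlapping match, left to right.
re1 = re.compile(r"a[b-e]")
matches = [m.group(0) for m in re1.finditer("abracazoo")]

for e in matches:
    print(e)
```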

----------------

I also like support for verbose regular expressions, which ignore whitespace and allow comments (for example with //...) inside the regex itself. This simple feature turns the messy world of regexes back into programming.

This is an example of an ordinary RE in Python:

finder = re.compile(r"^\s*([\[\]])\s*([-+]?\d+)\s*,\s*([-+]?\d+)\s*([\[\]])\s*$")


This is the same RE in verbose mode, in Python still (# is the Python single-line comment syntax):

finder = re.compile(r"""
    ^ \s*             # start at beginning+ opt spaces
    ( [\[\]] )        # Group 1: opening bracket
        \s*           # optional spaces
        ( [-+]? \d+ ) # Group 2: first number
        \s* , \s*     # opt spaces+ comma+ opt spaces
        ( [-+]? \d+ ) # Group 3: second number
        \s*           # opt spaces
    ( [\[\]] )        # Group 4: closing bracket
    \s* $             # opt spaces+ end at the end
    """, flags=re.VERBOSE)

As you can see, it often helps to indent those lines logically, just like code.
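
A quick way to convince yourself that the two spellings are the same RE is to run both over a few inputs. This is a sketch using only Python's standard re module; the sample strings are made up for illustration.

```python
import re

# Compact form (raw string, so the backslashes reach the regex engine untouched).
compact = re.compile(r"^\s*([\[\]])\s*([-+]?\d+)\s*,\s*([-+]?\d+)\s*([\[\]])\s*$")

# Verbose form: whitespace outside [] and #-comments are ignored.
verbose = re.compile(r"""
    ^ \s*             # start + optional spaces
    ( [\[\]] )        # Group 1: opening bracket
    \s* ( [-+]? \d+ ) # Group 2: first number
    \s* , \s*         # comma with optional spaces
    ( [-+]? \d+ ) \s* # Group 3: second number
    ( [\[\]] )        # Group 4: closing bracket
    \s* $             # optional spaces + end
    """, flags=re.VERBOSE)

# Illustrative inputs: two that should match, two that should not.
samples = ["[1, 2]", " ]-3,+40[ ", "(1, 2)", "[1 2]"]
results = [(bool(compact.match(s)), bool(verbose.match(s))) for s in samples]
groups = verbose.match(" ]-3,+40[ ").groups()

print(results)
print(groups)
```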

----------------

Like the other people here, I don't much like the following; it's a misleading overload of the ~ operator:

"abracazoo" ~ regex("a[b-e]", "g")

----------------

I don't like that "g" argument much. My suggestions:

RE attributes:
"repeat", "r": Repeat over the whole input string
"ignorecase", "i": case insensitive
"multiline", "m": treat as multiple lines separated by newlines
"verbose", "v": ignores space outside [] and allows comments
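
To show how such named attributes might behave, here is a small Python sketch. The helper find() and the ATTRS table are hypothetical names invented for this illustration; they are not part of Python's re module or of any D library. Python has no flag equivalent to "repeat", so the sketch treats it as a choice between finditer and search.

```python
import re

# Hypothetical mapping from the attribute names suggested above to Python's re flags.
ATTRS = {
    "ignorecase": re.IGNORECASE, "i": re.IGNORECASE,
    "multiline": re.MULTILINE, "m": re.MULTILINE,
    "verbose": re.VERBOSE, "v": re.VERBOSE,
}

def find(pattern, text, *attrs):
    """Return all matches if "repeat"/"r" is given, else at most the first one."""
    repeat = any(a in ("repeat", "r") for a in attrs)
    flags = 0
    for a in attrs:
        if a not in ("repeat", "r"):
            flags |= ATTRS[a]  # raises KeyError on an unknown attribute name
    rx = re.compile(pattern, flags)
    if repeat:
        return [m.group(0) for m in rx.finditer(text)]
    m = rx.search(text)
    return [m.group(0)] if m else []

first = find("a[b-e]", "abracazoo")
all_ci = find("a[b-e]", "ABracazoo", "repeat", "ignorecase")
print(first, all_ci)
```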

----------------

If it doesn't already, I'd like sub() to accept either a string or a callable as the replacement.
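
Python's re.sub already works this way, so it can serve as a model of the behaviour I mean. A minimal sketch (the input string is just the one used earlier in this post):

```python
import re

rx = re.compile(r"a[b-e]")

# Replacement given as a plain string.
with_string = rx.sub("X", "abracazoo")

# Replacement given as a callable: it receives the match object
# and returns the replacement text.
with_callable = rx.sub(lambda m: m.group(0).upper(), "abracazoo")

print(with_string)
print(with_callable)
```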

Bye,
bearophile


