compile-time regex redux

Andrei Alexandrescu (See Website For Email) SeeWebsiteForEmail at erdani.org
Wed Feb 7 16:20:02 PST 2007


Walter Bright wrote:
> Andrei Alexandrescu (See Website For Email) wrote:
>> Walter Bright wrote:
>>> But I think we now have good reasons to revisit this, at least for 
>>> compile-time use. For example:
>>>
>>>     ("aa|b" ~~ "ababb") would evaluate to "ab"
>>>
>>> I expect one would generally only see this kind of thing inside 
>>> templates, not user code.
>>
>> The more traditional way is to put the string first and the pattern 
>> second, so:
>>
>> ("ababb" ~~ "aa|b") // match this guy against this pattern
>>
>> And I think it returns "b": juxtaposition (concatenation) has higher 
>> precedence than "|", so your pattern is "either two a's or one b". :o)
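A quick runtime check with Python's re module, which uses the same precedence rule (concatenation binds tighter than "|"), confirms this reading. It only illustrates ordinary regex semantics and is not the proposed compile-time ~~ operator. The second search shows the explicit grouping that Walter's expected result "ab" would have needed.

```python
import re

subject = "ababb"

# Concatenation binds tighter than "|", so "aa|b" means "aa" OR "b",
# not "a" followed by "a" or "b".
m = re.search("aa|b", subject)   # leftmost match wins
print(m.group(), m.start())      # b 1

# Walter's expected "ab" needs explicit grouping: "a" then "a" or "b".
m2 = re.search("a(a|b)", subject)
print(m2.group())                # ab
```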
> 
> My bad. Some more things to think about:
> 
> 1) Returning the text left of the match, the match, the text right of it?

Perl does allow that (IIRC it has $` and $' for the substrings to the left 
and right of the match), but the recommended style is to use capturing 
parens if you need the left and right portions; avoiding $` and $' makes 
all matching code more efficient.

So if you want the left and right substrings, you write:

("ababb" ~~ "(.*)(aa|b)(.*)")

and you get in return three juicy strings: left, match, and right.
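A small runtime sketch in Python of the two styles, assuming Python's re semantics agree with Perl's for these simple patterns. Python has no $` or $' variables, so slicing the subject around the match's start() and end() stands in for them. One detail worth knowing: with a greedy (.*) the first group grabs as much as it can, so the three groups are not the same strings that $`, $& and $' would give; a lazy (.*?) reproduces them.

```python
import re

subject = "ababb"

# Perl-style $` / $& / $': slice the subject around the leftmost match.
m = re.search("aa|b", subject)
left, mid, right = subject[:m.start()], m.group(), subject[m.end():]
print(left, mid, right)            # a b abb

# Capturing-parens style with a greedy (.*): group 1 takes all it can,
# so the middle group ends up being the LAST "b", not the leftmost one.
greedy = re.match("(.*)(aa|b)(.*)", subject).groups()
print(greedy)                      # ('abab', 'b', '')

# A lazy (.*?) stops at the leftmost match, reproducing $` $& $'.
lazy = re.match("(.*?)(aa|b)(.*)", subject).groups()
print(lazy)                        # ('a', 'b', 'abb')
```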

> 2) Returning values of parenthesized expressions?

Probably it's easiest to always return const char[][], although if you 
don't have capturing parens you could return just const char[].
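A sketch of the second option, where the result type depends on whether the pattern has capturing parens, written in Python for illustration. ct_match is a hypothetical name, not an existing API; returning None on no match is also an assumption made here.

```python
import re
from typing import List, Optional, Union

def ct_match(subject: str, pattern: str) -> Optional[Union[str, List[str]]]:
    """Hypothetical helper (not a real library API) mirroring the
    return convention discussed:
      - pattern with capturing parens -> list of captured substrings
      - pattern without them          -> the matched text itself
      - no match                      -> None
    """
    rx = re.compile(pattern)
    m = rx.search(subject)
    if m is None:
        return None
    if rx.groups == 0:               # no capturing parens in the pattern
        return m.group(0)
    # One entry per capturing group; groups that did not take part
    # in the match become "" instead of None.
    return list(m.groups(""))

print(ct_match("ababb", "aa|b"))            # b
print(ct_match("ababb", "(.*)(aa|b)(.*)"))  # ['abab', 'b', '']
print(ct_match("xyz", "aa|b"))              # None
```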

> 3) Some sort of sed-like replacement syntax?

Definitely; otherwise replacement is a pain to express, particularly 
because you can't mutate things during compilation.

("ababb" ~~ s/"(.*)(aa|b)(.*)"/"$1 here was an aa|b $2"/i)

(This doesn't make 's' a keyword; it's just used as punctuation.) 
A more D-like syntax could probably be devised, but that could also be 
seen as a gratuitous incompatibility with sed, Perl, etc.

The last "/" is useful because flags can follow it, as is the case 
here (i = ignore case).
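For comparison, roughly how the proposed substitution behaves when done with Python's re.sub at runtime. The mapping is an assumption for illustration: Perl/sed-style $1 and $2 are spelled \1 and \2 in Python replacement strings, and the trailing i flag corresponds to re.IGNORECASE.

```python
import re

# Rough Python equivalent of  s/"(.*)(aa|b)(.*)"/"$1 here was an aa|b $2"/i
pattern = r"(.*)(aa|b)(.*)"
replacement = r"\1 here was an aa|b \2"   # $1 -> \1, $2 -> \2

# re.sub builds a new string and never mutates its input, which is the
# kind of operation that still works when nothing can be changed in place.
out = re.sub(pattern, replacement, "ababb", flags=re.IGNORECASE)
print(out)        # abab here was an aa|b b

# With the ignore-case flag the same pattern also matches upper case.
out_upper = re.sub(pattern, replacement, "ABABB", flags=re.IGNORECASE)
print(out_upper)  # ABAB here was an aa|b B
```

Because the replacement never mentions $3, anything captured by the third group is dropped; here it happens to be empty.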

> An alternative is to have the compiler recognize std.regexp names as 
> being built-in.

Blech. :o)


Andrei
