compile-time regex redux

Andrei Alexandrescu (See Website For Email) SeeWebsiteForEmail at erdani.org
Wed Feb 7 15:39:06 PST 2007


Bill Baxter wrote:
> Walter Bright wrote:
>> String mixins, in order to be useful, need an ability to manipulate 
>> strings at compile time. Currently, the core operations on strings 
>> that can be done are:
>>
>> 1) indexed access
>> 2) slicing
>> 3) comparison
>> 4) getting the length
>> 5) concatenation
>>
>> Any other functionality can be built up from these using template 
>> metaprogramming.
>>
>> The problem is that parsing strings using templates generates a large 
>> number of template instantiations, is (relatively) very slow, and 
>> consumes a lot of memory (at compile time, not runtime). For example, 
>> ParseInteger would need 4 template instantiations to parse 5678, and 
>> each template instantiation would also include the rest of the input 
>> as part of the template instantiation's mangled name.
>>
>> At some point, this will prove a barrier to large scale use of this 
>> feature.
>>
>> Andrei suggested using compile time regular expressions to shoulder 
>> much of the burden, reducing parsing of any particular token to one 
>> instantiation.
> 
> That would help I suppose, but at the same time regexps themselves have 
> a tendency to end up being 'write-only' code.  The heavy use of them in 
> perl is I think a large part of what gives it a rep as a write-only 
> language.   Heh heh.  I just found this regexp for matching RFC 822 
> email addresses:
>     http://www.regular-expressions.info/email.html
> (the one at the bottom of the page)

I think this must be qualified and understood in context. First, Perl's 
reputation for write-only code has much to do with its implicit 
variables and generous syntax. Perl regexps themselves are the standard 
that all other regexp packages emulate and compare against.

Showcasing the raw RFC 822 email parsing regexp is not very telling. 
Notice how much of it is repetition. With named symbols for the repeated 
pieces, the grammar is very easy to implement with readable regular 
expressions, and this is how anyone in their right mind would do it.
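
To make the point about symbols concrete, here is a minimal sketch in 
Python (chosen only for illustration; nothing here is D or a proposed 
compile-time API). It builds a regexp for a simplified subset of the 
RFC 822 addr-spec grammar out of named pieces. The constant names and 
the is_addr_spec helper are hypothetical, and comments, folding 
whitespace and non-ASCII input are deliberately not handled.

```python
import re

# Named building blocks, loosely following the RFC 822 addr-spec productions.
# Assumption: simplified subset; comments and folding whitespace are ignored.
ATOM = r'[^\x00-\x20\x7f()<>@,;:\\".\[\]]+'          # no specials, space or controls
QUOTED_PAIR = r'\\[\x00-\x7f]'                        # backslash plus any ASCII char
QUOTED_STRING = rf'"(?:[^"\\\r]|{QUOTED_PAIR})*"'
DOMAIN_LITERAL = rf'\[(?:[^\[\]\\\r]|{QUOTED_PAIR})*\]'

# Each higher-level symbol is spelled once and reused, instead of being
# pasted inline every time it occurs in the final pattern.
WORD = rf'(?:{ATOM}|{QUOTED_STRING})'
SUB_DOMAIN = rf'(?:{ATOM}|{DOMAIN_LITERAL})'
LOCAL_PART = rf'{WORD}(?:\.{WORD})*'
DOMAIN = rf'{SUB_DOMAIN}(?:\.{SUB_DOMAIN})*'

ADDR_SPEC = re.compile(rf'{LOCAL_PART}@{DOMAIN}')


def is_addr_spec(text):
    """Return True if the whole of text matches the simplified addr-spec."""
    return ADDR_SPEC.fullmatch(text) is not None


if __name__ == "__main__":
    for candidate in ["andrei@example.org", "user@[10.0.0.1]", "a..b@example.org"]:
        print(candidate, is_addr_spec(candidate))
```

The expanded pattern in ADDR_SPEC is just as unreadable as the raw one, 
but nobody has to read it: the definitions above it follow the grammar 
production by production.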


Andrei


