earthquake changes of std.regexp to come

Daniel de Kok me at danieldk.org
Tue Feb 17 13:23:35 PST 2009


On Tue, Feb 17, 2009 at 9:50 PM, Jarrett Billingsley
<jarrett.billingsley at gmail.com> wrote:
> On Tue, Feb 17, 2009 at 3:30 PM, Daniel de Kok <me at danieldk.org> wrote:
>>
>> Hmmm, define "complex"
>
> \w+([\-+.]\w+)*@\w+([\-.]\w+)*\.\w+([\-.]\w+)*
>
> This is a simple email regexp.  This takes about 4 or 5 seconds to
> compile on my lappy (Pentium M).

Hmm, odd. I translated that regexp into the syntax of the tool we used,
which is written in Prolog (generally a constant factor slower than
C/C++/D equivalents). Generating a minimized DFA takes well under a
second. I used the following expression (abstracted a bit with
macros):

---
macro(letter, {a..z, 'A'..'Z'}).
macro(punctlet,[{-,+,.},letter+]).
macro(dompunctlet,[{-,.},letter+]).
macro(email,[letter+,punctlet*,@,letter+,dompunctlet*,.,letter+,dompunctlet*]).
---
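To illustrate why compiling this pattern can be fast, here is a rough Python sketch of the textbook pipeline: Thompson's NFA construction, subset construction, and Moore-style partition refinement for minimization. It is not the code of FSA Utilities or of D's std.regexp, and every name in it (classify, build, compile_regex, EMAIL, the tuple-based syntax tree and so on) is hypothetical. Like the macros above, it abstracts characters into a handful of symbol classes; it approximates \w as ASCII letters, digits and underscore, whereas the letter macro above covers letters only.

```python
ALPHABET = ("w", "-", "+", ".", "@")

def classify(ch):  # assumption: characters are abstracted into a few symbol classes
    if ch.isascii() and (ch.isalnum() or ch == "_"):
        return "w"  # approximates \w
    return ch if ch in "-+.@" else None  # None: no transition on this character

def new_state(edges):
    edges.append([])  # edges[state] = [(label, target)]; label None means epsilon
    return len(edges) - 1

def build(edges, node):  # Thompson construction; returns the fragment's (start, accept)
    kind = node[0]
    if kind == "sym":  # one edge per symbol class in the set
        s, e = new_state(edges), new_state(edges)
        for c in node[1]:
            edges[s].append((c, e))
        return s, e
    if kind == "cat":  # chain the parts with epsilon edges
        start, end = build(edges, node[1])
        for part in node[2:]:
            s, e = build(edges, part)
            edges[end].append((None, s))
            end = e
        return start, end
    s, e = new_state(edges), new_state(edges)  # kind is "star" or "plus"
    i, o = build(edges, node[1])
    edges[s].append((None, i))
    edges[o].append((None, i))  # repeat
    edges[o].append((None, e))  # leave
    if kind == "star":
        edges[s].append((None, e))  # zero repetitions
    return s, e

def closure(edges, states):  # epsilon closure of a set of NFA states
    stack, seen = list(states), set(states)
    while stack:
        for label, t in edges[stack.pop()]:
            if label is None and t not in seen:
                seen.add(t)
                stack.append(t)
    return frozenset(seen)

def to_dfa(edges, start, accept):  # subset construction; a missing edge means reject
    first = closure(edges, {start})
    index, trans, todo = {first: 0}, {}, [first]
    while todo:
        cur = todo.pop()
        for c in ALPHABET:
            moved = {t for s in cur for label, t in edges[s] if label == c}
            if moved:
                nxt = closure(edges, moved)
                if nxt not in index:
                    index[nxt] = len(index)
                    todo.append(nxt)
                trans[index[cur], c] = index[nxt]
    accepting = {i for subset, i in index.items() if accept in subset}
    return len(index), trans, accepting

def signature(s, block, trans):  # own block + target blocks (-1 = implicit dead state)
    return (block[s],) + tuple(
        block[trans[s, c]] if (s, c) in trans else -1 for c in ALPHABET)

def minimize(n, trans, accepting):  # Moore partition refinement
    block = [1 if s in accepting else 0 for s in range(n)]
    while True:
        sigs = {}
        new_block = [sigs.setdefault(signature(s, block, trans), len(sigs))
                     for s in range(n)]
        stable = len(sigs) == len(set(block))  # no block was split
        block = new_block
        if stable:
            break
    mtrans = {(block[s], c): block[t] for (s, c), t in trans.items()}
    return len(sigs), block[0], mtrans, {block[s] for s in accepting}

def compile_regex(node):
    edges = []
    start, accept = build(edges, node)
    return minimize(*to_dfa(edges, start, accept))

def matches(dfa, text):  # full-string match, one table lookup per character
    _, state, trans, accepting = dfa
    for ch in text:
        state = trans.get((state, classify(ch)))
        if state is None:
            return False
    return state in accepting

# \w+([\-+.]\w+)*@\w+([\-.]\w+)*\.\w+([\-.]\w+)*
W = ("plus", ("sym", {"w"}))
EMAIL = ("cat",
         W, ("star", ("cat", ("sym", {"-", "+", "."}), W)),
         ("sym", {"@"}),
         W, ("star", ("cat", ("sym", {"-", "."}), W)),
         ("sym", {"."}),
         W, ("star", ("cat", ("sym", {"-", "."}), W)))
```

For example, matches(compile_regex(EMAIL), "me@danieldk.org") is True, while "nobody@localhost" is rejected because its domain has no dot. For this pattern the minimized automaton comes out at six live states, with missing transitions standing in for a dead state, so construction should finish almost instantly even in a slow language.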

The software is available from:
http://www.let.rug.nl/~vannoord/Fsa/fsa.html

Take care,
Daniel
