Builtin regex (Was: How to complex switch?)

Robert Clipsham robert at octarineparrot.com
Fri May 13 09:00:34 PDT 2011


On 13/05/2011 16:12, Dmitry Olshansky wrote:
>> Regex is ugly, impossible to maintain/debug and slow for anything
>> mildly complicated - a handwritten parser is orders of magnitude
>> faster, and easy to understand, maintain and debug. If it's simple,
>> you may as well write a couple of extra lines and have it be a lot
>> faster.
>>
>
> A handwritten parser is faster, but hard to get right, port or
> maintain. Also, a handwritten parser has almost zero flexibility -
> patching it to accommodate new kinds of input is a PITA.

In my experience, once you've written a couple it doesn't take much to
get one right. Personally I don't find them hard to maintain, though I
guess that comes from experience too. What do you mean by port?

As for patching for new input, that's a doddle if it's well written.
Changing a regex on the other hand... It's generally easier to write
one from scratch than to decipher an existing one.

> Regexes on the other hand are a widely known DSL for pattern matching,
> which IMO is easier to get right and to port across languages and
> platforms (with caveats). Flexibility - regex engines in a way are just
> simple parser generators with some bells and whistles. And BTW they
> could be quite fast (depending on the use case, of course).

Of course, I don't feel that grants them a place in the language though.
Particularly with the likes of octal! around - regex can quite easily
live in a library and work just as well.

>> Just my opinion of course, I know you're bound to disagree :>
>>
>
> Probably you never faced problems like "get me some info from these
> <enter your gazillion number> simple reports/emails/etc." All things that
> don't follow some formal language enjoy flexibility on the parser side.
> E.g. you can patch together, through a trial-and-error cycle, a couple
> of approximate regexes and have very reasonable results in no time.

I have, and I use regex for it. Those kinda things just need a quick 
hack, and that's how I treat regex. If I'm doing anything that's getting 
used in production code/anything that isn't intended to be a hack I 
write a proper parser.

> The real pitfall of regexes vs handcrafted parsers is that they can't
> match important classes of problems like: "the innermost parenthesized
> expression in a given string" and such.

KennyTM~ gave you a regex for that, providing you mean outermost -
\((.+)\) will do it, as long as the string has only one top-level group
(it's greedy, so it runs from the first ( to the last )). Somewhere in
between is harder, but doable if I recall correctly. But yes, there is
a whole class of problems regex can't solve. The real pitfall for me is
how difficult it is to decipher/debug.
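
To make the parenthesis point concrete, here is a rough sketch in Python
(chosen purely for illustration; nothing here is D or std.regex). The
sample string and the helper name outermost_groups are made up for the
example. It shows what the greedy \((.+)\) pattern captures, a pattern
that picks out innermost groups, and a small hand-written depth counter
for the top-level groups.

```python
import re

# Hypothetical sample input with nested and sibling groups.
text = "f(a, (b + (c * d)), e) + g(h)"

# Greedy pattern from the thread: captures everything between the
# first "(" and the last ")" in the whole string.
greedy = re.search(r"\((.+)\)", text).group(1)

# Innermost groups: a "(" ... ")" pair whose contents hold no parentheses.
innermost = re.findall(r"\(([^()]*)\)", text)


def outermost_groups(s):
    """Hand-written scanner: contents of each top-level (...) group."""
    groups, depth, start = [], 0, None
    for i, ch in enumerate(s):
        if ch == "(":
            if depth == 0:
                start = i + 1      # remember where this top-level group begins
            depth += 1
        elif ch == ")":
            if depth == 0:
                continue           # unmatched ")", ignored in this sketch
            depth -= 1
            if depth == 0:
                groups.append(s[start:i])
    return groups


print(greedy)                  # a, (b + (c * d)), e) + g(h
print(innermost)               # ['c * d', 'h']
print(outermost_groups(text))  # ['a, (b + (c * d)), e', 'h']
```

With two top-level groups the greedy pattern runs across both of them,
while the depth counter keeps them apart. Tracking depth explicitly is
the part a classic regular expression has no way to express for
arbitrary nesting.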

-- 
Robert
http://octarineparrot.com/