Builtin regex (Was: How to complex switch?)

Dmitry Olshansky dmitry.olsh at gmail.com
Fri May 13 09:52:21 PDT 2011


On 13.05.2011 20:00, Robert Clipsham wrote:
> On 13/05/2011 16:12, Dmitry Olshansky wrote:
>>> Regex is ugly, impossible to maintain/debug and slow for anything
>>> mildly complicated - a handwritten parser is magnitudes faster, and
>>> easy to understand, maintain and debug. If it's simple, you may as
>>> well write a couple of extra lines and have it be a lot faster.
>>>
>>
>> Handwritten parser is faster, but hard to get right, port or maintain.
>> Also handwritten parser has almost zero flexibility - patching it to
>> accommodate new kinds of input is PITA.
>
> When you've written a couple it doesn't take much to get it right in 
> my experience. I don't find them hard to maintain personally, I guess 
> that comes from experience though. What do you mean port?
>

Literally re-code the same thing in another programming language. Suppose 
there are going to be other programs working with the same data in similar 
ways.

> As for patching for new input, that's a doddle if it's well written. 
> Changing a regex on the other hand... It's generally easier to write 
> it from scratch than decipher a current one.
>
I see, the deciphering could really get tricky. As for rewriting from 
scratch, that's something you generally try hard to avoid with 
handwritten stuff.

>> Regexes on the other hand are a widely known DSL for pattern matching,
>> which IMO are easier to get right, port across languages and platforms (with
>> caveats). Flexibility - regex engines in a way are just simple parser
>> generators with some bells and whistles. And BTW they could be quite
>> fast (depending on the use case, of course).
>
> Of course, I don't feel that grants them a place in the language 
> though. Particularly with the likes of octal! - it can quite easily be 
> in a library and work just as well.

Sure thing, it shouldn't be built in. D isn't awk.
>
>>> Just my opinion of course, I know you're bound to disagree :>
>>>
>>
>> Probably you never faced problems like "get me some info from these
>> <enter your gazillion number> simple reports/emails/etc." All things that
>> don't follow some formal language enjoy flexibility on the parser side.
>> E.g. you can patch together, through a trial-and-error cycle, a couple of
>> approximate regexes and have very reasonable results in no time.
>
> I have, and I use regex for it. Those kinda things just need a quick 
> hack, and that's how I treat regex. If I'm doing anything that's 
> getting used in production code/anything that isn't intended to be a 
> hack I write a proper parser.
>

Ok, so I think we can agree on the simple fact that there are things 
that just aren't worth a handwritten parser.
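
To illustrate the quick-hack case, here is a rough sketch in Python (picked 
only for illustration; the pattern itself should carry over to most engines 
with minor changes). The sample lines, the field names and the extract() 
helper are all made up for this example, not taken from any real report.

```python
import re

# Made-up sample lines standing in for "simple reports"; not real data.
reports = [
    "Report 17: total = 42 items, status OK",
    "report #18 - Total: 7 items (status: FAILED)",
]

# A deliberately loose pattern: it tolerates an optional '#', either ':' or
# '=' after "total", an optional ':' after "status", and case differences.
pattern = re.compile(
    r"report\s*#?(\d+).*?total\s*[:=]\s*(\d+).*?status:?\s*(\w+)",
    re.IGNORECASE,
)

def extract(line):
    """Return (report id, total, status) or None if the line does not fit."""
    m = pattern.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3).upper()

results = [extract(line) for line in reports]
```

When a new report variant shows up, the usual fix is to loosen one piece of 
the pattern and rerun it over the inputs, which is the flexibility I mean.
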
>> The real pitfall of regexes vs handcrafted parsers is that they can't
>> match important classes of problems like: "the innermost parenthesized
>> expression in a given string" and such.
>
> KennyTM~ gave you regex for that, providing you mean outermost - 
> \((.+)\) will do it. 

Like I said, it's funny, but a regex really can't do a lot with braces in 
general. It even fails to check balanced braces (of arbitrary depth).
Your \((.+)\) fails on
(a+b*c) + ((a*b)+c)
because the greedy .+ runs from the first '(' to the last ')'. The 
_parenthesized expression_ here is "((a*b)+c)", not "(a+b*c) + 
((a*b)+c)" (Is it an expression? Yes. Is it parenthesized? No.)
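
A small sketch of both points, in Python purely for illustration (I'd expect 
any engine with greedy quantifiers to behave the same on this input). The 
helper names is_balanced and outermost_groups are hypothetical, and 
outermost_groups assumes its input is already balanced.

```python
import re

text = "(a+b*c) + ((a*b)+c)"

# Greedy .+ runs from the first '(' to the last ')', so the match spans
# the whole string, which as a whole is not a parenthesized expression.
greedy = re.search(r"\((.+)\)", text)

def is_balanced(s):
    """Check parentheses of arbitrary depth with a simple depth counter."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no matching '(' before it
                return False
    return depth == 0

def outermost_groups(s):
    """Return each top-level parenthesized substring (assumes balanced input)."""
    groups, depth, start = [], 0, 0
    for i, ch in enumerate(s):
        if ch == "(":
            if depth == 0:
                start = i  # remember where a top-level group opens
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth == 0:
                groups.append(s[start:i + 1])
    return groups
```

The handwritten part is one counter and a loop, but that counter is exactly 
what a plain regular expression cannot keep track of for arbitrary depth.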

> Somewhere in between is harder, but doable if I recall correctly. But 
> yes, there is a whole class of problems regex can't solve. The real 
> pitfall for me is how difficult it is to decipher/debug.
>

-- 
Dmitry Olshansky



More information about the Digitalmars-d mailing list