Builtin regex (Was: How to complex switch?)

Dmitry Olshansky dmitry.olsh at gmail.com
Fri May 13 08:47:48 PDT 2011


On 13.05.2011 19:35, Dmitry Olshansky wrote:
> On 13.05.2011 19:26, KennyTM~ wrote:
>> On May 13, 11 23:12, Dmitry Olshansky wrote:
>>> On 13.05.2011 18:25, Robert Clipsham wrote:
>>>> On 13/05/2011 05:14, Ary Manzana wrote:
>>>>> How about making regex a built-in feature with this syntax: /regex/ ?
>>>>>
>>>>> I didn't use regex a lot before I started using Ruby. The thing is, in
>>>>> Ruby it's so easy to use regex that I just started using them a lot more
>>>>> than before. Of course, Ruby has built-in operators for matching regexes,
>>>>> so maybe that should also be added to the language (it's the =~
>>>>> operator, but in D it should be a different one.)
>>>>
>>>> Regex is ugly, impossible to maintain/debug and slow for anything
>>>> mildly complicated - a handwritten parser is orders of magnitude faster, and
>>>> easy to understand, maintain and debug. If it's simple, you may as
>>>> well write a couple of extra lines and have it be a lot faster.
>>>>
>>>
>>> A handwritten parser is faster, but hard to get right, port or maintain.
>>> Also, a handwritten parser has almost zero flexibility - patching it to
>>> accommodate new kinds of input is a PITA.
>>> Regexes, on the other hand, are a widely known DSL for pattern matching,
>>> which IMO is easier to get right and to port across languages and
>>> platforms (with caveats). As for flexibility, regex engines are in a way
>>> just simple parser generators with some bells and whistles. And BTW they
>>> can be quite fast (depending on the use case, of course).
>>>
>>>> Just my opinion of course, I know you're bound to disagree :>
>>>>
>>>
>>> Probably you never faced problems like "get me some info from these
>>> <enter your gazillion number> simple reports/emails/etc." All things that
>>> don't follow some formal language benefit from flexibility on the parser
>>> side. E.g. you can patch together a couple of approximate regexes through
>>> a trial-and-error cycle and have very reasonable results in no time.
>>>
>>> The real pitfall of regexes vs handcrafted parsers is that they can't
>>> match important classes of problems like: "the innermost parenthesized
>>> expression in a given string" and such.
>>>
>>
>> Nitpick: *Innermost* parenthesized expression can be parsed easily with
>> regex (r"\([^)]*\)"). Outermost would be difficult.
> Yeah, I certainly meant outermost ;)
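Actually, wait a sec...
r"\([^)]*\)" won't help. Given the input:
(a+(b+c)+d)
it matches:
(a+(b+c)
which is not intended: the character class excludes only ')', so the match
can start at the outer '(' and swallow the nested one.
r"\([^()]*\)" should do the trick, since excluding both parentheses means
the match can only be an innermost group, here (b+c).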
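
To make the difference concrete, here is a minimal sketch of the two patterns run against the example string. It uses Python's re module rather than D (an assumption made purely for illustration, since the raw-string literals carry over unchanged); the variable names are hypothetical.

```python
import re

text = "(a+(b+c)+d)"

# Earlier pattern: only ')' is excluded from the body, so a match may
# begin at an outer '(' and run through a nested '('.
loose = re.compile(r"\([^)]*\)")

# Corrected pattern: both '(' and ')' are excluded from the body, so a
# match cannot contain nested parentheses and must be an innermost group.
strict = re.compile(r"\([^()]*\)")

loose_match = loose.search(text).group(0)
strict_match = strict.search(text).group(0)

print(loose_match)   # (a+(b+c)
print(strict_match)  # (b+c)
```

With the corrected pattern the engine fails at the first '(' (it hits the nested '(' before any ')'), moves on, and succeeds at the inner group.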

-- 
Dmitry Olshansky


