Nested RegEx

nrgyzer nrgyzer at gmail.com
Tue Apr 24 08:52:54 PDT 2012


== Auszug aus Dmitry Olshansky (dmitry.olsh at gmail.com)'s Artikel
> On 21.04.2012 22:46, H. S. Teoh wrote:
> > On Sat, Apr 21, 2012 at 09:41:18PM +0400, Dmitry Olshansky wrote:
> >> On 21.04.2012 21:24, nrgyzer wrote:
> >>> Hi guys,
> >>>
> >>> I'm trying to use std.regex to parse a string like the following:
> >>>
> >>> string myString = "preOuter {if condition1} content1 {if condition2}
content2
> >>> {elseif condition3} content3 {else}any other content{/if}{/if} postOuter";
> >>>
> >>
> >> Simply put pure regex is incapable of arbitrary-nested if-else
> >> statements. In a sense if... /if is a case of balanced parens
> >> problem and it's widely know to form non-regular language.
> >
> > This is of theoretical interest to me. Very often I find myself wanting
> > a concise way to express patterns with nested matching delimiters, but
> > regexes can't do it. But to jump to a full-blown stack-based language
> > seems like an overkill to me: stack languages can express *much* more
> > than just nested delimiters, most of which is difficult to encapsulate
> > in a nice, concise syntax like regexes. All I want is a simple way to
> > express the kind of simple nesting that matching delimiters give.
> >
> Recursive descent is not particularly bad unless minimal "grammar
> descent depth" is high. Example:
> a+b-c
> uses a quite a lot of recursive calls for grammar with e.g. 10+ operator
> precedence levels.
> I'm thinking of merging operator precedence parser with regex might be a
> happy marriage you know.
> Back to OP topic something along the lines of this will do it (beware of
> stack overflow):
> void parseIf(){
> 	static int ifNest;
> 	if(input.startWith("{if")){
> 		ifNest++;
> 		scope(exit) ifNest--;
> 		enforce(ifNest < 10000, "conservative stack overflow");
> 		parseCond(input[2..$-1]);//regex that does condition
> 		enforce(input[$-1] == '}', "close that if");
> 		parseIf();//parse body or another nested if
> 		//parseElse();//goes here, as does elif
> 	}
> 	else
> 		parseBody();// the regex you used before
> }
> >
> > [...]
> >> One day I may add push-pop stack extensions (that allow this kind of
> >> thing) into regex but I'd have to think hard to make it efficient.
> > [...]
> >
> > Plus, have a nice concise syntax for it (otherwise I might as well just
> > write a recursive descent parser for it in the first place).
> >
> Concise syntax & lots of power is a priority actually, because I can
> detect if push-pop stuff is used or not and pick the right engine for
> the job. So different big-oh complexity is not important.

Thanks guys... I just solved the problem by using indexOf() from std.string for
parsing my if-statements. So... in principle, I can have unlimited nested if-
statements, but I think it's much more difficult (and slower) than using any
regex-expression.


More information about the Digitalmars-d-learn mailing list