Nested RegEx

Dmitry Olshansky dmitry.olsh at gmail.com
Sat Apr 21 13:42:45 PDT 2012


On 21.04.2012 22:46, H. S. Teoh wrote:
> On Sat, Apr 21, 2012 at 09:41:18PM +0400, Dmitry Olshansky wrote:
>> On 21.04.2012 21:24, nrgyzer wrote:
>>> Hi guys,
>>>
>>> I'm trying to use std.regex to parse a string like the following:
>>>
>>> string myString = "preOuter {if condition1} content1 {if condition2} content2
>>> {elseif condition3} content3 {else}any other content{/if}{/if} postOuter";
>>>
>>
>> Simply put pure regex is incapable of arbitrary-nested if-else
>> statements. In a sense if... /if is a case of balanced parens
>> problem and it's widely know to form non-regular language.
>
> This is of theoretical interest to me. Very often I find myself wanting
> a concise way to express patterns with nested matching delimiters, but
> regexes can't do it. But to jump to a full-blown stack-based language
> seems like an overkill to me: stack languages can express *much* more
> than just nested delimiters, most of which is difficult to encapsulate
> in a nice, concise syntax like regexes. All I want is a simple way to
> express the kind of simple nesting that matching delimiters give.
>

Recursive descent is not particularly bad unless minimal "grammar 
descent depth" is high. Example:
a+b-c
uses a quite a lot of recursive calls for grammar with e.g. 10+ operator 
precedence levels.


I'm thinking of merging operator precedence parser with regex might be a 
happy marriage you know.

Back to OP topic something along the lines of this will do it (beware of 
stack overflow):

void parseIf(){
	static int ifNest;
	if(input.startWith("{if")){
		ifNest++;
		scope(exit) ifNest--;
		enforce(ifNest < 10000, "conservative stack overflow");
		parseCond(input[2..$-1]);//regex that does condition
		enforce(input[$-1] == '}', "close that if");
		parseIf();//parse body or another nested if
		//parseElse();//goes here, as does elif
	}
	else
		parseBody();// the regex you used before
}


>
> [...]
>> One day I may add push-pop stack extensions (that allow this kind of
>> thing) into regex but I'd have to think hard to make it efficient.
> [...]
>
> Plus, have a nice concise syntax for it (otherwise I might as well just
> write a recursive descent parser for it in the first place).
>

Concise syntax & lots of power is a priority actually, because I can 
detect if push-pop stuff is used or not and pick the right engine for 
the job. So different big-oh complexity is not important.

-- 
Dmitry Olshansky


More information about the Digitalmars-d-learn mailing list