Nested RegEx
Dmitry Olshansky
dmitry.olsh at gmail.com
Sat Apr 21 13:42:45 PDT 2012
On 21.04.2012 22:46, H. S. Teoh wrote:
> On Sat, Apr 21, 2012 at 09:41:18PM +0400, Dmitry Olshansky wrote:
>> On 21.04.2012 21:24, nrgyzer wrote:
>>> Hi guys,
>>>
>>> I'm trying to use std.regex to parse a string like the following:
>>>
>>> string myString = "preOuter {if condition1} content1 {if condition2} content2
>>> {elseif condition3} content3 {else}any other content{/if}{/if} postOuter";
>>>
>>
>> Simply put pure regex is incapable of arbitrary-nested if-else
>> statements. In a sense if... /if is a case of balanced parens
>> problem and it's widely know to form non-regular language.
>
> This is of theoretical interest to me. Very often I find myself wanting
> a concise way to express patterns with nested matching delimiters, but
> regexes can't do it. But to jump to a full-blown stack-based language
> seems like an overkill to me: stack languages can express *much* more
> than just nested delimiters, most of which is difficult to encapsulate
> in a nice, concise syntax like regexes. All I want is a simple way to
> express the kind of simple nesting that matching delimiters give.
>
Recursive descent is not particularly bad unless minimal "grammar
descent depth" is high. Example:
a+b-c
uses a quite a lot of recursive calls for grammar with e.g. 10+ operator
precedence levels.
I'm thinking of merging operator precedence parser with regex might be a
happy marriage you know.
Back to OP topic something along the lines of this will do it (beware of
stack overflow):
void parseIf(){
static int ifNest;
if(input.startWith("{if")){
ifNest++;
scope(exit) ifNest--;
enforce(ifNest < 10000, "conservative stack overflow");
parseCond(input[2..$-1]);//regex that does condition
enforce(input[$-1] == '}', "close that if");
parseIf();//parse body or another nested if
//parseElse();//goes here, as does elif
}
else
parseBody();// the regex you used before
}
>
> [...]
>> One day I may add push-pop stack extensions (that allow this kind of
>> thing) into regex but I'd have to think hard to make it efficient.
> [...]
>
> Plus, have a nice concise syntax for it (otherwise I might as well just
> write a recursive descent parser for it in the first place).
>
Concise syntax & lots of power is a priority actually, because I can
detect if push-pop stuff is used or not and pick the right engine for
the job. So different big-oh complexity is not important.
--
Dmitry Olshansky
More information about the Digitalmars-d-learn
mailing list