How to write a parser?

Suliman via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun May 14 12:00:09 PDT 2017


I am trying to learn how to write a text parser. I have an example 
document in the following format:

#Header
my header text

##SubHeader
my sub header text

###Sub3Header
my sub 3 text

#Header21
my header2 text

##SubHeader21
my header2 text

###SubHeader22
my header3 text


I would like to wrap each header level (#, ##, ###) in an HTML div, 
so that the result looks like this:

<div 1>
#Header
my header text
<div 2>
##SubHeader
my sub header text

<div 3>
###Sub3Header
my sub 3 text
</div 3>
</div 2>
</div 1>

<div 1>#Header21
my header2 text

<div 2>##SubHeader21
my header2 text

<div 3>###SubHeader22
my header3 text
</div 3>
</div 2>
</div 1>

It seems that I misunderstand the parser logic. I am trying to 
do it in the following way:

	bool isH1Open;
	bool isH2Open;
	bool isH3Open;

	string newcontent;

	foreach(line; content.lineSplitter)
	{
		if(line.length > 3) // skip short lines so that indexing line[0], line[1], line[2] is safe
		{
			if(!isH1Open && line[0] == '#' && line[1] != '#')
			{
				isH1Open = true;
				line = `<div 1>` ~ "\n" ~ line ;
				newcontent ~= line;
				continue;
			}


			if(isH2Open && line[1] == '#' && line[2] != '#')
			{
				isH2Open = false;
				line = "\n" ~ `</div 2>` ~ "\n";
				newcontent ~= line;
				continue;
			}

			if(isH1Open && line[0] == '#' && line[1] != '#')
			{
				isH1Open = false;
				line = "\n" ~ `</div 1>` ~ "\n";
				newcontent ~= line;
				continue;
			}

	
			if(!isH2Open && line[1] == '#' && line[2] != '#')
			{
				isH2Open = true;
				line = "\n" ~ `<div 2>` ~ "\n" ~ line ;
				newcontent ~= line;
				continue;
			}


		}
	}

But I am getting the wrong output:

<div 1>
#Header
<div 2>
##SubHeader
</div 1>

</div 2>
<div 1>
#Header31
<div 2>
##SubHeader31

Is there a better way to parse such a format?
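
For reference, the nesting I am after could probably be expressed with a single depth counter instead of one flag per level. Below is only a rough sketch under my own assumptions: the names headerLevel and wrapHeaders are made up for illustration, any line starting with '#' is treated as a header, and if a level is skipped (a ### directly under a #) the missing div in between is opened as well.

```d
import std.array : appender;
import std.conv : to;
import std.string : lineSplitter;

// Hypothetical helper: number of leading '#' characters.
// Returns 0 when the line is not a header.
size_t headerLevel(const(char)[] line)
{
    size_t level = 0;
    while (level < line.length && line[level] == '#')
        ++level;
    return level;
}

// Hypothetical helper: wraps each header and the text below it
// in <div N> ... </div N>, nesting deeper levels inside shallower ones.
string wrapHeaders(string content)
{
    auto result = appender!string();
    size_t depth = 0; // number of divs that are currently open

    foreach (line; content.lineSplitter)
    {
        immutable level = headerLevel(line);
        if (level > 0)
        {
            // Close every open div at this level or deeper.
            while (depth >= level)
            {
                result.put("</div " ~ depth.to!string ~ ">\n");
                --depth;
            }
            // Open divs until the new level is reached.
            while (depth < level)
            {
                ++depth;
                result.put("<div " ~ depth.to!string ~ ">\n");
            }
        }
        // Header lines and plain text lines are both copied through.
        result.put(line);
        result.put("\n");
    }

    // Close whatever is still open at the end of the input.
    while (depth > 0)
    {
        result.put("</div " ~ depth.to!string ~ ">\n");
        --depth;
    }
    return result.data;
}
```

With this, something like writeln(wrapHeaders(content)) (after importing std.stdio) should print the wrapped document. The point of the counter is that a new header first closes everything at its own level or deeper and only then opens its own div, which is the step my version with the three bools gets wrong.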


