Parser

Cecil Ward cecil at cecilward.com
Wed Jun 14 09:28:57 UTC 2023


I’m thinking that I might end up having to write a partial, rather 
rough parser for parts of the D language. Could I get some 
suggestions for software components that might help? D has a very 
powerful regex module, I believe.

I have been writing inline asm library routines for GDC as a 
learning exercise and unfortunately I can’t build them under LDC 
because LDC does not yet offer full support for the GCC in-line 
asm grammar, specifically named asm operands such as 
"mov %[dest], %[src]" - where you see the names enclosed in [ ]. 
I’m thinking that I might have to work around this deficiency 
myself, since there’s no way that I can enhance LDC itself; I 
wouldn’t even know where to start.

I could pre-process the string expressions used in inline asm so 
that LDC could understand an alternative easier grammar, one 
where there are numbers instead of "[names]", eg "%0" instead of 
the meaningful "%[dest]". It seems that the compilers take string 
_expressions_ everywhere rather than just simple literal strings.
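
A rough sketch of the substitution step, assuming the names have 
already been collected in declaration order (outputs first, then 
inputs). The names numberOperands and operandIndex are made up 
for illustration; they are not part of any library.

```d
import std.conv : to;
import std.string : indexOf, strip;

/// Position of `name` in `names`, or -1 if absent (hypothetical helper).
ptrdiff_t operandIndex(const string[] names, const(char)[] name)
{
    foreach (i, n; names)
        if (n == name)
            return i;
    return -1;
}

/// Replace each "%[name]" in an asm template with "%N", where N is the
/// position of the name in `names`. Unknown names are left untouched.
string numberOperands(string tmpl, const string[] names)
{
    string result;
    size_t i = 0;
    while (i < tmpl.length)
    {
        if (i + 1 < tmpl.length && tmpl[i] == '%' && tmpl[i + 1] == '[')
        {
            const close = tmpl.indexOf(']', i + 2);
            if (close >= 0)
            {
                // Allow spaces inside the brackets, as in "%[ dest ]".
                const idx = operandIndex(names, tmpl[i + 2 .. close].strip);
                if (idx >= 0)
                {
                    result ~= "%" ~ idx.to!string;
                    i = close + 1;
                    continue;
                }
            }
        }
        result ~= tmpl[i];
        i++;
    }
    return result;
}
```

With names ["dest", "src"], the template "mov %[dest], %[src]" 
would become "mov %0, %1".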

Can I generate fragments of D and inject them into the rest of 
the code using mixin? I’m not really sure how to use it.
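
For reference, a minimal sketch of how a string mixin works: an 
ordinary function builds D source text, the compiler runs it at 
compile time, and mixin compiles the result as if it had been 
typed at that point. makeAdder and addFive are hypothetical names 
chosen only for this example.

```d
import std.conv : to;

/// Hypothetical generator: returns the source text of a small function.
string makeAdder(string name, int amount)
{
    return "int " ~ name ~ "(int x) { return x + " ~ amount.to!string ~ "; }";
}

// makeAdder is evaluated at compile time and the resulting string is
// compiled as a declaration, so addFive exists from here on.
mixin(makeAdder("addFive", 5));

// An enum initialiser is also evaluated at compile time, so a string
// computed by a function can be used wherever a constant string is needed.
enum adderSource = makeAdder("unused", 1);
static assert(adderSource == "int unused(int x) { return x + 1; }");
```

The same idea would apply to a preprocessed asm template: compute 
the rewritten string in a function and use the result as a 
compile-time constant.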

There are three string expressions involved. The first is the 
string containing the asm, which needs to be scanned for 
%[ names ]; each of these needs to be replaced with a number 
given by the order of occurrence of the declarations of the 
names. The other two are an outputs section and an inputs section, 
both of which can contain declarations of these names, eg ‘: 
[ dest ] "=r" ( d-expression ) ,’ … ‘: [ src ]’…. The arbitrary 
fragment of D in d-expression can unfortunately be anything, and 
there’s no way I can write a full D lexer/parser to scan that 
properly, but luckily I just have to pass over it to find its 
terminator, which is either a ‘,’ or a ‘:’. (There might be a case 
where there is a ‘;’ as a terminator instead of a ‘:’; I’m not 
sure if that’s permitted in the grammar immediately after the 
inputs section.)
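
Since the d-expression sits inside parentheses in the example 
above, one way to pass over it without parsing it is to find the 
matching close parenthesis while skipping quoted literals. This 
is only a sketch under simplifying assumptions: skipParens is a 
made-up name, and only "..." and '...' literals are recognised, 
so backtick and r"..." strings, q{...} token strings and comments 
would need extra handling.

```d
/// Given s[open] == '(', return the index just past the matching ')'.
/// Returns s.length if no matching ')' is found.
size_t skipParens(string s, size_t open)
{
    assert(open < s.length && s[open] == '(');
    int depth = 0;
    size_t i = open;
    while (i < s.length)
    {
        const c = s[i];
        if (c == '"' || c == '\'')
        {
            // Skip a quoted literal, stepping over backslash escapes.
            i++;
            while (i < s.length && s[i] != c)
                i += (s[i] == '\\') ? 2 : 1;
            i++; // step past the closing quote
            continue;
        }
        if (c == '(')
            depth++;
        else if (c == ')' && --depth == 0)
            return i + 1;
        i++;
    }
    return s.length;
}
```

The character at the returned index (after any whitespace) would 
then be the ‘,’ or ‘:’ terminator.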

But having to parse all the types of strings and operators in a 
string-expression is hard enough. I will also have to deal with 
all the possible comment types wherever they can occur, which is 
all over the place within, before and after these expressions.
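
A sketch of how the comment handling might look, covering the 
three comment forms D has: // line comments, /* */ block comments 
and /+ +/ block comments, of which only the last kind nests. 
skipComment is a hypothetical helper name.

```d
/// If a comment starts at s[i], return the index just past it;
/// otherwise return i unchanged.
size_t skipComment(string s, size_t i)
{
    if (i + 1 >= s.length || s[i] != '/')
        return i;
    const kind = s[i + 1];
    if (kind == '/')
    {
        // Line comment: runs up to (not including) the newline.
        while (i < s.length && s[i] != '\n')
            i++;
        return i;
    }
    if (kind != '*' && kind != '+')
        return i; // a lone '/' is just the division operator
    int depth = 1;
    i += 2;
    while (i + 1 < s.length)
    {
        if (s[i] == kind && s[i + 1] == '/')
        {
            i += 2;
            if (--depth == 0)
                return i;
        }
        else if (kind == '+' && s[i] == '/' && s[i + 1] == '+')
        {
            // Only /+ +/ comments nest.
            depth++;
            i += 2;
        }
        else
            i++;
    }
    return s.length; // unterminated comment
}
```

A scanner would call this at each position outside a string 
literal and jump ahead whenever the returned index differs from 
the one passed in.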

Any tips or modules that I could use would be most welcome. I’m 
very much out of my depth here.


More information about the Digitalmars-d-learn mailing list