Parser
Cecil Ward
cecil at cecilward.com
Wed Jun 14 09:28:57 UTC 2023
I’m thinking that I might have to end up writing a partial, rather
rough parser for parts of the D language. Could I get some
suggestions for help that I might find in the way of software
components? D has a very powerful regex module, I believe.
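As a minimal sketch of what std.regex could do here, the following collects the names written as %[name] in an asm template. The helper name operandNames and the identifier pattern are assumptions made for illustration. This version runs at run time; whether std.regex can be evaluated at compile time for this purpose is not something checked here.

```d
import std.regex : matchAll, regex;

// Hypothetical helper: list the operand names written as %[name]
// in an asm template string, in order of appearance.
string[] operandNames(string tmpl)
{
    // "%[", optional spaces, an identifier (captured), optional spaces, "]".
    auto re = regex(`%\[\s*([A-Za-z_][A-Za-z0-9_]*)\s*\]`);
    string[] names;
    foreach (m; matchAll(tmpl, re))
        names ~= m[1]; // m[1] is the first capture group
    return names;
}

void main()
{
    import std.stdio : writeln;
    // The template is the example from this post.
    writeln(operandNames("mov %[dest], %[src]"));
}
```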
I have been writing inline asm library routines for GDC as a
learning exercise and unfortunately I can’t build them under LDC
because LDC does not yet offer full support for the GCC in-line
asm grammar, specifically named inline-asm operands such as "mov
%[dest], %[src]", where the names are enclosed in [ ]. I’m
thinking that I might have to work around this deficiency myself,
since there’s no way that I can enhance LDC itself; I wouldn’t
even know where to start.
I could pre-process the string expressions used in inline asm so
that LDC could understand an alternative easier grammar, one
where there are numbers instead of "[names]", eg "%0" instead of
the meaningful "%[dest]". It seems that the compilers take string
_expressions_ everywhere rather than just simple literal strings.
Can I generate fragments of D and inject them into the rest of
the code using mixin? Not really sure how to use it.
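A minimal sketch of the idea, assuming the list of declared names is already known: an ordinary D function rewrites %[name] to %N, the compiler evaluates it at compile time when the result initialises an enum, and a string mixin then compiles a generated string as D code. The names numberOperands, indexOfName and asmText are hypothetical. The sketch deliberately contains no asm statement, so it says nothing about what either compiler accepts there.

```d
import std.conv : to;
import std.string : strip;

// Hypothetical helper: position of `name` in `names`, or -1 if absent.
ptrdiff_t indexOfName(const string[] names, string name)
{
    foreach (i, n; names)
        if (n == name)
            return cast(ptrdiff_t) i;
    return -1;
}

// Hypothetical helper: replace each %[name] with %N, where N is the
// position of the name in `names` (assumed to be in declaration order).
string numberOperands(string tmpl, const string[] names)
{
    string result;
    size_t i = 0;
    while (i < tmpl.length)
    {
        if (i + 1 < tmpl.length && tmpl[i] == '%' && tmpl[i + 1] == '%')
        {
            // "%%" is a literal percent sign in GCC-style templates.
            result ~= tmpl[i .. i + 2];
            i += 2;
            continue;
        }
        if (i + 1 < tmpl.length && tmpl[i] == '%' && tmpl[i + 1] == '[')
        {
            size_t close = i + 2;
            while (close < tmpl.length && tmpl[close] != ']')
                ++close;
            if (close < tmpl.length)
            {
                immutable idx = indexOfName(names, strip(tmpl[i + 2 .. close]));
                if (idx >= 0)
                {
                    result ~= "%" ~ to!string(idx);
                    i = close + 1;
                    continue;
                }
            }
        }
        result ~= tmpl[i]; // anything else is copied unchanged
        ++i;
    }
    return result;
}

// Initialising an enum forces compile-time evaluation (CTFE).
enum rewritten = numberOperands("mov %[dest], %[src]", ["dest", "src"]);
static assert(rewritten == "mov %0, %1");

// A string mixin compiles a compile-time string as D code;
// here it declares another enum holding the rewritten template.
mixin("enum asmText = `" ~ rewritten ~ "`;");
static assert(asmText == "mov %0, %1");

void main()
{
    import std.stdio : writeln;
    writeln(asmText);
}
```

If the compilers really do accept arbitrary compile-time string expressions, a call such as numberOperands(...) might be usable directly where the template string goes, without a mixin, but that is untested here.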
There are three string expressions involved. The first is the string
containing the asm, which needs to be scanned for %[ names ]; each
name needs to be replaced with a number given by the order in which
the names are declared. The other two are an outputs section and an
inputs section, both of which can contain declarations of these
names, eg ‘: [ dest ] "=r" ( d-expression ) ,’ … ‘: [ src ]’…. The arbitrary
fragment of D in d-expression can unfortunately be anything, and
there’s no way I can write a full D lexer/parser to scan that
properly, but luckily I just have to pass over it to find its
terminator which is either a ‘,’ or a ‘:’. (There might be a case
where there is a ‘;’ as a terminator instead of a ‘:’, I’m not
sure if that’s permitted in the grammar immediately after the
inputs section.)
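Passing over the d-expression without parsing it can be sketched by tracking bracket depth, since the expression sits inside ( ) and any ‘,’ or ‘:’ belonging to it is therefore nested. The function name findTerminator is hypothetical. The sketch skips only double-quoted strings; character literals, other string forms and comments are not handled.

```d
// Hypothetical helper: index of the first ',', ':' or ';' that is not
// inside brackets or a double-quoted string, or s.length if none.
size_t findTerminator(string s)
{
    int depth = 0;
    for (size_t i = 0; i < s.length; ++i)
    {
        immutable c = s[i];
        if (c == '"')
        {
            // Skip to the closing quote, stepping over backslash escapes.
            ++i;
            while (i < s.length && s[i] != '"')
            {
                if (s[i] == '\\')
                    ++i;
                ++i;
            }
        }
        else if (c == '(' || c == '[' || c == '{')
            ++depth;
        else if (c == ')' || c == ']' || c == '}')
            --depth;
        else if (depth == 0 && (c == ',' || c == ':' || c == ';'))
            return i;
    }
    return s.length;
}

void main()
{
    import std.stdio : writeln;
    // An operand list whose expression contains its own ',' and ':'.
    string ops = `[dest] "=r" (f(a, b) ? x : y), [src] "r" (s)`;
    writeln(ops[0 .. findTerminator(ops)]);
}
```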
But having to parse all the types of strings and operators in a
string-expression is hard enough. I will also have to deal with
all the possible comment types wherever they can occur, which is
all over the place within, before and after these expressions.
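One way to reduce the comment problem is a separate pass that removes comments before any other scanning. The sketch below, with the hypothetical name stripComments, handles //, /* */ and nesting /+ +/ comments and copies "..." strings, `...` strings and character literals unchanged. Other string forms (r"...", q"...", q{...}) are not handled, so it is a starting point under those assumptions, not a complete lexer.

```d
// True if `tok` occurs in `s` starting at index i.
bool at(string s, size_t i, string tok)
{
    return i + tok.length <= s.length && s[i .. i + tok.length] == tok;
}

// Hypothetical helper: remove D comments, leaving literals intact.
// Block comments become one space so neighbouring tokens stay separate.
string stripComments(string src)
{
    string result;
    size_t i = 0;
    while (i < src.length)
    {
        if (at(src, i, "//"))
        {
            // Line comment: drop up to, but not including, the newline.
            while (i < src.length && src[i] != '\n')
                ++i;
        }
        else if (at(src, i, "/*"))
        {
            i += 2;
            while (i < src.length && !at(src, i, "*/"))
                ++i;
            i = (i + 2 <= src.length) ? i + 2 : src.length;
            result ~= ' ';
        }
        else if (at(src, i, "/+"))
        {
            // Nesting comment: count openers and closers.
            int depth = 0;
            do
            {
                if (at(src, i, "/+")) { ++depth; i += 2; }
                else if (at(src, i, "+/")) { --depth; i += 2; }
                else ++i;
            } while (depth > 0 && i < src.length);
            result ~= ' ';
        }
        else if (src[i] == '"' || src[i] == '`' || src[i] == '\'')
        {
            // Copy a literal verbatim; backslash escapes apply
            // except inside `...` strings.
            immutable quote = src[i];
            immutable start = i;
            ++i;
            while (i < src.length && src[i] != quote)
            {
                if (quote != '`' && src[i] == '\\')
                    ++i;
                ++i;
            }
            i = (i < src.length) ? i + 1 : src.length;
            result ~= src[start .. i];
        }
        else
        {
            result ~= src[i];
            ++i;
        }
    }
    return result;
}

void main()
{
    import std.stdio : writeln;
    writeln(stripComments("\"=r\" /* out */ (d) // destination"));
}
```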
Any tips or modules that I could use would be most welcome. I’m
very much out of my depth here.
More information about the Digitalmars-d-learn
mailing list