std.d.lexer: pre-voting review / discussion
H. S. Teoh
hsteoh at quickfur.ath.cx
Thu Sep 12 10:18:41 PDT 2013
On Thu, Sep 12, 2013 at 06:09:41PM +0200, deadalnix wrote:
> On Thursday, 12 September 2013 at 14:09:43 UTC, H. S. Teoh wrote:
> >This can be handled by using an intermediate grammar rule. Reduce all
> >(...) into an intermediate type, say ArgList, so the reduction
> >happens something like this:
> >
> > int foo () () {}
> > Type Ident ArgList ArgList ^
> >
> >Then have the rule:
> >
> > CompileTimeArgs ::= ArgList
> > RuntimeArgs ::= ArgList
> > TemplateDecl ::= Type Ident CompileTimeArgs RuntimeArgs '{' ...
> > FuncDecl ::= Type Ident RuntimeArgs '{' ...
> >
> >So first, all (...) gets parsed to ArgList, but it's not yet fixed
> >whether they are compile-time arguments or runtime arguments. It's
> >only after you see the next '(' or '{' that you decide whether
> >ArgList should reduce to CompileTimeArgs or RuntimeArgs.
> >
> >ArgList itself, of course, will accept all possible parameters (both
> >runtime and compile-time): types, expressions, symbols. Then when you
> >reduce it to RuntimeArgs, you reject stuff that can't be interpreted
> >as parameter declarations.
> >
>
> And then you got to backtrack the parsing instead of the lexing. You
> just moved the problem around. You'll have to create some temporary
> ast nodes that then will fix into what they really are.
No. You can just use ArgListItem for both runtime args and compile-time
args. Once you decided which one it is, wrong arguments are rejected at
semantic time (which you have to do anyway).
Let's take a concrete example. Say we're parsing this invalid code:
int foo(alias A)(alias B) {}
You'd go through these steps:
1) Parse initial prefix of declaration:
int foo(alias A)(alias B) {}
^
AST:
FuncDecl
|--RetType: int
|--Ident: foo
\--[ being built ]
2) Parse first (...):
int foo(alias A)(alias B) {}
^
AST:
FuncDecl
|--RetType: int
|--Ident: foo
|--ArgList
| \-- AliasArg
| \-- ident: A
\--[ being built ]
I'm skipping the intermediate steps here, it's obvious how to
construct AliasArg from the usual parsing process.
3) Parse second (...):
int foo(alias A)(alias B) {}
^
AST:
FuncDecl
|--RetType: int
|--Ident: foo
|--ArgList
| \-- AliasArg
| \-- ident: A
|--ArgList
| \-- AliasArg
| \-- ident: B
\--[ being built ]
4) At this point, you now know the first ArgList is CompileTimeArgList,
and the second is RuntimeArgList, so you can just change the type
fields (along with narrowing FuncDecl to TemplateFuncDecl):
AST:
TemplateFuncDecl (was: FuncDecl)
|--RetType: int
|--Ident: foo
|--CompileTimeArgList (was: ArgList)
| \-- AliasArg
| \-- ident: A
|--RuntimeArgList (was: ArgList)
| \-- AliasArg
| \-- ident: B
\--[ being built ]
Since you're still constructing FuncDecl, your current parsing context
should still have a direct reference to the partially-constructed
FuncDecl node, which in turn has a direct reference to both ArgList
child nodes. So this is just dereferencing a couple of pointers. No
backtracking.
5) Finish parsing the declaration:
int foo(alias A)(alias B) {}
^
AST:
TemplateFuncDecl
|--RetType: int
|--Ident: foo
|--CompileTimeArgList (was: ArgList)
| \-- AliasArg
| \-- ident: A
|--RuntimeArgList (was: ArgList)
| \-- AliasArg
| \-- ident: B
\--FuncBody
\-- CompoundStatement
\-- [empty body]
6) Run semantic:
- Create local symbol table for foo, etc..
- Run semantic on CompileTimeArgList:
- Check AliasArg for validity
- Run semantic on AliasArg: add A to function's local symbol
table, etc.
- Run semantic on RuntimeArgList:
- Check AliasArg for validity: ERROR: cannot have alias parameter
at runtime.
- (Add B to local symbol table)(skipped due to previous error)
- (Run semantic on FuncBody)(skipped due to previous error)
- (Run semantic on RetType (verify return type match, etc.))(skipped
due to previous error)
- (Add function to parent scope symbol table)(skipped due to previous
error)
So, no backtracking is necessary.
Of course, it sounds like DMD's parser doesn't work this way, but that's
a limitation of DMD's parser, not an *inherent* need for backtracking.
T
--
I see that you JS got Bach.
More information about the Digitalmars-d
mailing list