std.d.lexer: pre-voting review / discussion

H. S. Teoh hsteoh at quickfur.ath.cx
Thu Sep 12 10:18:41 PDT 2013


On Thu, Sep 12, 2013 at 06:09:41PM +0200, deadalnix wrote:
> On Thursday, 12 September 2013 at 14:09:43 UTC, H. S. Teoh wrote:
> >This can be handled by using an intermediate grammar rule. Reduce all
> >(...) into an intermediate type, say ArgList, so the reduction
> >happens something like this:
> >
> >	int  foo   ()      ()      {}
> >	Type Ident ArgList ArgList ^
> >
> >Then have the rule:
> >
> >	CompileTimeArgs ::= ArgList
> >	RuntimeArgs ::= ArgList
> >	TemplateDecl ::= Type Ident CompileTimeArgs RuntimeArgs '{' ...
> >	FuncDecl ::= Type Ident RuntimeArgs '{' ...
> >
> >So first, all (...) gets parsed to ArgList, but it's not yet fixed
> >whether they are compile-time arguments or runtime arguments. It's
> >only after you see the next '(' or '{' that you decide whether
> >ArgList should reduce to CompileTimeArgs or RuntimeArgs.
> >
> >ArgList itself, of course, will accept all possible parameters (both
> >runtime and compile-time): types, expressions, symbols. Then when you
> >reduce it to RuntimeArgs, you reject stuff that can't be interpreted
> >as parameter declarations.
> >
> 
> And then you got to backtrack the parsing instead of the lexing. You
> just moved the problem around. You'll have to create some temporary
> ast nodes that then will fix into what they really are.

No. You can just use ArgListItem for both runtime args and compile-time
args. Once you decided which one it is, wrong arguments are rejected at
semantic time (which you have to do anyway).

Let's take a concrete example. Say we're parsing this invalid code:

	int foo(alias A)(alias B) {}

You'd go through these steps:

1) Parse initial prefix of declaration:

	int foo(alias A)(alias B) {}
	       ^
	AST:
	FuncDecl
	 |--RetType: int
	 |--Ident: foo
	 \--[ being built ]

2) Parse first (...):

	int foo(alias A)(alias B) {}
	                ^
	AST:
	FuncDecl
	 |--RetType: int
	 |--Ident: foo
	 |--ArgList
	 |   \-- AliasArg
	 |        \-- ident: A
	 \--[ being built ]

   I'm skipping the intermediate steps here, it's obvious how to
   construct AliasArg from the usual parsing process.

3) Parse second (...):

	int foo(alias A)(alias B) {}
	                          ^
	AST:
	FuncDecl
	 |--RetType: int
	 |--Ident: foo
	 |--ArgList
	 |   \-- AliasArg
	 |        \-- ident: A
	 |--ArgList
	 |   \-- AliasArg
	 |        \-- ident: B
	 \--[ being built ]

4) At this point, you now know the first ArgList is CompileTimeArgList,
and the second is RuntimeArgList, so you can just change the type
fields (along with narrowing FuncDecl to TemplateFuncDecl):

	AST:
	TemplateFuncDecl (was: FuncDecl)
	 |--RetType: int
	 |--Ident: foo
	 |--CompileTimeArgList (was: ArgList)
	 |   \-- AliasArg
	 |        \-- ident: A
	 |--RuntimeArgList (was: ArgList)
	 |   \-- AliasArg
	 |        \-- ident: B
	 \--[ being built ]

Since you're still constructing FuncDecl, your current parsing context
should still have a direct reference to the partially-constructed
FuncDecl node, which in turn has a direct reference to both ArgList
child nodes. So this is just dereferencing a couple of pointers. No
backtracking.

5) Finish parsing the declaration:

	int foo(alias A)(alias B) {}
	                            ^
	AST:
	TemplateFuncDecl
	 |--RetType: int
	 |--Ident: foo
	 |--CompileTimeArgList (was: ArgList)
	 |   \-- AliasArg
	 |        \-- ident: A
	 |--RuntimeArgList (was: ArgList)
	 |   \-- AliasArg
	 |        \-- ident: B
	 \--FuncBody
	     \-- CompoundStatement
	          \-- [empty body]

6) Run semantic:
   - Create local symbol table for foo, etc..
   - Run semantic on CompileTimeArgList:
      - Check AliasArg for validity
      - Run semantic on AliasArg: add A to function's local symbol
        table, etc.
   - Run semantic on RuntimeArgList:
      - Check AliasArg for validity: ERROR: cannot have alias parameter
        at runtime.
      - (Add B to local symbol table)(skipped due to previous error)
   - (Run semantic on FuncBody)(skipped due to previous error)
   - (Run semantic on RetType (verify return type match, etc.))(skipped
     due to previous error)
   - (Add function to parent scope symbol table)(skipped due to previous
     error)

So, no backtracking is necessary.

Of course, it sounds like DMD's parser doesn't work this way, but that's
a limitation of DMD's parser, not an *inherent* need for backtracking.


T

-- 
I see that you JS got Bach.


More information about the Digitalmars-d mailing list