Adding a D backend to GNU Bison

Wed Jan 16 23:27:02 UTC 2019

On Tue, Jan 15, 2019 at 03:13:44PM +0000, Eduard Staniloiu via Digitalmars-d wrote:
[...]
> I'm posting this as a followup to the positive feedback that Andrei's
> Bison related post(
> https://forum.dlang.org/thread/1c3d8e77-ce4c-6310-0afd-e6518728299f@erdani.org)
> has received.
> 
> Akim Demaille has started "turning the wheels" towards adding a D
> backend to GNU Bison.

Great!

> There currently is a skeleton for D on the Bison master
> (https://savannah.gnu.org/git/?group=bison) that you can use to check
> out the backend. A good starting point to explore this feature is to
> go into the .../share/doc/bison/examples/d directory and to run "make"
> there.
> 
> There is no documentation and Akim doesn't have experience with the D
> programming language, that is where we, the D community, can lend a
> helping hand.
> I'm posting this to ask for your help in getting the D backend feature
> into Bison.

I glanced briefly at the various D-related notes, and took a good look
at the generated calc.d in the examples/d directory.  Here are some
comments:

- I understand that the current D codegen is mainly based on the
  existing Java backend, so unsurprisingly quite a few places show
  signs of being very Java-like rather than D-like.  Hopefully, with
  some work, we can get it to emit more idiomatic D. :-)

- The first question I have is how much the Bison API depends on the
  lexer being swappable at runtime, i.e., via the Lexer interface.  I'm
  having a hard time imagining that there will be many use cases where
  you'd like to swap lexers with the same parser at runtime, so I'm
  thinking the parser should simply take the lexer type as a template
  argument, with signature constraints ensuring that whatever type the
  user passes in implements the necessary methods for the parser to
  work.  This lets us bind the lexer to the parser at compile-time, and
  elide the vtable indirection (runtime swapping remains possible if the
  user passes in a class).

- In a similar vein, I'm wondering if the generated parser ought to
  be a class at all, or is the inheritability of the parser a key Bison
  feature?  Also, are language-specific directives supported /
  encouraged?  If so, it might be worthwhile to let the user choose
  whether to use a struct/template API vs. an OO class-based API.

- On a more high-level note, I'm wondering how flexible the API of the
  parser can be.  The main thought behind this is that given enough
  flexibility, we may be able to target, e.g., @nogc, @safe, pure, etc.,
  with @safe probably being a pretty important target, if it's possible
  to do so.  While this depends of course on the exact code the user
  puts into the .y file, a worthy goal is to make the emitted D code
  @safe (pure, etc.) by default unless the user writes non-@safe code in
  the .y file.

- How flexible can the lexer API be?  For example, currently
  lexer.yyerror takes a string argument, which requires using std.format
  in various places.  If permissible, I'd like to have yyerror take a
  generic input range instead, so that we can avoid the inherent memory
  allocation of std.format (e.g., if we wish to target @nogc).

- Also, is it possible to use exceptions instead of yyerror()?  Or would
  that deviate too far from Bison's design?

- On a more general note, I'd like to make the parser/lexer APIs
  range-based as much as possible, esp. when it comes to
  string-handling.  But I'm just not sure how much the APIs are expected
  to conform to the analogous C/C++/Java APIs.

- I wonder if YYSemanticType could use std.variant somehow instead of a
  raw union, since the raw union would probably force the parser to be
  @system.

- Can Bison handle UTF-8 lexer/parser rules?  D uses UTF-8 by default,
  and it would be nice to leverage this support instead of manually
  iterating over bytes, as is done in a few places.

- Some minor points that should be easy to fix:

   - The YYACCEPT, YYABORT, etc., symbols really should be declared as
     enums rather than static ints.

   - D does support the #line directive.  So these should be emitted as
     they are in C/C++. (I noticed they currently only appear as
     comments.)

   - YYStack needs to be fixed to avoid the reallocate-on-every-push
     problem on arrays. A common beginner's mistake.  Also, if we're
     going to target @nogc (not 100% sure about that right now), we may
     have to forgo built-in arrays altogether.
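
To make the template-lexer suggestion above more concrete, here is a rough sketch of what binding the lexer at compile-time could look like. Every name in it (Token, isLexer, Parser, ArrayLexer, ClassLexer, countTokens) is made up for illustration and is not what the current Bison skeleton emits; the real constraint would have to check whatever methods the generated parser actually calls.

```d
// Hypothetical token codes; a real parser would use the generated ones.
enum Token { eof, number, plus }

// Signature constraint: a lexer is any type whose yylex() returns a Token.
enum isLexer(L) = is(typeof(L.init.yylex()) == Token);

// Hypothetical parser shape, parameterized on the lexer type.
struct Parser(Lexer) if (isLexer!Lexer)
{
    Lexer lexer;

    // Counts tokens up to EOF; stands in for the real parse loop.
    size_t countTokens()
    {
        size_t n;
        while (lexer.yylex() != Token.eof)
            ++n;
        return n;
    }
}

// A struct lexer: calls are bound statically, no vtable involved.
struct ArrayLexer
{
    const(Token)[] tokens;

    Token yylex()
    {
        if (tokens.length == 0)
            return Token.eof;
        auto t = tokens[0];
        tokens = tokens[1 .. $];
        return t;
    }
}

// A class lexer also satisfies the constraint, so an OO-style lexer
// with virtual dispatch can still be plugged in.
class ClassLexer
{
    Token yylex() { return Token.eof; }
}

void main()
{
    auto p = Parser!ArrayLexer(
        ArrayLexer([Token.number, Token.plus, Token.number]));
    assert(p.countTokens() == 3);

    static assert(isLexer!ClassLexer);
    static assert(!isLexer!int);   // rejected at compile time
}
```

With this shape, passing a type that lacks a suitable yylex() fails at the instantiation site instead of deep inside the generated parser.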
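
The point above about yyerror taking a range instead of a string could be sketched as follows. This is only an assumption about how such an API might look: BufferLexer, its fixed 64-byte buffer, and the constraint are all invented for the example, and the sketch requires callers to pass ranges of char code units (via std.utf.byCodeUnit) so that no auto-decoding, and hence no exception allocation, is involved.

```d
import std.range : chain;
import std.range.primitives : ElementType, isInputRange;
import std.traits : Unqual;
import std.utf : byCodeUnit;

// Hypothetical lexer that records the last error message in a fixed
// buffer instead of receiving a heap-allocated string.
struct BufferLexer
{
    char[64] buf;
    size_t len;

    // Accepts any input range of char code units.
    void yyerror(R)(R msg)
        if (isInputRange!R && is(Unqual!(ElementType!R) == char))
    {
        len = 0;
        foreach (c; msg)
        {
            if (len == buf.length)
                break;              // truncate overly long messages
            buf[len++] = c;
        }
    }
}

// The caller composes the message lazily; nothing is allocated.
void reportUnexpected(ref BufferLexer lx, string got) @nogc
{
    lx.yyerror(chain("unexpected ".byCodeUnit, got.byCodeUnit));
}

void main()
{
    BufferLexer lx;
    reportUnexpected(lx, "';'");
    assert(lx.buf[0 .. lx.len] == "unexpected ';'");
}
```

Whether Bison's design leaves room for changing the yyerror signature like this is exactly the open question; the sketch only shows that the D side can avoid std.format.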
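
For the YYACCEPT/YYABORT point above, the difference between a static variable and a manifest constant can be shown in a few lines. The JavaStyle struct is only a guess at what a Java-derived translation looks like, and the numeric values are placeholders rather than the ones Bison uses.

```d
// Hypothetical Java-style translation: a static variable with storage
// and an address.
struct JavaStyle
{
    static immutable int YYACCEPT = 0;
}

// Manifest constants: no storage, substituted at compile time.
// The values are placeholders, not necessarily Bison's.
enum : int
{
    YYACCEPT = 0,
    YYABORT = 1,
    YYERROR = 2,
}

// Manifest constants can serve directly as case labels.
string describe(int code)
{
    switch (code)
    {
        case YYACCEPT: return "accept";
        case YYABORT:  return "abort";
        case YYERROR:  return "error";
        default:       return "unknown";
    }
}

void main()
{
    assert(describe(YYABORT) == "abort");

    // A static variable has an address; a manifest constant does not.
    static assert(__traits(compiles, &JavaStyle.YYACCEPT));
    static assert(!__traits(compiles, &YYACCEPT));
}
```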
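
As for YYStack, one way to avoid reallocating on every push is to keep the backing array allocated and track the logical height separately, doubling the capacity only when it runs out. The sketch below is not the skeleton's YYStack; the name Stack, the initial capacity of 16, and the method set are assumptions made for the example.

```d
// Hypothetical replacement for a push/pop stack built on a D array.
struct Stack(T)
{
    private T[] data;       // backing storage, never shrunk
    private size_t height;  // number of live elements

    void push(T value)
    {
        // Grow geometrically so pushes are amortized O(1).
        if (height == data.length)
            data.length = data.length ? data.length * 2 : 16;
        data[height++] = value;
    }

    T pop()
    {
        assert(height > 0, "pop from empty stack");
        return data[--height];
    }

    ref inout(T) top() inout
    {
        assert(height > 0, "top of empty stack");
        return data[height - 1];
    }

    size_t length() const { return height; }
}

void main()
{
    Stack!int s;
    foreach (i; 0 .. 100)
        s.push(i);
    assert(s.length == 100);
    assert(s.pop() == 99);

    // Pushing after a pop reuses the existing slot.
    s.push(7);
    assert(s.top == 7 && s.length == 100);
}
```

This still uses the GC for the backing array; targeting @nogc would mean swapping the allocation strategy, but the height-tracking logic would stay the same.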

[...]
> Akim is going to provide assistance with the process, but he is not to
> be expected to carry this task on his own.
[...]

Dumb question: If I wanted to contribute some commits, do I have to sign
up on savannah.gnu.org?  What's the procedure for submitting pull
requests?  (Sorry, I glanced over the READMEs and the FAQ at
savannah.gnu.org but didn't find a clear answer.)

T

-- 
May you live all the days of your life. -- Jonathan Swift