[GSoC] Header Generation for C/C++

Eduard Staniloiu edi33416 at gmail.com
Tue Jul 16 13:16:50 UTC 2019


Hi everyone,

At the end of May I started working on my GSoC project, Header
Generation for C/C++.

Introduction
------------

In recent years, the D programming language has gained more and more
attention, and existing C and C++ codebases are starting to
incrementally integrate D components.

In order to use D components, a C or C++ interface to them must be
provided; in C and C++, this is done through header files. Currently,
this process is entirely manual, with the responsibility of writing a
header file falling on the shoulders of the programmer. The larger the
D portion of a codebase is, the more tedious the task becomes. The best
example is the DMD frontend, which amounts to roughly 310,000 lines of
code and for which the C++ header files used by the other backend
implementations (gdc, ldc) are manually managed. This is a repetitive,
time-consuming, and rather boring task, which makes it the perfect job
for a machine.

Project goal
------------

The deliverable of the project is a tool that automatically generates
C and C++ header files from D module files. This can be achieved either
by a library solution using DMD as a Library, or by adding this feature
to the DMD frontend through a compiler switch.

The advantage of using DMD as a Library is that it wouldn't increase
the complexity of the compiler frontend codebase. The disadvantage is
that the user would be required to install a third-party tool. In
contrast, adding the feature to the frontend would result in a smoother
integration with all the backends that use the DMD frontend.

We have decided to go with the compiler switch approach.

One major milestone (and success marker) for the project is to 
automatically generate the
DMD frontend headers required by GDC/LDC.

Implementation strategy
-----------------------

The feature will require the implementation of a `Visitor` class that
will traverse the `AST` produced by the parsing phase of the D code.
For each top-level `Dsymbol` (variable, function, struct, class, etc.)
the corresponding C++ declaration will be written to the header file.

The visitor will override the visiting methods of two types of 
nodes:
* Traversal nodes - these nodes simply implement the `AST` 
traversal logic:
`ModuleDeclaration`, `ScopeDeclaration`, etc.
* Output nodes - these nodes will implement the actual header 
generation logic:
`FuncDeclaration`, `StructDeclaration`, `VarDeclaration`, etc.

The header file will consist of declarations generated from the
`public extern (C++)` and `public extern (C)` declarations/definitions
found in the D modules.
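
To make the traversal/output split above more concrete, here is a
minimal sketch written in C++. It is not the code from the PR: the node
types below are heavily simplified, hypothetical stand-ins that only
borrow names from DMD's AST, and `HeaderVisitor` and
`generateDemoHeader` are names invented for this illustration.

```cpp
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct Visitor;

// Hypothetical, simplified stand-ins for DMD's AST nodes.
struct Dsymbol {
    virtual ~Dsymbol() = default;
    virtual void accept(Visitor& v) = 0;
};

struct VarDeclaration : Dsymbol {
    std::string type, name;
    VarDeclaration(std::string t, std::string n)
        : type(std::move(t)), name(std::move(n)) {}
    void accept(Visitor& v) override;
};

struct FuncDeclaration : Dsymbol {
    std::string returnType, name;
    FuncDeclaration(std::string r, std::string n)
        : returnType(std::move(r)), name(std::move(n)) {}
    void accept(Visitor& v) override;
};

struct Module : Dsymbol {
    std::vector<std::unique_ptr<Dsymbol>> members;
    void accept(Visitor& v) override;
};

struct Visitor {
    virtual ~Visitor() = default;
    virtual void visit(Module&) = 0;
    virtual void visit(VarDeclaration&) = 0;
    virtual void visit(FuncDeclaration&) = 0;
};

void VarDeclaration::accept(Visitor& v) { v.visit(*this); }
void FuncDeclaration::accept(Visitor& v) { v.visit(*this); }
void Module::accept(Visitor& v) { v.visit(*this); }

struct HeaderVisitor : Visitor {
    std::ostringstream out;

    // Traversal node: emits nothing, only walks its children.
    void visit(Module& m) override {
        for (auto& member : m.members) member->accept(*this);
    }

    // Output nodes: each one writes a C++ declaration.
    void visit(VarDeclaration& d) override {
        out << "extern " << d.type << " " << d.name << ";\n";
    }
    void visit(FuncDeclaration& d) override {
        out << d.returnType << " " << d.name << "();\n";
    }
};

// Builds a tiny two-symbol module and returns the text emitted for it.
std::string generateDemoHeader() {
    Module m;
    m.members.push_back(std::make_unique<VarDeclaration>("int", "counter"));
    m.members.push_back(std::make_unique<FuncDeclaration>("void", "reset"));
    HeaderVisitor v;
    m.accept(v);
    return v.out.str();
}
```

Calling `generateDemoHeader()` yields one declaration per top-level
symbol; the module node itself contributes no output.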

Project status
--------------

I started work [0] by reviving DMD's PR 8591 [1], rebasing it and
converting it into a compiler switch.

The next step was to add a bunch of tests for the existing code,
which revealed the following issues:
* StructDeclaration:
   - an alignment other than 1 has no effect; we should support
     `align(n)`, where `n` is one of [1, 2, 4, 8, 16]
   - `align(n):` inside a struct definition doesn't add alignment, but
     breaks the generation of default ctors
   - default ctors should be generated only if the struct has no ctors
   - if a struct has ctors defined, only the default ctor (S() { … })
     should be generated to init members to default values, and the
     defined ctors must be declared
   - if a struct has a void initializer (`member = void`), the
     code segfaults
   - a struct should only define ctors if it's `extern (C++)`

   As you can see, a bunch of the issues above are related to 
auto-generated ctor definitions.
   You might wonder "But why are there any definitions?"; the 
default ctors are there because D initializes
   member fields with a default value, while C and C++ do not, and 
this might break existing GDC/LDC behaviour.
   Ideally, we wouldn't generate any definitions, and if we can 
confirm the ctor definitions aren't needed, we'll remove them.

* ClassDeclaration:
   - `align(n)` does nothing. You can use alignment specifiers on
     classes in C++, though it is generally regarded as bad practice
     and should be avoided

* FuncDeclaration:
   - default arguments can be any valid D code, including a lambda 
function or a complex expression; we don't want to go down the 
path of generating C or C++ code, so for now default arguments 
get ignored.

* TemplateDeclaration:
   - templates imply code generation, so for now we don't support 
them
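
To illustrate the alignment and default ctor points from the
StructDeclaration items, here is a hand-written sketch of what a C++
declaration for an aligned `extern (C++)` struct could look like. It is
not output produced by the PR, and the `Particle` struct is a
hypothetical example. The default values are D's documented `.init`
values: 0 for `int`, NaN for `double`, and 0xFF for `char`.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical D source this sketch corresponds to:
//   extern (C++) align(16) struct Particle
//   {
//       int id;
//       double mass;
//       char tag;
//   }
struct alignas(16) Particle {
    std::int32_t id;
    double mass;
    char tag;

    // Mirrors D's default initialization, which C++ would not do on its
    // own: int.init is 0, double.init is NaN, char.init is 0xFF.
    Particle()
        : id(0),
          mass(std::numeric_limits<double>::quiet_NaN()),
          tag(static_cast<char>(0xFF)) {}
};

// align(16) on the D side maps to alignas(16) on the C++ side.
static_assert(alignof(Particle) == 16, "Particle must be 16-byte aligned");
```

Without the generated default ctor, a plain `Particle p;` in C++ would
leave the fields uninitialized, which is the behavioural difference the
ctor definitions are meant to paper over.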

After writing the tests and understanding what the issues are, I
got more comfortable with the codebase and moved on to the next
(current) step: generating the DMD frontend header files from
DMD's `*.d` frontend modules.

This took quite some time and sweat to get going: the major pain
point here is templates.
There is `dmd/root/array.d`, which has a templated `Array(T)` that
is used throughout the codebase.
Since we don't support templates, we decided to keep the manual
management of the `dmd/root/*.h` headers, but things aren't that
simple.

The issue: while we don't explicitly pass in any of the
`dmd/root/*.d` modules, some of them are processed during the
semantic analysis phase, which emits the definitions of some
`struct`s and `enum`s from `dmd/root/*.d` into the generated
frontend header. When the generated header is used together with the
manually managed `dmd/root/*.h` header files, the C++ compiler
reports a `struct`/`enum` redefinition error.

I kept scratching my head at how to avoid this, and in the end I 
went with explicitly ignoring anything that comes from a 
`dmd/root/*.d` module. Ideally, this special casing shouldn't be 
needed, and it should go away if we can add support for some 
simple D -> C++ templates.
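
The special casing described above boils down to a filter on where a
symbol comes from. The sketch below only illustrates the idea and is
not taken from the PR; `SymbolOrigin`, `comesFromRoot` and
`filterForHeader` are hypothetical names, and matching on a path
substring is an assumption made for this example.

```cpp
#include <string>
#include <vector>

// Hypothetical record: a symbol name plus the path of the module that
// declares it.
struct SymbolOrigin {
    std::string name;
    std::string modulePath;
};

// Hypothetical predicate: true when the module lives under dmd/root/.
bool comesFromRoot(const std::string& modulePath) {
    return modulePath.find("dmd/root/") != std::string::npos;
}

// Keeps only the symbols that should reach the generated header, so
// that definitions from dmd/root/*.d never clash with the manually
// managed dmd/root/*.h headers.
std::vector<SymbolOrigin> filterForHeader(
        const std::vector<SymbolOrigin>& symbols) {
    std::vector<SymbolOrigin> kept;
    for (const auto& symbol : symbols) {
        if (!comesFromRoot(symbol.modulePath)) kept.push_back(symbol);
    }
    return kept;
}
```

Given a symbol from `dmd/root/array.d` and one from another frontend
module, only the latter would be kept.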

So now, the current state of affairs is that the code in the PR 
[0] can link with and pass the `cxx-unittests`.

How to use it
-------------

The current PR [0] code generates a `C++` header file out of
a list of `.d` modules passed to the compiler.

The simplest form of the CLI switch is `dmd -HC a.d b.d`.

This will visit the ASTs of modules `a` and `b` and output a
single header file to `stdout`.

By using the `-HCf=<file-name>` switch, the above result will be
written to the specified file. Using `-HCd=<path>` will write
the `file-name` in the specified `path`.

So running `dmd -HCf=ab.h -HCd=mypath/ a.d b.d` will write the
generated header to `mypath/ab.h`, relative to the current directory.

If you have some spare time and curiosity I would appreciate your 
`test drive` and bug reports :)

This month
----------

I'll be working on generating the frontend headers, cleaning up
the code, fixing issues, and addressing PR comments.

Closing note
------------

I deeply apologize for this long overdue post.

Looking forward to your replies,
Edi

[0] - https://github.com/dlang/dmd/pull/9971
[1] - https://github.com/dlang/dmd/pull/8591

