Solving the spurious forward/cyclic reference errors in DMD

Fri Jul 21 06:31:55 PDT 2017

On 17.07.2017 03:16, Elie Morisse wrote:
> Timon, any update on this?

Not really. I did not have much time to work in this in the past year. 
(Also, it is not very convenient to work with DMD 2.060.)

> What are the insights you gained with your 
> frontend?
> ...

What I have implemented works well but still has a few limitations, e.g. 
the following code does not compile:

enum x = "enum xx = q{int y = 0;};";

struct SS{
     mixin(xx);
     mixin(x); // error
}

I believe this should be allowed. An easy but slightly ugly solution 
would be to sequentialize analysis of string mixins even in scopes that 
have no ordering. (Then the above would still not compile, but it would 
compile once we change the order of the two string mixins.)

The idea behind the algorithm used by my frontend is to analyze all AST 
nodes concurrently, blocking them as soon as they need information that 
is not yet available. This continues until the analysis has finished or 
all AST nodes are blocked. In that case, the compiler examines the top 
strongly connected components of the dependency graph (i.e. the smallest 
set of nodes that everything else depends on). Some kinds of analysis 
dependencies can be discharged by assuming a natural default result. 
(For example, we can assume by default that a certain member is not 
present, or that a more specialized overload does not exist, etc.) The 
assumptions are made simultaneously for all nodes in the top strongly 
connected components and analysis proceeds. The compiler prints an error 
if any assumptions end up being violated.

> I recently reported two cases without a simple fix:
> 
> https://issues.dlang.org/show_bug.cgi?id=17656
> https://issues.dlang.org/show_bug.cgi?id=17194#c1
> ...

Those are easily handled by my algorithm.

> and have seen a lot more referencing errors with Calypso, especially 
> when this gets enabled: 
> https://github.com/Syniurge/Calypso/commit/1e1ae319e32120bd9ef0009716ddabed92f69ac2 
> 
> 
> Calypso makes its mapped C++ symbols go through the same importAll -> 
> semantic1,2,3 route that D symbols take. Ultimately this is mostly 
> useless work that should be skipped, the reason it currently works this 
> way being that I wasn't familiar yet with the DMD source code when I 
> started. But what this hard and ungrateful work has also been doing (and 
> many large libraries are blocked by this) is exposing a seemingly 
> infinite number of bogus forward/circular/misc referencing DMD errors.
> Those errors boil down to semantic calls getting triggered at the wrong 
> time, on symbols that the caller doesn't really depend upon.
> ...

I don't think that in the end there will be a better solution than 
explicitly tracking what depends on what.

> Because most of the time, the semantic() call on the LHS of DotXXXExp, 
> inside AggregateDeclaration.determineSize, etc. is there in case there are:
>   - mixins to expand

Also, conditional compilation like static if, I guess.

>   - attributes whose members have to be added to the parent symtab
>   - if LHS is a template to instantiate
> 
> These are (AFAIK) the only cases where the symtab of the LHS or the 
> aggregate may get altered, and if I understand correctly that's what the 
> semantic call is checking before searching for the RHS or determining 
> the aggregate fields and then its size.
> ...

I handle more things, for example inheritance, or adding members to 
aggregates whose instances have already been used in CTFE etc., but 
those are probably not a big deal for your use case.

> So would splitting semantic() into determineMembers() preceding the rest 
> of semantic() be worth exploring?

Probably yes, but it will at most be a partial solution.

> The thing is, this would help in most 
> cases but I can imagine scenarios where simply splitting may not be 
> enough. Example:
> 
> enum E { OOO = S.UUU }
> 
> import std.conv;
> string genMember1() { return "enum G8H9 = " ~ (cast(int)E.OOO).to!string; }
> string genMember2() { return "enum UUU = 1;"; }
> 
> struct S {
>      mixin(genMember1());
>      mixin(genMember2());
> }

genMember1() is missing a semicolon in the code it generates, but 
otherwise this works with what I have.

> 
> We'll have S.determineMembers -> E.OOO.semantic -> S.determineMembers, 
> and although in this case the value of OOO may be interpreted to 1, at 
> this point the compiler can't easily know whether mixins will generate 
> zero, one or more UUU members or not. To attenuate the problem 
> determineMembers() could be made be callable multiple times 
> (recursively), each time starting from where the previous (on-going) 
> call left off,

How do you make sure you don't go on indefinitely and overflow the stack?

> so in this particular case the second S.determineMembers 
> call would expand the second mixin to enum UUU = 1. But then how does 
> the compiler knows and react if genMember1 generate a new UUU member? Ok 
> a second UUU enum will error, but what if UUU was a function and 
> genMember1() generates a more specialized overload of UUU? I.e:
> 
> enum E { OOO = S.UUU(1) }
> 
> import std.conv;
> string genMember1() { return "static int UUU(int n) { return n; }; enum 
> G8H9 = " ~ (cast(int)E.OOO).to!string; }
> string genMember2() { return "static int UUU(int n, int a = 5) { return 
> n + 5; }"; }
> 
> struct S {
>      mixin(genMember1());
>      mixin(genMember2());
> }
>  > At this point well it's getting a bit contrived, so maybe it's not
> really worth finding a solution to make this compile

My opinion is that the following is good enough:

enum E { OOO = S.UUU(1) }

string genMember1() { return "static int UUU(int n) { return n; };"; }
string genMember2() { return "enum G8H9 = " ~ 
(cast(int)E.OOO).to!string~";"; }
string genMember3() { return "static int UUU(int n, int a = 5) { return 
n + 5; }"; }

struct S {
     mixin(genMember1());
     mixin(genMember2());
     mixin(genMember3());
}

It is possible to make many cases like yours (that have spurious 
dependencies through string values) work too, but I don't think it is 
remotely worth the effort. One would need to e.g. symbolically compute 
the mixed in string and partially evaluate the parser to extract some 
declarations before we know the entire mixed in string.

> (but ideally the 
> compiler should still warn the user).
> ...

My frontend says:
<mixin at test.d:7>:1:1: error: introduction of new overload is invalid
static int UUU(int n) { return n; }; enum G8H9 = 6;
^──────────────────────────────────
test.d:1:16: note: overload set was sealed here
enum E { OOO = S.UUU(1) }
                ^────

In general, the frontend is designed such that compiling the original 
code with all mixins and conditional compilation expanded always gives 
the same results as compiling the original code. (And if it is not able 
to achieve this, the frontend prints an error.)

> Should I try splitting semantic() and make a PR? It might be a lot of 
> work, so I'd like to know if this makes sense first.

It's hard to tell for me, as I am not extremely familiar with DMD's 
internals. If this solves your issues with Calypso, I'd say, go for it. 
(Also, who knows, you might fix enough forward reference issues to make 
my frontend compile with the latest DMD again.)