How compiler detects forward reference errors

Illuminati via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sun Sep 4 16:24:49 PDT 2016


On Sunday, 4 September 2016 at 19:15:15 UTC, Igor wrote:
> On Saturday, 3 September 2016 at 14:13:27 UTC, Lodovico 
> Giaretta wrote:
>> On Saturday, 3 September 2016 at 14:06:06 UTC, Igor wrote:
>>> Can anyone explain in plain English how the compiler processes 
>>> and detects the "test.d(6): Error: forward reference of variable 
>>> a" in the following code:
>>>
>>> import std.stdio;
>>>
>>> enum a = 1 + b;
>>> enum d = 5 + a; // No error here
>>> enum b = 12 + c;
>>> enum c = 10 + a; // error here
>>>
>>> void main()
>>> {
>>>     writeln("Hello World!", b);
>>> }
>>
>> a needs b to be initialized, so b must be initialized before 
>> a. Let's write this b->a. Now b needs c, so c->b. And c needs 
>> a, so a->c. Putting it all together we have a->c->b->a. This 
>> means that to initialize a we need b, to initialize b we need 
>> c, but to initialize c we need a. So to initialize a we need 
>> a, which is not possible: we would need a before having 
>> initialized it.
>>
>> On the other hand, a->d is not a problem, as d can be 
>> initialized after a.
>
> So, you are saying the compiler keeps a kind of linked list of 
> dependencies and then checks whether any of those lists are 
> circular? But how exactly would that list be structured, since 
> one expression can have multiple dependencies, like:
>
> enum a = b + c + d + e;
> enum b = 10 + c;
> enum c = d + e + a;
> ...

It is not complicated. Each expression is either complete or 
incomplete, and the compiler can easily keep track of whether a 
variable is completely defined or not. If it is completely 
defined, it can be evaluated at compile time, because it only 
depends on things that are themselves evaluatable (literal 
numbers, arithmetic on already-evaluated variables, and so on).

So for any expression that depends on something else, that 
something else must be completely evaluatable before the 
dependent expression can be.

That is, "completely evaluatable" is transitive. As the compiler 
processes expressions, it can keep a flag on each variable that 
holds (or was assigned) such an expression, marking it as 
evaluated or not yet evaluated. Any expression that depends on a 
variable then requires all of that variable's dependencies to 
have been evaluated first.
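
Here is a minimal sketch of that idea in D. It is not how DMD is 
actually implemented; it just models each symbol with a state 
(unresolved, in progress, resolved) and reports a forward 
reference when resolution re-enters a symbol that is still in 
progress. The Sym struct and resolve function are made up for 
illustration, and each symbol simply lists the names its 
initializer refers to, which also answers the multiple-dependency 
question above:

import std.stdio;

// Toy model of the dependency check (illustration only, not
// DMD's real data structures).
enum State { unresolved, inProgress, resolved }

struct Sym
{
    string[] deps;  // names this symbol's initializer refers to
    State state;
}

bool resolve(string name, ref Sym[string] table)
{
    auto s = name in table;
    if (s is null)
    {
        writeln("error: undefined identifier ", name);
        return false;
    }
    final switch (s.state)
    {
        case State.resolved:
            return true;
        case State.inProgress:
            // We reached this symbol again while still evaluating
            // it, so its dependencies form a cycle.
            writeln("error: forward reference of variable ", name);
            return false;
        case State.unresolved:
            s.state = State.inProgress;
            foreach (dep; s.deps)
                if (!resolve(dep, table))
                    return false;
            s.state = State.resolved;
            return true;
    }
}

void main()
{
    // Models: enum a = 1 + b; enum d = 5 + a;
    //         enum b = 12 + c; enum c = 10 + a;
    Sym[string] table = [
        "a": Sym(["b"]),
        "d": Sym(["a"]),
        "b": Sym(["c"]),
        "c": Sym(["a"]),
    ];
    resolve("a", table);  // walks a -> b -> c -> a, reports the cycle
}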

A forward reference is not much different from an undefined 
variable, except that the compiler "looks ahead" and knows the 
variable was declared after it was used. There is not a lot of 
difference between the two cases; the compiler just tries to be 
a little smarter about one of them.

For the compiler to actually finish its job, every single 
expression, statement, etc. must be interpreted properly. If one 
is not, the compiler cannot compute the result and the "program" 
is invalid. There is effectively a one-to-one correspondence 
between the language's grammar and the "machine code" that the 
compiler spits out. If that correspondence cannot be established, 
the compiler simply cannot produce a result (it does not know 
how, or does not understand what the programmer wants).

It is very complex in many ways, but the rules are simple and 
well defined (they must be, in order to produce consistent and 
valid results). It is like physics or any other science: there 
are "laws" of behavior, and no matter how complex something 
behaves in the physical world, it must behave according to those 
laws. There are laws in a programming language too (its grammar 
and semantics), and every program must abide by them or the 
program is invalid.

Using a variable that can never be given a value does not make 
sense. It is illogical, unmathematical, unscientific, and 
meaningless.

One could argue that something like

enum a = b + c;
enum b = 10 + c;
enum c = 4;

could be valid, and in D it actually is: the compiler resolves 
module-level declarations on demand rather than strictly top to 
bottom, so the lexical order does not matter here. (In the 
original example, "enum a = 1 + b;" was not flagged merely for 
referring to b before its declaration; the error only appeared 
because of the cycle.) What matters is that the chain of 
dependencies eventually bottoms out in something that can be 
evaluated on its own, like "enum c = 4;".
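
For instance, the declarations above compile unchanged in a 
reasonably recent DMD (a small self-contained example; the 
printed values just follow from substituting the definitions):

import std.stdio;

enum a = b + c;    // refers to b and c before their declarations
enum b = 10 + c;   // refers to c before its declaration
enum c = 4;        // evaluatable on its own

void main()
{
    writeln(a, " ", b, " ", c);  // prints: 18 14 4
}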

But

enum a = b;
enum b = a;

is meaningless under any interpretation: no ordering of the 
declarations gives either variable a value. Of course, one can 
build any sort of special case into a language to make such 
things interpretable, but while simple cases might be given a 
meaning, more complex cases quickly become ambiguous, 
meaningless, and hard to reason about.
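
Reusing the Sym/resolve sketch from earlier (again, purely an 
illustration, not a real compiler API), this two-symbol cycle is 
caught the same way:

// Using the Sym/resolve definitions from the sketch above:
Sym[string] table = ["a": Sym(["b"]), "b": Sym(["a"])];
resolve("a", table);  // prints: error: forward reference of variable a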

So, a language lays down a set of rules, and a programmer (should 
be "programmar", but...) learns those rules and tries to express 
what he wants the program to do in terms of them. The compiler 
then takes the code from the programmer, checks whether it fits 
the rules of the grammar, and if it does, converts the code into 
a program that carries out the specific rules the programmer 
wrote down.

Now, a compiler can decide how it wants to process the code, and 
this is where the "forward reference" comes in for D. Not all 
compilers do this. Some will just process the first line and say 
"Hey, there is no b in my symbol table. FAIL!". D goes a step 
further and does not stop immediately; it processes the next 
line, and so on. But all of that is an implementation detail that 
depends on the specific compiler.
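
You can even see the stricter behavior within D itself, assuming 
DMD's usual rule that declarations inside a function body are 
processed strictly in order. The snippet below is intentionally 
invalid:

void main()
{
    // Inside a function, symbols must be declared before use, so
    // this is rejected (roughly: undefined identifier y) instead
    // of being accepted as a harmless forward reference:
    enum x = y + 1;
    enum y = 2;
}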







