How compiler detects forward reference errors
Illuminati via Digitalmars-d-learn
digitalmars-d-learn at puremagic.com
Sun Sep 4 16:24:49 PDT 2016
On Sunday, 4 September 2016 at 19:15:15 UTC, Igor wrote:
> On Saturday, 3 September 2016 at 14:13:27 UTC, Lodovico
> Giaretta wrote:
>> On Saturday, 3 September 2016 at 14:06:06 UTC, Igor wrote:
>>> Can anyone explain in plain English how does compiler process
>>> and detect a "test.d(6) Error: forward reference of variable
>>> a" in following code:
>>>
>>> import std.stdio;
>>>
>>> enum a = 1 + b;
>>> enum d = 5 + a; // No error here
>>> enum b = 12 + c;
>>> enum c = 10 + a; // error here
>>>
>>> void main()
>>> {
>>> writeln("Hello World!", b);
>>> }
>>
>> a needs b to be initialized. So b must be initialized before
>> a. Let's write this b->a. Now b needs c. So c->b. c needs a,
>> so a->c. If we sum everything, we have that a->c->b->a. This
>> mean that to initialize a we need b, to initialize b we need
>> c, but to initialize c we need a. So to initialize a we need
>> a, which is not possible. We need a before having initialized
>> it.
>>
>> On the other hand, a->d is not a problem, as d can be
>> initialized after a.
>
> So, you are saying compiler is keeping a kind of linked list of
> dependencies and then checks if any of those lists are
> circular? But how exactly would that list be structured since
> one expression can have multiple dependencies, like:
>
> enum a = b + c + d + e;
> enum b = 10 + c;
> enum c = d + e + a;
> ...
It is not complicated. Each expression is incomplete. The
compiler can easily keep track if a variable is completely
defined or incompletely defined. If it is completely defined it
must be able to be evaluated at CT. It can be evaluated because
it only depends on things that are evaluatable(such as numbers
and mathematical operations, etc).
So, any expression that depends on something else, that something
else must be completely evaluatable for the dependent expression
to be completely evaluatable.
That is "completely evaluatable" is transitive. As the compiler
processes expressions, it can check just have a flag to mark
variable that "hold" those expressions(or were assigned) as
either being evaluated or not. If it is, it, say, is marked. Then
other other expressions that depend on the variable(s) require
all it's dependencies to have been evaluated.
For forward references, it is not much difference between that
and an undefined variable.... except the compiler "looks ahead"
and knows that the variables were declared after they were used.
Not a lot of difference between the two except the compiler tries
to be a little smarter in one case.
For the compiler to actually finish it's job, every single
expression, statement, etc must be interpreted properly. If it is
not, it can't compute the result. The "program" is invalid. There
is effectively a one to one correspondence between the languages
grammar and the "machine code" that the compiler spits out. If
there is not, then the compiler simply cannot produce a result(it
does not know how or understand what the programmer wants).
It is very complex, in many ways, but the rules are simple and
well defined(they must be to be able to produce consistent and
valid results). It is like physics or any scientific thing. The
are "laws" of behavior and no matter how complex something
behaves in the physical world, it must behave according to those
laws. There are laws in a programming language(the grammar) and
every program must abide by them or the program is invalid.
Using a variable before it is defined does not make sense. It is
illogical, unmathematical, unscientific, and meaningless.
One could argue that something like
enum a = b + c;
enum b = 10 + c;
enum c = 4;
could be valid, and this is true, if we work from the bottom up.
But that is just grammatical transformations. We generally work
top down and all grammars I know of use top down. One can't use
both because it would produce ambiguous results and is rather
meaningless.
But
enum a = b;
enum b = a;
is meaningless in any interpretation and doesn't work no matter
how one defines things, in general. Of course, one can build in
any type of craziness in to a grammar and make such things
interpretable but generally, while simple cases work, more
complex cases become ambiguous and meaningless, hard to reason
about, etc.
So, a compiler creates a set of rules and a programmer(should be
programmar, but...) learns those rules and tries to express what
he wants done by the program in terms of those rules. The
compiler then takes the "code" from the programmer and checks to
see if it fits the rules of the grammar and if it does, it
converts the code in to a program to execute the "specific" rules
the programmer coded.
Now, the compiler can decide how it wants to process the code.
This is where the "forward reference" comes in for D. Not all
compilers do this. Some will just process the first line and say
"Hey, there is no b in my symbol table. FAIL!". D goes a step
further and doesn't stop immediately, it processes the next
line.. etc. But all that is implementation details that depends
on the specific compilers.
More information about the Digitalmars-d-learn
mailing list