Is this a bug? +goto

Wed Nov 7 20:03:47 UTC 2018

On Tuesday, 6 November 2018 at 13:53:41 UTC, MatheusBN wrote:
> On Tuesday, 6 November 2018 at 05:46:40 UTC, Jonathan M Davis 
> wrote:
>> On Monday, November 5, 2018 7:55:46 PM MST MatheusBN via 
>> Digitalmars-d-learn wrote:
>>> On Tuesday, 6 November 2018 at 01:55:04 UTC, Jonathan M Davis
>>> wrote:
>>> >> And I found it a bit strange that in such code, since "x" is
>>> >> never used, it isn't skipped.
>>> >
>>> > It's skipped right over. The goto jumps out of the scope, 
>>> > and the line with
>>> >
>>> > int x;
>>> >
>>> > is never run. In fact, if you compile with -w or -wi, the 
>>> > compiler will give you a warning about unreachable code.
>>>
>>> That is exactly my point.
>>>
>>> Since "x" is skipped and never used, shouldn't it just be a
>>> warning (unreachable code) instead of an error?
>>>
>>> I'm trying to understand why/when such code could cause any
>>> problem.
>>>
>>> On the other hand if the code were:
>>>
>>> {
>>>     goto Q;
>>>     int x;
>>>
>>>     Q:
>>>     x = 10; // <- Now you are accessing an uninitialized variable.
>>> }
>>>
>>> Then I think an error would be ok.
>>
>> D tries to do _very_ little with code flow analysis, because once
>> you start having to do much with it, odds are that the 
>> compiler implementation is going to get it wrong. As such, any 
>> feature that involves code flow analysis in D tends to be 
>> _very_ simple. So, D avoids the issue here by saying that you 
>> cannot skip the initialization of a variable with goto. The 
>> compiler is not going to do the complicated logic of keeping 
>> track of where you access the variable in relation to the 
>> goto. That's exactly the sort of thing that might be obvious 
>> in the simple case but is highly likely to be buggy in more 
>> complex code. Code such as
>>
>> {
>>     goto Q;
>>     int x;
>> }
>> Q:
>>
>> or
>>
>> {
>>     if(foo)
>>         goto Q;
>>     int x;
>> }
>> Q:
>>
>>
>> is fine, because the compiler can trivially see that it is 
>> impossible for x to be used after it's been skipped, whereas 
>> with something like
>>
>> goto Q;
>> int x;
>> Q:
>>
>> the compiler has to do much more complicated analysis of what 
>> the code is doing in order to determine that, and when the 
>> code isn't trivial, that can get _really_ complicated.
>>
>> You could argue that it would be nicer if the language 
>> required that the compiler be smarter about it, but by having 
>> the compiler be stupid, it reduces the risk of compiler bugs, 
>> and most people would consider code doing much with gotos like 
>> this to be poor code anyway. Most of the cases where goto is 
>> reasonable tend to be using goto from inside braces already, 
>> because it tends to be used as a way to more efficiently exit 
>> deeply nested code. And with D's labeled break and continue, 
>> the need for using goto outside of switch statements also 
>> tends to be lower than it is in C/C++.
>>
>> - Jonathan M Davis
>
> The reasoning behind this decision is clear now, and by the way,
> thanks for answering all my questions.
>
> MatheusBN.

Don't let their psychobabble fool you. They are wrong and you 
were right from the start.

There is no initialization of the variable, or, if there is
(because it's "on the stack", which is "initialized" at the start
of the function), the variable is still never used, and that is
the whole problem.

What you will find with some of these guys is they start with the 
assumption that everything D does is correct then they try to 
disprove anything that goes against it by coming up with reasons 
that explain why D does it the way it does. It is circular 
reasoning and invalid. At each step, they come up with a new
explanation when you pick holes in the previous one.

Eventually it's either an "It's because D is not designed to do
that" or a "write an enhancement yourself" type of answer.

The fact is simple: whoever implemented the goto statement did
not write code to handle this case and chose the easiest route,
which is to error out. This was either an oversight or "laziness".

It's really as simple as that. Not once has anyone proven that the
semantics are illogical, which is what it would take for the
compiler to be absolutely correct in its error.

In this case, they are simply wrong, because determining this
requires no flow analysis or any complex logic. It's not because C
is stupid, or because it's unsafe, or because it's unreachable, etc.

The compiler already knows on which line and in which scope a
variable is initialized (since it can determine whether a variable
is used before initialization, which is a logic error), so it
simply has to determine whether the goto escapes the scope before
any variable it skips is used.

It could do this easily, but the logic was never added.
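A minimal sketch of the check described above, in D, under heavy simplifying assumptions: one block is modelled as a flat list of declarations, forward gotos, and labels, and nested blocks are ignored. The names `Kind`, `Item`, and `skippedDecls` are hypothetical and have nothing to do with how DMD represents code internally. Under the proposed rule, a forward goto only skips a declaration when its target label comes later in the same block; if the label is outside the block, the jump leaves the scope and the later declarations are never visible at the label.

```d
/// Hypothetical kinds of entries in a single block; nested blocks are not modelled.
enum Kind { decl, gotoStmt, label }

/// Hypothetical block entry: `name` is the variable name for `decl`
/// and the label name for `gotoStmt` and `label`.
struct Item
{
    Kind kind;
    string name;
}

/// Returns the declarations that the forward goto at index `g` would skip,
/// assuming its target label is either later in this block or outside it.
string[] skippedDecls(const Item[] block, size_t g)
{
    assert(block[g].kind == Kind.gotoStmt);
    string[] skipped;
    foreach (i; g + 1 .. block.length)
    {
        // Target found in this block: every declaration passed on the way is skipped.
        if (block[i].kind == Kind.label && block[i].name == block[g].name)
            return skipped;
        if (block[i].kind == Kind.decl)
            skipped ~= block[i].name;
    }
    // Target is outside this block: the jump leaves the scope, so the later
    // declarations are not visible at the label and nothing counts as skipped.
    return null;
}

unittest
{
    // goto Q; int x; Q:   (label in the same block as the declaration)
    auto sameBlock = [Item(Kind.gotoStmt, "Q"), Item(Kind.decl, "x"), Item(Kind.label, "Q")];
    assert(skippedDecls(sameBlock, 0) == ["x"]);

    // { if (true) goto X; int x; } X:   (case A: label outside the block)
    auto caseA = [Item(Kind.gotoStmt, "X"), Item(Kind.decl, "x")];
    assert(skippedDecls(caseA, 0).length == 0);
}
```

The `unittest` block can be run with, e.g., `dmd -unittest -main -run sketch.d`, where `sketch.d` is whatever file the code is saved in and `-main` supplies an empty `main`.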

Case A:
{
    if (true) goto X;
    int x;
}
X:

Case B:
{
    if (true) goto X;
    {
       int x;
    }
}
X:

These two cases are EXACTLY the same semantically. It's like 
writing A + B and (A + B).

What the extra braces do, though, is create a new scope in the
compiler's AST, and this separates out the goto logic, which is
properly implemented to handle that case.

The fact that one produces an error and the other is valid proves
that the compiler is incomplete. Adding scopes does not change
semantics, any more than adding parentheses (which are just scope)
does: ((((((3)))))) is the same as 3. (Obviously not all scopes can
be eliminated in all cases, but this isn't one of those cases.)

And so the real answer is simply that the compiler does not check
this case. My point with the previous post was to point that out,
but as you see, a lot of the fanboys come in and simply defend
what D does as if it were the most valid way from the get-go. This
is their mindset. They reason from their conclusions. I've seen
them do it quite often. I'm not sure what the motivation is:
whether they don't understand the problem (sometimes simple is very
confusing for some), or they want to obfuscate, or what.

What any sane person would do is first check whether the code has
a semantically logical meaning. In this case it does. Goto is a
common control-flow feature and is sometimes necessary to greatly
simplify certain problems (since D does not have a way to escape
several nested scopes at once, such as a return3 that returns from
3 nested scopes in).

If one can logically transform the "offending" code into a
semantically equivalent piece of code (this is known as a
mathematical transformation, like rewriting a mathematical
expression using logically valid rules) that involves no real
changes (such as adding scopes), and one version fails while the
other doesn't, it means the compiler has a bug.

It's like when people drop parentheses: (3 + 4)*2 =?= 3 + 4*2.

It's illogical. If the compiler did this transformation, it would
produce invalid results, and it would be impossible to reason
about code.

If the compiler gives errors for one of two identical
mathematical trees (remember, programs are just mathematical
formulas, really complex ones, and their ASTs are abstractly the
same), then the compiler has a problem.

It's like saying that (3 + 4)*2 is invalid but 3*2 + 4*2 is valid.

It means the compiler did not implement the distributive property.
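Evaluating the expressions used above makes the point concrete: dropping the parentheses changes the value, while distributing the multiplication preserves it.

```latex
\begin{aligned}
(3 + 4) \times 2 &= 7 \times 2 = 14 \\
3 + 4 \times 2 &= 3 + 8 = 11 \\
3 \times 2 + 4 \times 2 &= 6 + 8 = 14
\end{aligned}
```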

People who don't know what they are talking about will then try
to justify why one works and the other doesn't using some
circular or invalid logic rather than actually understanding what
is going on. It is damn near impossible to reason with these
people because they always start with their conclusion and try to
make all the pieces fit it. Sometimes they eventually come around
to a logical conclusion, but only after they've created a rat's
nest of reasons and cannot proceed any further except to say,
basically, "it is what it is".

The problem is that they still never understand what the actual
problem is... (because of the rat's nest, they have just made
themselves even more confused)

The problem with the goto is clearly stated, and to counter it by
showing it is illogical, one must simply produce one example where
it would result in invalid logic (not the compiler crapping out...
the compiler is not perfect and so will have bugs and errors in
it. The goal is not to justify those bugs and errors but to fix
them so the compiler does a better job and is more logically
expressive).

e.g., two cases (the `Case` label is not part of a switch in D,
it's just used to denote the two possible scenarios)

Case A:

{
    if (true) goto X;
    int x;
}
X:

Case B:

{
    if (true) goto X;
    {
       int x;
    }
}
X:

Why is case A any different from case B (in general; the above is
an example, and the compiler might optimize things, which we don't
want to consider, since optimizations are secondary effects that
are not as important as logical consistency)? We are simply talking
about the pure semantics of programming. It doesn't really matter
what language we use to express it; this is not a problem in D but
a problem in programming languages. The question is simply: are the
two cases semantically equivalent? (e.g., does (3) = 3? (5) = 5,
(x) = x, (((((x+y*3))))) = x+y*3, etc.)

Since we are not thinking of any specific compiler (although we
have to use the syntax and language grammar of D, since ultimately
it has to do with D and it has to be expressed in some language,
so D is the obvious choice), we can't use circular reasoning (e.g.,
D does it this way and D is right so...).

Now, the fact is, these are identical statements semantically... 
trivially so. It really can't get any simpler. Doesn't matter 
what D does. If D can't see that then D is incomplete.

Now, since we ultimately have to translate into D and compilers
do strange things, it is possible that *in D* they are not
identical. E.g., if D inserted initialization of locals at the
start of a scope and de-initializers at the end of a scope, they
would not be the same.

which one could express as:

Case A:

int x;
{
    if (true) goto X;
    //int x;
}
~x;
X:

Case B:

{
    if (true) goto X;
    int x;
    {
       //int x;
    }
    ~x;
}
X:

Here it is clear that x is initialized before the goto in case A
and after it in case B. This could cause problems (chances are, if
D did something like this, it would result in invalid programs and
compiler bugs at some point).
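For comparison with the hypothetical above: D does attach end-of-scope work to some declarations. A struct with a destructor (`~this`) is destroyed when the scope that declared it exits. The sketch below records when that happens for a variable declared in a nested block, as in case B. The `Tracked` type, the `events` log, and `nestedDeclaration` are hypothetical names for illustration only, and the sketch says nothing about how DMD implements its goto check.

```d
/// Hypothetical event log, used only to observe when things happen.
string[] events;

/// Hypothetical type whose destructor records when it runs.
struct Tracked
{
    string name;
    ~this() { events ~= "destroy " ~ name; }
}

void nestedDeclaration()
{
    events ~= "enter outer";
    {
        auto x = Tracked("x");
        events ~= "x in scope";
    } // x's destructor runs here, at the end of the block that declared it
    events ~= "leave outer";
}

unittest
{
    events = null;
    nestedDeclaration();
    assert(events == ["enter outer", "x in scope", "destroy x", "leave outer"]);
}
```

As with any D `unittest`, this can be run with, e.g., `dmd -unittest -main -run sketch.d`.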

Sometimes though, because compilers are very complex, it is 
necessary to prevent certain cases from occurring so certain 
other semantics can be used. Sometimes compilers simply crap out 
precisely because that is the easiest thing to do. Of course, if 
this is done, someone should know about it and be able to explain 
why the compiler chose to do this rather than the most logical 
thing.

Don't let people bludgeon you into submission. Truth and logic
are not dictatorial but absolute.