On C/C++ undefined behaviours (on the term "undefined behaviours")

Bruno Medeiros brunodomedeiros+spam at com.gmail
Thu Oct 7 03:38:31 PDT 2010


On 06/10/2010 16:59, Stanislav Blinov wrote:
> 06.10.2010 19:34, Bruno Medeiros wrote:
>> On 20/08/2010 17:38, bearophile wrote:
>>> Three good blog posts about undefined behaviour in C and C++:
>>> http://blog.regehr.org/archives/213
>>> http://blog.regehr.org/archives/226
>>> http://blog.regehr.org/archives/232
>>>
>>> In those posts (and elsewhere) the expert author gives several good
>>> bites to the ass of most compiler writers.
>>>
>>> Among other things in those three posts he talks about two programs as:
>>>
>>> import std.c.stdio: printf;
>>> void main() {
>>> printf("%d\n", -int.min);
>>> }
>>>
>>> import std.stdio: writeln;
>>> void main() {
>>> enum int N = (1L).sizeof * 8;
>>> auto max = (1L<< (N - 1)) - 1;
>>> writeln(max);
>>> }
>>>
>>> I believe that D can't be considered a step forward in system
>>> language programming until it gives a much more serious consideration
>>> for integer-related overflows (and integer-related undefined behaviour).
>>>
>>> The good thing is that Java is a living example that even if you
>>> remove most integer-related undefined behaviours your Java code is
>>> still able to run as fast as C and sometimes faster (on normal
>>> desktops).
>>>
>>> Bye,
>>> bearophile
>>
>> Interesting post.
>>
>> There is an important related issue here. It should be noted that, even
>> though the article and the C FAQ say:
>> "
>> The C FAQ defines “undefined behavior” like this:
>>
>> Anything at all can happen; the Standard imposes no requirements. The
>> program may fail to compile, or it may execute incorrectly (either
>> crashing or silently generating incorrect results), or it may
>> fortuitously do exactly what the programmer intended.
>> "
>> this definition of "undefined behavior" is not used consistently by C
>> programmers, or even by more official sources such as books, or even
>> the C standards. A trivial example:
>>
>> foo(printf("Hello"), printf("World"));
>>
>> Since the evaluation order of arguments is not defined in C, these two
>> printfs can be executed in either of the two possible orders. The
>> behavior is not specified, it is up to the implementation, to the
>> compiler switches, etc.
>> Many C programmers would say that such code has/is/produces undefined
>> behavior, however, that is clearly not “undefined behavior” as per the
>> definition above. A correct compiler cannot cause the code above to
>> execute incorrectly, crash, calculate PI, format your hard disk,
>> whatever, as in the other cases. It has to do everything it is
>> supposed to do, and the only "undefined" thing is the order of
>> evaluation, but the code is not "invalid".
>>
>> I don't like this term "undefined behavior". It is an unfortunate C
>> legacy that leads to unnecessary confusion and misunderstanding, not
>> just in conversation, but often in coding as well. It would not be so
>> bad if the programmers had the distinction clear at least in their
>> minds, or in the context of their discussion. But that is often not
>> the case.
>>
>> I've called before for this term to be avoided in D vocabulary, mainly
>> because Walter often (ab)used the term as per the usual C legacy.
>> The “undefined behavior” as per the C FAQ should be called something
>> else, like "invalid behavior". Code that when given valid inputs
>> causes invalid behavior should be called invalid code.
>> (BTW, this maps directly to the concept of contract violations.)
>>
>>
> I always thought that the term itself came from language specification,
> i.e. the paper that *defines* behavior of the language and states that
> there are cases when behavior is not defined (i.e. in terms of the
> specification). From this point of view the term is understandable and,
> uh, valid. It's just that it got abused with time, especially this abuse
> is notable in discussions (e.g. "Don't do that, undefined behavior will
> result": one can sit and guess how exactly she will get something that
> is not defined).
>

"the term itself came from language specification" -> yes that is 
correct. I read K&R's "The C Programming Language", second edition, and 
the term comes from there, at least as applied to C. But they don't 
define or use the term as the C FAQ above, or at least not as 
explicitly, if I recall correctly (I'm 98% sure I am). They just describe 
each particular language rule individually and tell you what to expect 
if you break the rule. Often they will say something like "this will 
cause undefined behavior" and it is clear that it is illegal. But other 
times they would say something like "X is undefined", where X could be 
"the order of execution", "the results of Y", "the contents of variable 
Z", and it is not clear whether that meant the program could exhibit 
undefined behavior or not. (or in other words if that was illegal or not)

I don't know if this concept or related ones have actually been better 
formalized in newer revisions of the C standard.

> I don't think that "invalid behavior" covers that sense: it means that
> implementation should actually do something to make code perform
> 'invalid' things (what should be considered invalid, by the way?),
> rather than have the possibility to best adapt the behavior to system
> (e.g. segfault) or some error handling mechanism (e.g. throw an exception).

In this case I don't know for sure what the best alternative term is, I 
just want to avoid confusion with "invalid behavior". I want to know 
when a program execution may actually be invalidated (crash, memory 
corruption, etc.), versus when it is just some particular and *isolated* 
aspect of behavior that is simply "undefined", but program execution is 
not invalidated. In other words, if it is illegal or not.
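
To make the distinction concrete, here is a minimal sketch in C of the 
"isolated aspect" kind, based on the foo(printf(...), printf(...)) example 
quoted above. The helper names (hello, world, run_once, call_log) are 
made up for illustration, and foo is just a stand-in. The order in which 
the two arguments are evaluated is left to the compiler, but the program 
stays valid: both calls happen exactly once and the result is the same 
either way.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helpers, not from the thread: each one records that it ran. */
static char call_log[3];
static int calls;

static int hello(void) { call_log[calls++] = 'H'; return 1; }
static int world(void) { call_log[calls++] = 'W'; return 2; }

/* Stand-in for foo() in the quoted example. */
static int foo(int a, int b) { return a * 10 + b; }

/*
 * Evaluates foo(hello(), world()) once, copies the observed evaluation
 * order ("HW" or "WH") into `order`, and returns foo's result.
 */
static int run_once(char order[3])
{
    calls = 0;
    int result = foo(hello(), world());

    /* Whatever order the compiler picked, both arguments were evaluated. */
    assert(calls == 2);

    call_log[calls] = '\0';
    strcpy(order, call_log);
    return result;
}
```

Calling run_once from main and printing `order` may show "HW" with one 
compiler and "WH" with another, but the return value is 12 in both cases 
and nothing else about the execution is affected. Contrast this with 
something like negating INT_MIN, where no outcome at all is guaranteed.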


-- 
Bruno Medeiros - Software Engineer

