assert semantic change proposal

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Tue Aug 5 11:55:58 PDT 2014


On Tue, Aug 05, 2014 at 11:18:46AM -0700, Jeremy Powers via Digitalmars-d wrote:
> This has already been stated by others, but I just wanted to pile on -
> I agree with Walter's definition of assert.
> 
> 2. Semantic change.
> > The proposal changes the meaning of assert(), which will result in
> > breaking existing code.  Regardless of philosophizing about whether
> > or not the code was "already broken" according to some definition of
> > assert, the fact is that shipping programs that worked perfectly
> > well before may no longer work after this change.
> 
> Disagree.
> Assert (as I always understood it) means 'this must be true, or my
> program is broken.'  In -release builds the explicit explosion on a
> triggered assert is skipped, but any code after a non-true assert is,
> by definition, broken.  And by broken I mean the fundamental
> constraints of the program are violated, and so all bets are off on it
> working properly.  A shipping program that 'worked perfectly well' but
> has unfulfilled asserts is broken - either the asserts are not
> actually true constraints, or the broken path just hasn't been hit
> yet.

Exactly. I think part of the problem is that people have been using
assert with the wrong meaning. In my mind, 'assert(x)' doesn't mean
"abort if x is false in debug mode but silently ignore in release mode",
as some people apparently think it means. To me, it means "at this point
in the program, x is true".  It's that simple.

Now if it turns out that x actually *isn't* true, then you have a
contradiction in your program logic, and therefore, by definition, your
program is invalid, which means any subsequent behaviour is undefined.
If you start with an axiomatic system where the axioms contain a
contradiction, then any results you derive from the system will be
meaningless, since from a contradiction anything can be derived.
Similarly, any program behaviour that follows a false assertion is
undefined, because one of the "axioms" (i.e., assertions) introduces a
contradiction to the program logic.


> Looking at the 'breaking' example:
> 
> assert(x!=1);
> if (x==1) {
>  ...
> }
> 
> If the if is optimized out, this will change from existing behaviour.
> But it is also obviously (to me at least) broken code already.  The
> assert says that x cannot be 1 at this point in the program, if it
> ever is then there is an error in the program.... and then it
> continues as if the program were still valid.  If x could be one, then
> the assert is invalid here.  And this code will already behave
> differently between -release and non-release builds, which is another
> kind of broken.

Which is what Walter has been saying: the code is *already* broken, and
is invalid by definition, so it makes no difference what the optimizer
does or doesn't do. If your program has an array overrun bug that writes
garbage to an unrelated variable, then you can't blame the optimizer for
producing a program where the unrelated variable acquires a different
garbage value from before. The optimizer only guarantees (in theory)
consistent program behaviour if the program is valid to begin with. If
the program is invalid, all bets are off as to what its "optimized"
version does.
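
To make that concrete, here is a minimal sketch (the function and the
values in it are made up purely for illustration) of the transformation
the proposal would permit:

int f(int x)
{
    assert(x != 1);   // axiom: x is never 1 at this point
    if (x == 1)       // provably false under that axiom
        return 42;    // dead code the optimizer may drop
    return x * 2;
}

// Under the proposed semantics, a -release build may compile f as if it
// were written simply as:
//
//     int f(int x) { return x * 2; }
//
// If a caller ever passes x == 1, the program was already invalid, so
// whatever f then returns is not the optimizer's fault.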


> > 3a. An alternate statement of the proposal is literally "in release
> > mode, assert expressions introduce undefined behavior into your code
> > if the expression is false".
> 
> This statement seems fundamentally true to me of asserts already,
> regardless of whether they are used for optimizations.  If your assert
> fails, and you have turned off 'blow up on assert' then your program
> is in an undefined state.  It is not that the assert introduces the
> undefined behaviour, it is that the assert makes plain an expectation
> of the code and if that expectation is false the code will have
> undefined behaviour.

I agree.


> > 3b. Since assert is such a widely used feature (with the original
> > semantics, "more asserts never hurt"), the proposal will inject a
> > massive amount of undefined behavior into existing code bases,
> > greatly increasing the probability of experiencing problems related
> > to undefined behavior.
> >
> 
> I actually disagree with the 'more asserts never hurt' statement.
> Exactly because asserts get compiled out in release builds, I do not
> find them very useful/desirable.  If they worked as optimization hints
> I might actually use them more.
> 
> And there will be no injection of undefined behaviour - the undefined
> behaviour is already there if the asserted constraints are not valid.

And if people are using assert in ways that differ from its intended
meaning (an expression that must be true if the program logic has been
correctly implemented), then their programs are already invalid
by definition. Why should it be the compiler's responsibility to
guarantee consistent behaviour of invalid code?


> > Maybe if the yea side was consulted, they might easily agree to an
> > alternative way of achieving the improved optimization goal, such as
> > creating a new function that has the proposed semantics.
> >
> 
> Prior to this (incredibly long) discussion, I was not aware people had
> a different interpretation of assert.  To me, this 'new semantics' is
> precisely what I always thought assert was, and the proposal is just
> leveraging it for some additional optimizations.  So from my
> standpoint, adding a new function would make sense to support this
> 'existing' behaviour that others seem to rely on - assert is fine as
> is, if the definition of 'is' is what I think it is.

Yes, the people using assert as a kind of "check in debug mode but
ignore in release mode" should really be using something else instead,
since that's not what assert means. I'm honestly astounded that people
would actually use assert as some kind of non-release-mode check instead
of the statement of truth that it was meant to be.
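
For those who genuinely want a "check in non-release builds, no strings
attached" primitive, something along these lines would express that
intent without overloading assert. This is just one possible shape, tied
here to -debug for simplicity; the name debugCheck is hypothetical, not
an existing library function:

/// Hypothetical helper: verifies a condition only in builds compiled
/// with -debug and is a no-op otherwise; unlike assert, it makes no
/// claim that the optimizer could ever rely on.
void debugCheck(lazy bool cond, string msg = "debug check failed",
                string file = __FILE__, size_t line = __LINE__)
{
    debug
    {
        import std.conv : to;
        if (!cond)
            throw new Exception(file ~ "(" ~ line.to!string ~ "): " ~ msg);
    }
}

// Usage:
//     debugCheck(x != 1, "x should never be 1 here");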

Furthermore, I think Walter's idea to use asserts as a source of
optimizer hints is a very powerful concept that may turn out to be a
revolutionary feature in D. It could very well be the answer to my long
search for a way of declaring identities on user-defined types that the
optimizer can exploit for high-level optimizations, putting user-defined
types on par with built-in types in optimizability. Currently the
compiler can optimize x+x+x+x into 4*x if x is an int, for example, but
it can't if x is a user-defined type (e.g. BigInt), because it has no
way of knowing whether opBinary was defined in a way that obeys this
identity. But if we can assert that this identity holds for the
user-defined type, e.g. BigInt, then the compiler can use that axiom to
perform the same optimization. Code could then be written in more
human-readable forms and still maintain optimal performance, even where
user-defined types are involved.
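
Here is a rough sketch of the kind of identity I mean, using a toy MyInt
type instead of a real BigInt (the type and function names are invented
for illustration). Today the assert at the bottom is only a runtime
check; under a fully developed form of this idea, the compiler could
treat it as an axiom about the type:

// A stripped-down user-defined numeric type.  The compiler currently
// has no way of knowing that repeated addition and multiplication
// agree for this type.
struct MyInt
{
    long value;

    MyInt opBinary(string op : "+")(MyInt rhs) const
    {
        return MyInt(value + rhs.value);
    }

    MyInt opBinary(string op : "*")(long n) const
    {
        return MyInt(value * n);
    }
}

// The identity we'd like the optimizer to be allowed to assume:
// x + x + x + x is the same as x * 4.
void identityHolds(MyInt x)
{
    assert(x + x + x + x == x * 4);
}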

While manually-written code generally doesn't need this kind of
optimization (instead of writing x+x+x+x, just write 4*x to begin with),
this becomes an important issue with generic code and metaprogramming.
The generic version of the code may very well be w+x+y+z, which cannot
be reduced to n*x, so when you instantiate that code for the case where
w==x==y==z, you have to pay the penalty of genericity. But we can
eliminate this cost if we can tell the compiler that when w==x==y==z,
then w+x+y+z == 4*x. Then we don't have to separately implement this
special case just to achieve optimal performance; we can keep using the
generic, more maintainable code, as in the sketch below.
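
A small follow-on sketch (reusing the toy MyInt from above; sum4 is
likewise made up) of where that identity would pay off in generic code:

// Generic code has no idea that its four arguments might be the same
// value, so it has to spell out the full chain of additions.
T sum4(T)(T w, T x, T y, T z)
{
    return w + x + y + z;
}

MyInt quadruple(MyInt x)
{
    // Instantiated with four identical arguments.  With the identity
    // x + x + x + x == x * 4 available as an axiom, the compiler could
    // emit a single multiplication here; without it, we either accept
    // three additions or hand-write a special case for this call site.
    return sum4(x, x, x, x);
}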

Something like this will require much more development of Walter's core
concept than currently proposed, of course, but the current proposal is
an important step in this direction, and I fully support it.


T

-- 
There are 10 kinds of people in the world: those who can count in binary, and those who can't.

