Making alloca more safe

Mon Nov 16 12:48:51 PST 2009

bearophile wrote:
> Walter Bright:
>> I just wished to point out that it was not a *safety* issue.
> A safe system is not a program that switches itself off as soon as
> there's a small problem.

Computers cannot know whether a problem is "small" or not.

> One Ariane rocket self-destructed (destroying an extremely
> important scientific satellite it was carrying, whose mission I still
> miss) because of this silly behaviour combined with the inflexibility
> of the Ada language.
> 
> A reliable system is a system that keeps working correctly despite
> everything. If this is not possible, in real life you usually want
> "good enough" behaviour. Take your CT scanner example: in Africa, if
> the machine switches itself off at the slightest problem, they force
> it to start again, because they don't have the money for a 100%
> perfect fix. So for them a machine that shows slow and graceful
> degradation is better. That's a reliable system, something that
> looks more like your liver, which doesn't switch off entirely as soon
> as it has a small problem (killing you quickly).

This is how you make reliable systems:

http://dobbscodetalk.com/index.php?option=com_myblog&show=Safe-Systems-from-Unreliable-Parts.html&Itemid=29

http://dobbscodetalk.com/index.php?option=com_myblog&show=Designing-Safe-Software-Systems-Part-2.html&Itemid=29

Pretending a program hasn't failed when it has, and just "soldiering 
on", is completely unacceptable behavior in a system that must be reliable.

The Ariane 5 had a backup system which was engaged, but the backup 
system had the same software in it, so it failed in the same way. That 
is not how you make reliable systems.
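The Ariane 5 point can be made concrete with a toy, single-process sketch. It is an illustration under loose assumptions, not how flight-critical systems are actually built (there the redundant channels are physically separate and independently developed): every function name is hypothetical, the 16-bit conversion is only a stand-in failure, and `run_redundant` plays the role of a supervisor. Each channel fails fast instead of returning a guessed value, a failed channel's output is never used, and when every channel fails the supervisor raises so the caller can move the system to a safe state.

```python
# Toy sketch (all names hypothetical): fail-fast channels plus a supervisor
# that switches to an independent backup when a channel fails.
from typing import Callable, Sequence


class AllChannelsFailed(Exception):
    """Every redundant channel failed; the caller must enter a safe state."""


def primary_to_int16(x: float) -> int:
    # Hypothetical primary channel: fails fast on out-of-range input
    # instead of silently wrapping or returning a guessed value.
    n = int(x)  # int() itself raises on NaN or infinity
    if not -32768 <= n <= 32767:
        raise OverflowError(f"{x} does not fit in 16 bits")
    return n


def backup_to_int32(x: float) -> int:
    # Hypothetical independently designed backup: different assumptions
    # (a 32-bit range), but it still fails fast on its own limits.
    n = int(x)
    if not -2**31 <= n <= 2**31 - 1:
        raise OverflowError(f"{x} does not fit in 32 bits")
    return n


def run_redundant(channels: Sequence[Callable[[float], int]], x: float) -> int:
    """Return the result of the first channel that completes without failing.

    A failed channel's output is discarded; if no channel succeeds, raise so
    the caller can shut down or switch to a safe mode instead of carrying on.
    """
    errors = []
    for channel in channels:
        try:
            return channel(x)
        except (OverflowError, ValueError) as exc:
            errors.append(f"{channel.__name__}: {exc}")
    raise AllChannelsFailed("; ".join(errors))


if __name__ == "__main__":
    # Backup running the same software: the same input defeats both channels.
    try:
        run_redundant([primary_to_int16, primary_to_int16], 40000.0)
    except AllChannelsFailed as exc:
        print("enter safe state:", exc)

    # Dissimilar backup: the failure is detected and the backup takes over.
    print(run_redundant([primary_to_int16, backup_to_int32], 40000.0))
```

With two copies of the same channel the supervisor has nothing independent to fall back on, which mirrors the identical Ariane 5 backup; with a dissimilar backup the failure is detected and contained rather than ignored.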

> A program that stops working at a random moment because of a null is
> not safe. (And even if you accept this, in safer languages like
> C#/Java there are null pointer exceptions that show a stack trace. The
> type system is smart enough to remove most of those tests to improve
> performance.) A safer program is one that avoids null pointer
> exceptions because the type system has formally verified that the
> program has no nulls.

You're using two different definitions of the word "safe". Program 
safety is about not corrupting memory. System safety (i.e. reliability) 
is a completely different thing.

If you've got a system that relies on the software continuing to 
function after an unexpected null seg fault, you have a VERY BADLY 
DESIGNED and COMPLETELY UNSAFE system. I really cannot emphasize this 
enough.

P.S. I worked for Boeing for years on flight critical systems. Normally 
I eschew credentialism, but I feel very strongly about this issue and 
wish to point out that my knowledge of it is based on decades of real-world 
experience at aviation companies that take this issue extremely 
seriously.