Everyone who writes safety critical software should read this

eles eles at eles.com
Wed Nov 6 01:35:58 PST 2013


On Wednesday, 6 November 2013 at 01:52:30 UTC, growler wrote:
> On Tuesday, 5 November 2013 at 08:41:17 UTC, eles wrote:
>> On Saturday, 2 November 2013 at 04:03:46 UTC, Walter Bright 
>> wrote:
>>> On 11/1/2013 8:03 AM, bearophile wrote:
> Fail safe design needs to be engineered to handle the situation 
> when any component fails regardless of the quality of 
> components used. Software is just one more (weak) component in 
> the system.

Yes, but you cannot reach zero failure probability unless you use an 
infinite number of back-ups. Otherwise, there is some 
infinitesimal, but non-zero, probability that everything fails.
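A back-of-the-envelope sketch of that point (the 1% per-component failure probability below is a made-up illustrative number, not data from any real system): with n independent back-ups each failing with probability p, the chance that all of them fail together is p^n, which shrinks fast as you add back-ups but only reaches zero in the limit of infinitely many.

```python
# Illustrative only: p is an assumed per-component failure probability.
p = 0.01  # suppose each independent back-up fails 1% of the time

for n in range(1, 6):
    all_fail = p ** n  # probability that all n back-ups fail at once
    print(f"n={n}: P(all fail) = {all_fail:.0e}")

# The combined probability drops geometrically with each added back-up,
# but it is never exactly zero for any finite n.
```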

You take two teams that develop the software independently, in 
different languages, on different machine architectures, etc. 
However, there is still a non-zero probability that both teams (or 
both compilers, or both processors, or all of those) expose the 
same bug, or that the arbiter that counts the votes has an error 
of its own.
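A toy sketch of the voting scheme (the function names and the three redundant "versions" are my own invention for illustration, not taken from any real fail-safe system): an arbiter collects the outputs of the independently developed versions and returns the majority value. Note that the arbiter itself is one more component that can be wrong, which is exactly the caveat above.

```python
from collections import Counter

def majority_vote(outputs):
    """Arbiter: return the value produced by most of the redundant versions.

    The arbiter is itself a single point of failure, which is the
    caveat raised in the text above.
    """
    value, count = Counter(outputs).most_common(1)[0]
    if count <= len(outputs) // 2:
        raise RuntimeError("no majority: redundant versions disagree")
    return value

# Three hypothetical, independently developed versions of the same
# computation; version_c carries a deliberate bug and gets outvoted.
def version_a(x): return x * x
def version_b(x): return x ** 2
def version_c(x): return x * x + 1  # buggy, outvoted 2-to-1

print(majority_vote([f(4) for f in (version_a, version_b, version_c)]))  # → 16
```

If the versions share a common bug, the majority agrees on the *wrong* answer and the voter happily reports it, which is why independence of the implementations matters so much.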

In designing fail-safe systems *you rely* on something, because 
you have no choice. But yes, you are as pessimistic as possible 
(usually, limited by the budget).

Hardware can fail for mostly the same reasons that software fails. 
The difference, in the long term, is that once a piece of software 
is 100% correct, it never gets worse. Hardware can be in good 
shape today and badly broken tomorrow. Just have a look at 
Curiosity's digger.

> Of course component quality is important to overall safety 
> because fail safe systems are not foolproof. But as Walter says 
> it should not be part of the solution nor relied upon in a fail 
> safe design.

As said earlier, you cannot go to that extreme. You don't rely on 
any specific part, but you do rely on combinations of parts, and 
you simply bet that the probability of their independent but 
simultaneous failure is very small.

Then, it is a matter of scale what counts as "a part" and what as 
"several parts". Just zoom in and out on the project's design and 
you will see it. It is more like a fractal.

If you don't allow yourself to rely on anything, you get nothing 
built. You may design perfect fail-safe systems; you just cannot 
build them.

The bottom line is: never claim that your system is fully fail 
safe, no matter the strategy and the care you put into designing 
and building it. There is no spoon.


More information about the Digitalmars-d mailing list