Program logic bugs vs input/environmental errors

Fri Oct 31 13:15:17 PDT 2014

On Thursday, 16 October 2014 at 19:53:42 UTC, Walter Bright wrote:
> On 10/15/2014 12:19 AM, Kagamin wrote:
>> Sure, software is one part of an airplane, like a thread is a part of a process.
>> When the part fails, you discard it and continue operation. In software it works
>> by rolling back a failed transaction. An airplane has some tricks to recover
>> from failures, but still it's a "no fail" design you argue against: it shuts
>> down parts one by one when and only when they fail and continues operation no
>> matter what until nothing works and even then it still doesn't fail, just does
>> nothing. The airplane example works against your arguments.
>
> This is a serious misunderstanding of what I'm talking about.
>
> Again, on an airplane, no way in hell is a software system 
> going to be allowed to continue operating after it has 
> self-detected a bug. Trying to bend the imprecise language I 
> use into meaning the opposite doesn't change that.

To better depict the big picture as I see it:

You suggest that a system should shut down as soon as possible at 
the first sign of a failure that can affect the system.

You give the hospital-in-a-hurricane example. But you don't 
praise the hospitals that shut down on failure; you praise the 
hospital that continues to operate in the face of an unexpected 
and uncontrollable disaster, in total contradiction with your 
suggestion to shut down ASAP.

You refer to an airplane's ability to not shut down ASAP and to 
continue operating after an unexpected failure as if it 
corresponded to your suggestion to shut down ASAP. This makes no 
sense; you contradict yourself.

Why didn't you praise the hospital shutdown? Why does nobody want 
airplanes to dive into the ocean on first suspicion? Because 
that's how unreliable systems work: they often stop working. 
Reliable systems work in a completely different way: they employ 
many tricks, but one big objective of these tricks is the ability 
to continue operation after a failure. All the effort put into 
airplane design has one purpose: to fight against immediate 
shutdown, which you defend as the only true way of operation and 
which is exactly the way explicitly rejected by the design of 
real reliable systems. How would an airplane without the tricks 
work? It would dive into the ocean on the first failure (and a 
crash investigation team would diagnose the failure), exactly as 
you suggest. That's safe: otherwise it could fall on a city or a 
nuclear reactor. How does a real airplane work? A failure happens 
and it still flies, contrary to your suggestion to shut down on 
failure. That's how critical missions are done: they take the 
risk of a greater disaster in order to complete the mission, and 
failures can be diagnosed when appropriate.

That's why I think your examples contradict your proposal.