Null references (oh no, not again!)

Wed Mar 4 03:40:58 PST 2009

Denis Koroskin wrote:
> On Wed, 04 Mar 2009 13:55:57 +0300, Walter Bright 
>> If software is in your flight critical systems, the way one proceeds 
>> is to *assume skynet takes it over* and will attempt to do everything 
>> possible to crash the airplane.
> 
> Assume you got a null-dereference under Linux. How are you going to 
> recover from it? You can't catch the NullPointerException, so your 
> program will fail and bring down the whole system *anyway*.

You design your critical system so that it is not vulnerable to the 
failure of any one of its subsystems, even if that subsystem runs on 
Linux.

For example, you might have two computer systems controlling the 
process. They vote, and if they disagree, both are taken offline and the 
backup is engaged. The two systems run different operating systems (say, 
one Linux and the other Windows) and different software, written with 
different algorithms in different languages, so a single bug is unlikely 
to take out both at once.
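A minimal sketch of that voting arrangement, in Python. Everything 
named here (select_output, channel_a, channel_b, backup_channel, the 
tolerance) is made up for illustration; in a real system the channels 
would be separate computers, not functions in one process.

```python
def select_output(primary_a, primary_b, backup, reading, tolerance=1e-6):
    """Return (source, value) for one control cycle.

    The two primaries vote. If either one fails outright, or if they
    disagree by more than `tolerance`, both are dropped and the backup
    is used instead.
    """
    try:
        a = primary_a(reading)
        b = primary_b(reading)
    except Exception:
        # A primary died; do not trust the survivor either.
        return "backup", backup(reading)
    if abs(a - b) <= tolerance:
        return "primary", a
    return "backup", backup(reading)


# Hypothetical channels: same specification, different algorithms.
def channel_a(x):
    return 2.0 * x + 1.0


def channel_b(x):
    return x + x + 1.0


def backup_channel(x):
    return 2.0 * x + 1.0


def faulty_channel(x):
    # Simulates a primary that has gone wrong and returns garbage.
    return 0.0


def crashing_channel(x):
    # Simulates a primary that hits a null reference.
    return None.value
```

Note that the voter never tries to work out which primary is right. A 
disagreement is enough to discard both.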

The space shuttle, for example, had 4 independent flight control 
computers voting, and a 5th (with reduced capability) that could be 
manually brought online in case the 4 primaries all failed.

Google did an interesting design for their Chrome browser. Each tab in 
it was powered by a separate process, meaning hardware memory protection 
isolated it from the operation of the other tabs. So if the browser 
crashed in one tab, it wouldn't affect the other ones.

I've read elsewhere that if you want to create a robust system, you 
break it up into different modules and run those modules as separate 
processes (not just separate threads) that communicate via interprocess 
communication. Any particular module dying could then be restarted 
without affecting the rest of the modules.
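A rough sketch of the restart idea, in Python, using the standard 
subprocess module. The names supervise and flaky_module are invented 
for this example, and the "module" is just a child interpreter that 
dereferences None on its first two runs. A real supervisor would also 
need IPC, backoff, and logging, none of which is shown.

```python
import subprocess
import sys


def supervise(make_command, max_restarts=3):
    """Run a module as a child process, restarting it if it dies.

    `make_command(attempt)` returns the argv list for that attempt.
    Returns the number of restarts that were needed, or raises
    RuntimeError once the restart limit is exhausted.
    """
    for attempt in range(max_restarts + 1):
        result = subprocess.run(
            make_command(attempt),
            stderr=subprocess.DEVNULL,  # hide the child's traceback
        )
        if result.returncode == 0:
            return attempt
        # The child died, but this supervising process is unaffected
        # and simply loops around to start a fresh one.
    raise RuntimeError("module kept failing; giving up")


def flaky_module(attempt):
    # Hypothetical module: hits a null reference on the first two
    # attempts (nonzero exit status), then runs cleanly.
    code = "None.altitude" if attempt < 2 else "pass"
    return [sys.executable, "-c", code]
```

Calling supervise(flaky_module) restarts the child twice and then 
returns. The null dereference kills only the child process, never the 
supervisor.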

The wrong way to do it is to lump everything into one gigantic process. 
Then, any failure brings everything down.