Null references (oh no, not again!)

Wed Mar 4 04:13:13 PST 2009

On Wed, 04 Mar 2009 14:40:58 +0300, Walter Bright <newshound1 at digitalmars.com> wrote:

> Denis Koroskin wrote:
>> On Wed, 04 Mar 2009 13:55:57 +0300, Walter Bright
>>> If software is in your flight critical systems, the way one proceeds  
>>> is to *assume skynet takes it over* and will attempt to do everything  
>>> possible to crash the airplane.
>>  Assume you got a null-dereference under Linux. How are you going to  
>> recover from it? You can't catch the NullPointerException, so your  
>> program will fail and bring down the whole system *anyway*.
>
> You design your critical system so it is not vulnerable to the failure  
> of a subsystem of it, even if that subsystem is powered by linux.
>
> For example, you might have two computer systems controlling the  
> process. They vote, and if they disagree, they both are removed and the  
> backup is engaged. The two systems use different operating systems - say  
> one linux the other windows, they use different software written with  
> different algorithms in different languages.
>
> The space shuttle, for example, had 4 independent flight control  
> computers voting, and a 5th (with reduced capability) that could be  
> manually brought online in case the 4 primaries all failed.
>
> Google did an interesting design for their Chrome browser. Each tab in  
> it was powered by a separate process, meaning the hardware isolated it  
> from the operation of the other tabs. So if the browser crashed in one  
> tab, it wouldn't affect the other ones.
>
> I've read elsewhere that if you want to create a robust system, you  
> break it up into different modules and run those modules as separate  
> processes (not just separate threads) that communicate via interprocess  
> communication. Any particular module dying could then be restarted  
> without affecting the rest of the modules.
>
> The wrong way to do it is to lump everything into one gigantic process.  
> Then, any failure brings everything down.

Most people can't afford to run their applications on several computers just in case one of them fails. Besides, as you yourself pointed out, NPEs are often repeatable, so if you re-run the task on another PC, chances are it will fail there, too.
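
As a rough illustration of why redundancy does not help with a repeatable null dereference, here is a minimal Python sketch. The names `render_title` and `run_on_replicas` are hypothetical, the "replicas" are simulated by a plain loop rather than separate machines, and Python's `AttributeError` on `None` stands in for an NPE.

```python
# Hypothetical task with a deterministic null-dereference bug.
def render_title(document):
    # Bug: "title" may be None, and None has no .upper() method.
    return document["title"].upper()


def run_on_replicas(task, argument, replica_count):
    """Run the same task with the same input on several simulated replicas.

    Returns one entry per replica: the task's result, or the exception
    it raised.
    """
    outcomes = []
    for _ in range(replica_count):
        try:
            outcomes.append(task(argument))
        except AttributeError as error:
            # Python's closest equivalent of a null dereference.
            outcomes.append(error)
    return outcomes


bad_input = {"title": None}
outcomes = run_on_replicas(render_title, bad_input, 3)
failed = [isinstance(outcome, AttributeError) for outcome in outcomes]
print(failed)  # [True, True, True]
```

Because the bug depends only on the input, every replica fails in the same way, so a vote among them has nothing correct to fall back on.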

No doubt, Google Chrome is a beautiful piece of software. It doesn't crash the whole browser when something is null-dereferenced. But the message I've been writing for half an hour is *lost* anyway when the host process fails.

The way you suggest writing software is like a doctor who treats or hides the symptoms rather than curing the cause of the illness. You shouldn't rely on recovering from failures when you can avoid the whole class of bugs altogether.
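
A minimal sketch of what "avoiding the whole class of bugs" could look like, written in Python only for illustration. The `Message` class and `word_count` function are hypothetical. Python does not enforce non-null types at compile time, so the check below runs once at construction and merely stands in for what a compiler with non-nullable types would reject statically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    """A message whose body is guaranteed to be a real string, never None."""

    body: str

    def __post_init__(self):
        # Reject None once, at the boundary. This runtime check is an
        # assumption standing in for a compile-time non-null guarantee.
        if self.body is None:
            raise ValueError("body must not be None")


def word_count(message: Message) -> int:
    # No None check needed here: a Message cannot exist without a body.
    return len(message.body.split())


print(word_count(Message("null references, not again")))  # 4
```

The design choice is that the invalid state is refused where the value is created, so code that receives a `Message` has nothing to recover from.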