Developing Mars lander software

Tue Feb 18 16:16:02 PST 2014

On Tuesday, 18 February 2014 at 23:05:21 UTC, Walter Bright wrote:
> http://cacm.acm.org/magazines/2014/2/171689-mars-code/fulltext
>
> Some interesting tidbits:
>
> "We later revised it to require that the flight software as a 
> whole, and each module within it, had to reach a minimal 
> assertion density of 2%. There is compelling evidence that 
> higher assertion densities correlate with lower residual defect 
> densities."
>
> This has been my experience with asserts, too.
>
> "A failing assertion is now tied in with the fault-protection 
> system and by default places the spacecraft into a predefined 
> safe state where the cause of the failure can be diagnosed 
> carefully before normal operation is resumed."
>
> Nice to see confirmation of that.
>
> "Running the same landing software on two CPUs in parallel 
> offers little protection against software defects. Two 
> different versions of the entry-descent-and-landing code were 
> therefore developed, with the version running on the backup CPU 
> a simplified version of the primary version running on the main 
> CPU. In the case where the main CPU would have unexpectedly 
> failed during the landing sequence, the backup CPU was 
> programmed to take control and continue the sequence following 
> the simplified procedure."
>
> An example of using dual systems for reliability.

I haven't read the link (TL;DR), but how are they detecting that a 
CPU has failed? Some information must be passed outside of the CPU 
to do this. The only solution that comes to my mind is that the 
main CPU changes a variable in external memory at every step, and 
the backup CPU checks it continuously to catch a failure 
immediately. But that alone would already take about 50% of the 
CPU's power.
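
To make the idea above concrete, here is a minimal sketch of such a 
heartbeat check as a plain Python simulation. It is only my guess at 
one way to do it, not what the lander actually does (the quoted text 
doesn't say). The names SharedMemory, HeartbeatMonitor, simulate and 
max_missed are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class SharedMemory:
    """Hypothetical stand-in for memory that both CPUs can access."""
    heartbeat: int = 0


class HeartbeatMonitor:
    """Backup-side check: the primary is presumed dead if the counter stalls."""

    def __init__(self, shared: SharedMemory, max_missed: int):
        self.shared = shared
        self.max_missed = max_missed      # assumed tolerance, in polls
        self.last_seen = shared.heartbeat
        self.missed = 0

    def poll(self) -> bool:
        """Return True once the counter has not changed for max_missed polls."""
        current = self.shared.heartbeat
        if current != self.last_seen:
            # The primary made progress since the last poll.
            self.last_seen = current
            self.missed = 0
        else:
            self.missed += 1
        return self.missed >= self.max_missed


def simulate(primary_dies_at: int, steps: int, max_missed: int = 3):
    """Return the step at which the backup takes over, or None if it never does."""
    shared = SharedMemory()
    monitor = HeartbeatMonitor(shared, max_missed)
    for step in range(steps):
        if step < primary_dies_at:
            shared.heartbeat += 1         # primary is alive and bumps the counter
        if monitor.poll():                # backup polls once per step
            return step
    return None
```

For example, simulate(5, 20) models a primary that stops at step 5, 
and simulate(100, 20) one that never stops within the run. In this 
sketch the backup polls once per step rather than continuously, so 
the cost of checking depends on how often you poll, at the price of 
detecting the failure a few polls late.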

Having thought about this kind of backup system myself, it is 
really great to read that some people are actually building them.