Everyone who writes safety critical software should read this

H. S. Teoh hsteoh at quickfur.ath.cx
Wed Oct 30 12:24:24 PDT 2013


On Tue, Oct 29, 2013 at 07:14:50PM -0700, Walter Bright wrote:
[...]
> The ideas are actually pretty simple. The hard parts are:
> 
> 1. Convincing engineers that this is the right way to do it.

Yeah, if you had said this to me many years ago, I'd have rejected it.
Sadly, it's only with hard experience that one comes to acknowledge
wisdom.


> 2. Convincing people that improving quality, better testing, hiring
> better engineers, government licensing for engineers, following
> MISRA standards, etc., are not the solution. (Note that all of the
> above were proposed in the HN thread.)

Ha. And yet what do we see companies pouring all that money into?
Precisely into improving quality, improving test coverage, inventing
better screening for hiring engineers, and in many places, requiring
pieces of paper to certify that candidate X has successfully completed
program Y sponsored by large corporation Z, which purportedly has a good
reputation that (by some inscrutable leap of logic) translates into
proof that candidate X is capable of producing better code, which in
turn means the product being made is ... safer? Hmm. Something about
that line of reasoning seems fishy. :P

(And don't even get me started on the corporate obsession with standards
bearing acronymic buzzword names that purportedly will solve everything
from software bugs to world hunger. As though the act of writing the
acronym into the company recommended practices handbook [which we all
know everybody loves to read and obey, to the letter] will actually
change anything.)


> 3. Beating out of engineers the hubris that "this part I designed
> will never fail!" Jeepers, how often I've heard that.

"This piece of code is so trivial, and so obviously, blatantly correct,
that it serves as its own proof of correctness." (Later...) "What do you
*mean* the unit tests are failing?!"
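
(A minimal, made-up illustration in D of the kind of thing I mean; the
function name is hypothetical, but the overflow trap is real:

    // "Obviously correct" absolute value... except that -int.min cannot be
    // represented as an int, so the negation wraps right back to int.min.
    int myAbs(int x) { return x < 0 ? -x : x; }

    unittest
    {
        assert(myAbs(-5) == 5);
        assert(myAbs(7) == 7);
        // The "impossible" failure: the result is still negative for int.min.
        assert(myAbs(int.min) >= 0); // fails -- so much for self-evident correctness
    }
)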


> 4. Developing a mindset of "what happens when this part fails in the
> worst way."

I wish software companies would adopt this mentality. It would save me
so many of the headaches I get just from *using* software as an end-user
(never mind what I have to put up with at work as a software developer).


> 5. Learning to recognize inadvertent coupling between the primary
> and backup systems.

If there even *is* a backup system... :P  I think a frighteningly high
percentage of enterprise software fails this criterion.


> 6. Being familiar with the case histories of failure of related
> designs.

They really should put this into the CS curriculum.


> 7. Developing a system to track failures, the resolutions, and check
> that new designs don't suffer from the same problems. (Much like D's
> bugzilla, the test suite, and the auto-tester.)

I like how the test suite (mostly?) consists of failing cases from
actual reported bugs, which the autotester then runs on every change,
thus ensuring that the same bugs don't recur.
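
(A rough sketch, in D, of what such a regression test can look like; the
issue number and function are invented for illustration, but the pattern
of encoding the originally-failing case as a unittest is the point:

    import std.conv : to, ConvException;

    // Hypothetical helper that once crashed on empty input (imaginary issue 1234).
    int parsePositive(string s)
    {
        auto n = to!int(s); // throws ConvException on malformed input
        if (n <= 0)
            throw new Exception("expected a positive integer");
        return n;
    }

    // Regression test distilled from the (imaginary) bug report: once the
    // fix is in, the autotester re-runs this on every change, so the same
    // bug can't silently come back.
    unittest
    {
        import std.exception : assertThrown;
        assert(parsePositive("42") == 42);
        assertThrown!ConvException(parsePositive("")); // used to crash
        assertThrown(parsePositive("-5"));             // used to be accepted
    }
)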

Most software companies have bug trackers, I'm pretty sure, but it's
pretty scary how few of them actually have an *automated* system in
place to ensure that previously-fixed bugs don't recur. Some places rely
on the QA department doing manual testing over some standard checklist
that may have no resemblance whatsoever to previously-fixed bugs, as
though it's "good enough" that the next patch release (which is
inevitably not just a "patch" but a full-on new version packed with new,
poorly-tested features) doesn't outright crash on the most basic
functionality. Use it for anything more complex than trivial, everyday
tasks? With any luck, you'll crash within the first 5 minutes on the new
version just by hitting previously-fixed bugs that have resurfaced. Which
then leads to today's mentality of "let's *not* upgrade until everybody
else has crashed the system to bits and the developers have been shamed
into fixing the regressions; then maybe things won't break as badly when
we do upgrade".

For automated testing to be practical, of course, the system must be
designed to be testable in the first place -- which, unfortunately, very
few programmers have been trained to do. "Whaddya
mean, make my code modular and independently testable? I've a deadline
by 12am tonight, and I don't have time for that! Just hardcode the data
into the global variables and get the product out the door before the
midnight bell strikes; who cares if this thing is testable, as long as
the customer thinks it looks like it works!"

Sigh.
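
For what it's worth, here's a tiny sketch in D (with made-up names) of
the difference between the hardcode-it-into-a-global style and code that
can actually be tested in isolation:

    // Untestable: depends on a global set up somewhere else and writes
    // straight to stdout, so there is nothing a unit test can check.
    string currentUser;
    void greetUser()
    {
        import std.stdio : writeln;
        writeln("Hello, ", currentUser);
    }

    // Testable: data comes in as a parameter, the result comes out as a
    // value, so a unittest (and the autotester) can exercise it without
    // any global setup.
    string greeting(string user)
    {
        return "Hello, " ~ user;
    }

    unittest
    {
        assert(greeting("Alice") == "Hello, Alice");
        assert(greeting("") == "Hello, "); // even the edge case is easy to pin down
    }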


T

-- 
Look after your clothes while they're new, and your health while you're young.

