This thread on Hacker News terrifies me

Sat Sep 1 21:33:39 UTC 2018

On 8/31/2018 3:50 PM, Walter Bright wrote:
> https://news.ycombinator.com/item?id=17880722
> 
> Typical comments:
> 
> "`assertAndContinue` crashes in dev and logs an error and keeps going in
> prod. Each time we want to verify a runtime assumption, we decide which
> type of assert to use. We prefer `assertAndContinue` (and I push for it
> in code review),"
> 
> "Stopping all executing may not be the correct 'safe state' for an
> airplane though!"
> 
> "One faction believed you should never intentionally crash the app"
> 
> "One place I worked had a team that was very adamant about not really
> having much error checking. Not much of any qc process, either. Wait for
> someone to complain about bad data and respond. Honestly, this worked
> really well for small, skunkworks type projects that needed to be nimble."
> 
> And on and on. It's unbelievable. The conventional wisdom in software
> for how to deal with programming bugs simply does not exist.
> 
> Here's the same topic on Reddit with the same awful ideas:
> 
> https://www.reddit.com/r/programming/comments/9bl72d/assertions_in_production_code/
> 
> 
> No wonder that DVD players still hang when you insert a DVD with a
> scratch on it, and I've had a lot of DVD and Bluray players over the
> last 20 years. No wonder that malware is everywhere.

All too true.

A while ago I worked for a large financial company.

Many production systems had zero monitoring. A server with networking
issues could continue to misbehave _for hours_ until someone somewhere
noticed thousands of error messages and manually intervened.

There were also very few data quality checks. Databases could have
duplicate records, missing records or obviously inconsistent
information. Most systems just continued to process corrupt data as if
nothing happened, propagating it further and further.
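
To make the point concrete, here is a minimal sketch of the kind of
check I mean, written in Python. Everything in it is hypothetical and
only for illustration: the record fields (`id`, `debit`, `credit`), the
`expected_count` parameter and the `DataQualityError` name are my
assumptions, not anything that company actually had. The idea is simply
to refuse a corrupt batch loudly instead of passing it downstream.

```python
from collections import Counter


class DataQualityError(Exception):
    """Raised when a batch fails validation and must not be processed."""


def validate_batch(records, expected_count):
    """Return the batch unchanged if it passes all checks, else raise.

    Each record is assumed to be a dict with the hypothetical keys
    "id", "debit" and "credit".
    """
    problems = []

    # Duplicate records: the same id appears more than once.
    counts = Counter(r["id"] for r in records)
    duplicates = sorted(i for i, n in counts.items() if n > 1)
    if duplicates:
        problems.append(f"duplicate ids: {duplicates}")

    # Missing records: fewer rows than the upstream system reported.
    if len(records) < expected_count:
        problems.append(
            f"expected {expected_count} records, got {len(records)}"
        )

    # Obviously inconsistent information: negative amounts.
    bad = sorted(r["id"] for r in records
                 if r["debit"] < 0 or r["credit"] < 0)
    if bad:
        problems.append(f"negative amounts in ids: {bad}")

    # Stop here rather than let corrupt data propagate further.
    if problems:
        raise DataQualityError("; ".join(problems))
    return records
```

A caller would run `validate_batch` before handing the batch to the
next stage and treat `DataQualityError` as a reason to halt and alert,
not as something to log and ignore.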

Some crucial infrastructure had no usable data backups.

With all this in mind, you would be surprised to hear how much they
talked about "software quality". It's just that their notion of quality
revolved around having no bugs ever go into production and never
bringing down any systems. There were ever increasing requirements
around unit test coverage, opinionated coding standards and a lot of
paperwork associated with every change.

Needless to say, it didn't work very well, and they had around half a
dozen outages of varying sizes _every day_.

Alan Kay, Joe Armstrong and Jim Coplien are just a few of the famous
people who have talked about this issue. It's amazing that so many
engineers still don't get it. I'm inclined to put some blame on the
recent TDD movement.
Its proponents often seem to stress low-level code perfectionism, while
ignoring high-level architecture and runtime resilience (in other words,
systems thinking).