Sutter's ISO C++ Trip Report - The best compliment is when someone else steals your ideas....
Ola Fosheim Grøstad
ola.fosheim.grostad at gmail.com
Sun Jul 15 10:45:23 UTC 2018
On Friday, 13 July 2018 at 12:55:33 UTC, Adam D. Ruppe wrote:
> You use process isolation so it is easy to restart part of it
> without disrupting others. Then it can crash without bringing
> the system down. This is doable with segfaults and range
> errors, same as with exceptions.
>
> This is one of the most important systems engineering
> principles: expect failure from any part, but keep the system
> as a whole running anyway.
If we are talking about something application-specific and in
probabilistic terms, then yes, certainly.
But that is not the absolutist position where any failure should
lead to a shutdown (and consequently a ban on reboot as the
failed assert might happen hours after the actual buggy code
executed).
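
As a rough sketch of the process-isolation idea quoted above (not
anyone's actual setup): a parent process runs a worker in a separate
OS process and restarts it when it dies. The worker script, the
`supervise` function and the restart budget are all hypothetical
names chosen for illustration. A segfault would show up the same way
as the uncaught exception here, as a non-zero return code.

```python
import subprocess
import sys

# Hypothetical worker: fails on its first two attempts, then succeeds.
WORKER_SOURCE = """
import sys
attempt = int(sys.argv[1])
if attempt < 2:
    raise RuntimeError("simulated bug")
"""


def supervise(max_restarts):
    """Run the worker in its own process and restart it when it dies.

    Returns the exit codes observed, one per attempt.
    """
    exit_codes = []
    for attempt in range(max_restarts + 1):
        result = subprocess.run(
            [sys.executable, "-c", WORKER_SOURCE, str(attempt)],
            stderr=subprocess.DEVNULL,  # hide the worker's traceback
        )
        exit_codes.append(result.returncode)
        if result.returncode == 0:
            break  # worker finished cleanly, nothing left to restart
    return exit_codes


if __name__ == "__main__":
    print(supervise(max_restarts=5))
```

Note that how many restarts to allow, and whether a restart is
acceptable at all, is decided by whoever writes the supervisor, not
by the language.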
The absolutist position would also have to assume that all
communicated state is corrupted so a separate process does not
improve the situation. Since you don't know with 100% certainty
what the bug consists of you should not retain any state from any
source after the _earliest_ time where the buggy logic in theory
could have been involved. All databases should be assumed
corrupted, no messages should be accepted etc (messages and
databases are no different from memory in this regard).
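
To make the absolutist rule concrete, here is a minimal sketch,
assuming you keep timestamped snapshots of your state. The function
name and the dates in the usage example are made up for illustration.
It keeps only the newest snapshot taken strictly before the earliest
moment the buggy logic could have run, and discards everything later.

```python
from datetime import datetime


def latest_trusted_snapshot(snapshot_times, earliest_possible_fault):
    """Return the newest snapshot taken strictly before the earliest
    moment the buggy logic could have been involved, or None if no
    snapshot is that old."""
    trusted = [t for t in snapshot_times if t < earliest_possible_fault]
    return max(trusted, default=None)


if __name__ == "__main__":
    # Hypothetical snapshot times and deployment time of the buggy code.
    snapshots = [
        datetime(2018, 7, 13),
        datetime(2018, 7, 14),
        datetime(2018, 7, 15),
    ]
    buggy_code_deployed = datetime(2018, 7, 14, 12, 0)
    print(latest_trusted_snapshot(snapshots, buggy_code_deployed))
```

If the buggy code predates every snapshot, the function returns None,
which is the absolutist position taken to its conclusion: there is
no state left that you can trust.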
In reality, absolutist positions are usually not possible to
uphold, so you have to move to a probabilistic position. And the
compiler cannot make probabilistic assumptions; you need a lot of
contextual understanding to make those probabilistic assessments
(e.g. the architect or programmer has to be involved).
Fully reactive systems do not retain state, of course, and those
would change the argument somewhat, but they are very rare...
mostly limited to control systems (cars, airplanes etc).
The idea behind actor-based programming (e.g. Erlang) isn't that
bugs don't occur or that the overall system will exhibit correct
behaviour, but that it should be able to correct or adapt to
situations despite bugs being present. But that is, for the most
part, not available to us with the very "crisp" logic we use in
current languages (true/false, all or nothing). Maybe
something better will come out of probabilistic programming
paradigms and software synthesis some time in the future. Within
the current paradigms we are stuck with the judgment of the
humans involved.
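
A toy illustration of that trade-off, loosely inspired by
Erlang-style supervision but not Erlang's actual OTP API: the
`CounterActor` class and `run_supervised` function are hypothetical.
When the actor fails on a message, the supervisor throws the actor
and its state away and starts a fresh one.

```python
import queue


class CounterActor:
    """Hypothetical actor that sums the integers it receives."""

    def __init__(self):
        self.total = 0

    def receive(self, message):
        self.total += int(message)  # raises ValueError on bad input


def run_supervised(messages, make_actor):
    """Feed messages to an actor; on any failure, replace the actor
    with a fresh one (losing its state) and carry on."""
    mailbox = queue.Queue()
    for message in messages:
        mailbox.put(message)

    actor = make_actor()
    restarts = 0
    while not mailbox.empty():
        message = mailbox.get()
        try:
            actor.receive(message)
        except Exception:
            # "Let it crash": drop the failed actor and the message.
            actor = make_actor()
            restarts += 1
    return actor, restarts


if __name__ == "__main__":
    actor, restarts = run_supervised(["1", "2", "oops", "4"], CounterActor)
    print(actor.total, restarts)  # prints "4 1"
```

The system keeps running, but the final total is 4 rather than the 7
you would get from the valid messages alone. The overall system
survives; it does not thereby become correct.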
Interestingly, biological systems are much better at robustness,
fault tolerance and self-healing, but that involves a lot of
overhead and also assumes that some failures are acceptable as
long as the overall system can recover from them. Actor programming
is based on the same assumption: what matters is the health of the
overall (big) system.