The extent of trust in errors and error handling
Profile Anaysis via Digitalmars-d
digitalmars-d at puremagic.com
Sun Feb 5 00:55:40 PST 2017
On Wednesday, 1 February 2017 at 19:25:07 UTC, Ali Çehreli wrote:
> tl;dr - Seeking thoughts on trusting a system that allows
> "handling" errors.
>
> One of my extra-curricular interests is the Mill CPU[1]. A
> recent discussion in that context reminded me of the
> Error-Exception distinction in languages like D.
>
> 1) There is the well-known issue of whether Error should ever
> be caught. If Error represents conditions where the application
> is not in a defined state, hence it should stop operating as
> soon as possible, should that also carry over to other
> applications, to the OS, and perhaps even to other systems in
> the whole cluster?
>
No, because your logic would then extend to all of the human
race, to animals, etc. It is not practical and not necessary.
1. The ball must keep rolling. All of this stuff we do is fantasy
anyway, so if an error occurs in that Lemmings game, it is just a
game. It might take down every computer in the universe (if we
went with the logic above), but it can't affect humans because
they are distinct from computers (it might kill a few humans, but
that has always been acceptable to humans).
That is, it is not practical to take everything down, because an
error is not that serious and ultimately has limited effect.
In the practical world, we are OK with some errors. This
allows us not to worry too much. The more we had to worry
about such errors, the more things would have to be shut down,
exactly because of the logic you have given. So the question is
not "should we do x or not x" but how much of x is
acceptable.
(The human race has decided that quite a few errors are OK. We
can even have a medical device malfunction and kill people
because of some error like an invalid array access, and it's
OK (it's just money, and lawyers will be happy).)
2. Not all errors will systematically propagate into all other
systems, e.g., two computers not connected in any way. If one
has an error, the other won't be affected, so there is no reason
to take that computer down too.
So what matters, like anything else, is that we try to do the
best we can. We don't have to pick an arbitrary point at which to
stop, because we actually don't know where it is. What we do is
use reason and experience to decide what is the most likely
solution and see how much risk it carries. If it carries too much,
we back off; if too little, we can afford to push on.
There is an optimal point, more or less, because risk requires
energy to manage (even for no risk).
Basically, if you assume, as you seem to be doing, that a
single error creates an unstable state in the whole system at
every point, then you are screwed from the get-go if you do not
want any unstable state at any cost. The only solution then is to
not have any errors at any point (which requires perfection,
something humans gave up on trying to achieve a long time ago).
3. Things are not so cut and dried. Intelligence can be used to
understand the problem. Not all errors are the same. Some
errors are catastrophic and need everything shut down, and some
don't. Knowing those error types is important. Hence, the more
descriptive an error is the better, as it allows one to create
separation. Also, designing things to be robust is another way to
mitigate the problems.
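
As a rough sketch of that separation (in Python rather than D, purely
for illustration), the two error classes and the run_jobs function
below are hypothetical names, not anything from an existing library.
The idea is only that a descriptive error type lets the caller decide
between "skip and carry on" and "stop everything":

```python
class RecoverableError(Exception):
    """Hypothetical: a failure the program can report and continue past."""


class FatalError(Exception):
    """Hypothetical: program state can no longer be trusted, so stop."""


def run_jobs(jobs):
    """Run (name, callable) pairs in order.

    Returns (completed, skipped, stopped_at), where stopped_at is the
    name of the job that raised FatalError, or None if none did.
    """
    completed, skipped = [], []
    for name, job in jobs:
        try:
            job()
            completed.append(name)
        except RecoverableError:
            # Limited damage: note it and keep the ball rolling.
            skipped.append(name)
        except FatalError:
            # State is suspect: do not run any of the remaining jobs.
            return completed, skipped, name
    return completed, skipped, None
```

How much goes into each class is exactly the "how much of x is
acceptable" decision; the code only makes that decision explicit.
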
Programming is not much different from banking. You have a
certain amount of risk in a certain portfolio (program), you hedge
your bets (create a good robust design), and hope for the best.
It's up to the individual to decide how much hedging is
required, as it takes time/money to do it.
Example: Windows. Obviously Windows was a design that didn't care
too much about robustness. Just enough to get the job done was
their motto. If someone dies because of some BSOD, it's not that
big a deal... it will be hard to trace the cause, and if it can
be traced, they have enough money to afford it (similar to the Ford
fiasco:
https://en.wikibooks.org/wiki/Professionalism/The_Ford_Pinto_Gas_Tank_Controversy).