Program logic bugs vs input/environmental errors

H. S. Teoh via Digitalmars-d digitalmars-d at puremagic.com
Fri Oct 31 14:31:20 PDT 2014


On Fri, Oct 31, 2014 at 09:11:53PM +0000, Kagamin via Digitalmars-d wrote:
> On Friday, 31 October 2014 at 20:33:54 UTC, H. S. Teoh via Digitalmars-d
> wrote:
> >You are misrepresenting Walter's position. His whole point was that
> >once a single component has detected a consistency problem within
> >itself, it can no longer be trusted to continue operating and
> >therefore must be shut down. That, in turn, leads to the conclusion
> >that your system design must include multiple, redundant, independent
> >modules that perform that one function. *That* is the real answer to
> >system reliability.
> 
> In server software, such a component is a transaction/request. They are
> independent.

You're using a different definition of "component". An inconsistency in
a transaction is a problem with the input, not a problem with the
program logic itself. If something is wrong with the input, the program
can detect it and recover by aborting the transaction (rolling back the
wrong data). But if something is wrong with the program logic itself
(e.g., it committed the transaction instead of rolling back when it
detected a problem) there is no way to recover within the program
itself.
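
To make the recoverable case concrete, here is a minimal sketch in Python
using the standard sqlite3 module. The accounts table, the transfer()
function and the InvalidRequest exception are all hypothetical names made
up for this illustration. The point is only that a problem with the
request data is detected by logic that is itself working, so the program
can abort the transaction and carry on:

```python
import sqlite3


class InvalidRequest(Exception):
    """The request's data failed verification (an input problem, not a bug)."""


def make_db():
    # Hypothetical two-account ledger, in memory, purely for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                     [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def balance(conn, name):
    row = conn.execute("SELECT balance FROM accounts WHERE name = ?",
                       (name,)).fetchone()
    return row[0]


def transfer(conn, src, dst, amount):
    """Return True if the transfer was committed, False if it was rejected."""
    try:
        # The connection context manager commits on normal exit and rolls
        # back if an exception escapes the block.
        with conn:
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?",
                         (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?",
                         (amount, dst))
            if balance(conn, src) < 0:
                # Bad input: abort this transaction only.
                raise InvalidRequest("insufficient funds")
    except InvalidRequest:
        return False
    return True
```

Note that this only works because the check and the rollback are assumed
to be correct. If transfer() itself were buggy, nothing inside it could be
relied on to undo the damage.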


> >Pretending that a failed component can somehow fix itself is a
> >fantasy.
> 
> Traditionally a failed transaction is indeed rolled back. It's more a
> business logic requirement because a partially completed operation
> would confuse the user.

Again, you're using a different definition of "component".

A failed transaction is a problem with the data -- this is recoverable
to some extent (that's why we have the ACID requirement of databases,
for example). For this purpose, you vet the data before trusting that it
is correct. If the data verification fails, you reject the request. This
is why you should never use assert to verify data -- assert is for
checking the program's own consistency, not for checking the validity of
data that came from outside.
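
A small sketch of the distinction, again in Python rather than D, with
hypothetical names (BadRequest, parse_age, age_bucket) invented for the
example. External data gets an explicit check that rejects the request;
assert is reserved for conditions that can only be false if the program
itself is wrong:

```python
class BadRequest(Exception):
    """Raised to reject a request whose data failed verification."""


def parse_age(raw):
    # Data from outside: verify it explicitly and reject it if it is bad.
    # This is NOT a job for assert.
    try:
        age = int(raw)
    except ValueError:
        raise BadRequest("age is not a number: %r" % (raw,)) from None
    if not 0 <= age <= 150:
        raise BadRequest("age out of range: %d" % age)
    return age


def age_bucket(age):
    # Internal invariant: every caller is supposed to pass an age that
    # already went through parse_age(). If this fails, the bug is in the
    # program, not in the input.
    assert 0 <= age <= 150, "caller passed an unvalidated age"
    return "minor" if age < 18 else "adult"
```

One practical reason to keep the two apart: Python strips assert
statements when run with -O, so any input check written as an assert
silently disappears in that mode.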

A failed component, OTOH, is a problem with program logic. You cannot
recover from that within the program itself, since its own logic has
been compromised. You *can* roll back the wrong changes made to data by
that malfunctioning program, of course, but the rollback must be done by
a decoupled entity outside of that program. Otherwise you might end up
causing even more problems (for example, due to the compromised /
malfunctioning logic, the program commits the data instead of reverting
it, thus turning an intermittent problem into a permanent one).
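
As a rough sketch of what "a decoupled entity outside of that program"
could look like, assume a worker that runs as a separate process and
writes its output to a staging file. The supervise() function, the
".staging" suffix and the two toy workers below are assumptions made up
for this example, not a description of any particular system. The cleanup
decision is made by the supervisor, never by the worker whose logic may
be broken:

```python
import os
import subprocess
import sys

# Toy workers, passed to a child interpreter with -c. The second one writes
# output and then trips over its own internal consistency check.
WORKER_OK = """
import sys
with open(sys.argv[1], "w") as f:
    f.write("result")
"""

WORKER_BUGGY = """
import sys
with open(sys.argv[1], "w") as f:
    f.write("garbage")
assert False, "internal invariant violated"
"""


def supervise(worker_source, final_path):
    """Run the worker in its own process; promote or discard its output."""
    staging = final_path + ".staging"
    proc = subprocess.run([sys.executable, "-c", worker_source, staging],
                          stderr=subprocess.DEVNULL)
    if proc.returncode == 0:
        # Worker finished cleanly: publish its output in one step.
        os.replace(staging, final_path)
        return True
    # Worker died: the supervisor, not the worker, discards the staged data.
    if os.path.exists(staging):
        os.remove(staging)
    return False
```

The worker never gets the chance to "fix" anything after its assertion
fails. It just dies, and the rollback is carried out by code that does
not share its (possibly corrupted) state.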


T

-- 
By understanding a machine-oriented language, the programmer will tend to use a much more efficient method; it is much closer to reality. -- D. Knuth

