Program logic bugs vs input/environmental errors

Fri Oct 31 15:49:21 PDT 2014

On 10/31/2014 2:31 PM, H. S. Teoh via Digitalmars-d wrote:
> On Fri, Oct 31, 2014 at 09:11:53PM +0000, Kagamin via Digitalmars-d wrote:
>> On Friday, 31 October 2014 at 20:33:54 UTC, H. S. Teoh via Digitalmars-d
>> wrote:
>>> You are misrepresenting Walter's position. His whole point was that
>>> once a single component has detected a consistency problem within
>>> itself, it can no longer be trusted to continue operating and
>>> therefore must be shut down. That, in turn, leads to the conclusion
>>> that your system design must include multiple, redundant, independent
>>> modules that perform that one function. *That* is the real answer to
>>> system reliability.
>>
>> In server software such a component is a transaction/request. They are
>> independent.
>
> You're using a different definition of "component". An inconsistency in
> a transaction is a problem with the input, not a problem with the
> program logic itself. If something is wrong with the input, the program
> can detect it and recover by aborting the transaction (roll back the
> wrong data). But if something is wrong with the program logic itself
> (e.g., it committed the transaction instead of rolling back when it
> detected a problem) there is no way to recover within the program
> itself.
>
>
>>> Pretending that a failed component can somehow fix itself is a
>>> fantasy.
>>
>> Traditionally a failed transaction is indeed rolled back. It's more a
>> business logic requirement because a partially completed operation
>> would confuse the user.
>
> Again, you're using a different definition of "component".
>
> A failed transaction is a problem with the data -- this is recoverable
> to some extent (that's why we have the ACID requirement of databases,
> for example). For this purpose, you vet the data before trusting that it
> is correct. If the data verification fails, you reject the request. This
> is why you should never use assert to verify data -- assert is for
> checking the program's own consistency, not for checking the validity of
> data that came from outside.
>
> A failed component, OTOH, is a problem with program logic. You cannot
> recover from that within the program itself, since its own logic has
> been compromised. You *can* roll back the wrong changes made to data by
> that malfunctioning program, of course, but the rollback must be done by
> a decoupled entity outside of that program. Otherwise you might end up
> causing even more problems (for example, due to the compromised /
> malfunctioning logic, the program commits the data instead of reverting
> it, thus turning an intermittent problem into a permanent one).

This is a good summation of the situation.
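
To make the distinction in the quoted text concrete, here is a minimal sketch in D, not a definitive implementation. The names parseQuantity and totalCents, and the price of 250, are hypothetical and exist only for illustration. Data arriving from outside is vetted with enforce and exceptions, which the caller can catch in order to reject that one request. The program's own invariants are checked with assert, whose failure raises an Error rather than an Exception and is therefore deliberately not handled by the request-level catch.

```d
import std.conv : to, ConvException;
import std.exception : enforce;
import std.stdio : writeln;

/// Hypothetical request parser. Bad input is an expected condition,
/// so it is reported with an Exception the caller can recover from.
int parseQuantity(string text)
{
    int qty;
    try
    {
        qty = text.to!int;
    }
    catch (ConvException e)
    {
        throw new Exception("quantity is not a number: " ~ text);
    }
    enforce(qty > 0, "quantity must be positive");
    return qty;
}

/// Hypothetical internal helper. Callers inside the program guarantee
/// qty > 0, so a violation here is a logic bug, not bad input.
int totalCents(int qty, int unitCents)
{
    assert(qty > 0, "logic bug: qty should have been validated upstream");
    return qty * unitCents;
}

void main()
{
    foreach (input; ["3", "-1", "abc"])
    {
        try
        {
            immutable qty = parseQuantity(input);
            writeln(input, " -> ", totalCents(qty, 250));
        }
        catch (Exception e)
        {
            // Reject only this request; the program itself is still consistent.
            // A failed assert is an AssertError (an Error), so it is not caught here.
            writeln(input, " rejected: ", e.msg);
        }
    }
}
```

In this sketch the malformed inputs are turned away one request at a time, while a failed assert in totalCents would propagate past the handler and stop the program, leaving any cleanup to something outside it.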