Program logic bugs vs input/environmental errors

Sat Oct 4 04:19:00 PDT 2014

On 04/10/14 11:18, Walter Bright via Digitalmars-d wrote:
> What you're doing is attempting to write a program with the requirement that the
> program cannot fail.
>
> It's impossible.

No, I'm attempting to discuss how to approach the problem that the program _can_ 
fail, and how to isolate that failure appropriately.

I'm asking for discussion of how to handle a use-case, not trying to advocate 
for particular solutions.

You seem to be convinced that I don't understand the principles you are 
advocating of isolation, backup, and so forth.  What I've been trying (but 
obviously failing) to communicate to you is, "OK, I agree on these principles, 
let's talk about how to achieve them in a practical sense with D."

> If that's your requirement, the system needs to be redesigned so that it can
> accommodate the failure of the program.
>
> (Ignoring bugs in the program is not accommodating failure, it's pretending that
> the program cannot fail.)

Indeed.

>> As I'm sure you realize, I also picked that particular use-case because it's one
>> where there is a well-known technological solution -- Erlang -- which has as a
>> key feature its ability to isolate different parts of the program, and to deal
>> with errors by bringing down the local process where the error occurred, rather
>> than the whole system.  This is an approach which is seriously battle-tested in
>> production.
>
> As I (and Brad) have stated before, process isolation, shutting down the failed
> process, and restarting the process, is acceptable, because processes are
> isolated from each other.
>
> Threads are not isolated from each other. They are not. Not. Not.

I will repeat what I said in my previous email: "Without assuming anything about 
how the system is architected".

I realize that in my earlier remark:

> However, it's clearly very desirable in this use-case for the application to
> keep going if at all possible and for any problem, even an Error, to be
> contained in its local context if we can do so.  (By "local context", in
> practice this probably means a thread or fiber or some other similar
> programming construct.)

... I probably conveyed the idea that I was seeking to contain Errors inside
threads or fibers.  I was already anticipating that the answer here would be a
definitive "You can't under any circumstances", which is why I wrote, "or some
other similar programming construct", by which I meant Erlang-style processes.

Actually, a large part of my reason for continuing this discussion is that,
where high-connectivity server applications are concerned, I'm keen to ensure
that their developers _avoid_ the dangerous solution of, "Spawn lots of
threads and fibers, and localize Errors by catching them and throwing away the
thread rather than the application."

However, unless there is a practical alternative, that is probably
what people are going to do, because the trade-offs of their use-case make it
seem the least bad option.  I think that's a crying shame and that we can and
should do better.

> The only way to have super high uptime is to design the system so that failure
> is isolated, and the failed process can be quickly restarted or replaced.
> Ignoring bugs is not isolation, and hoping that bugs in one thread don't
> affect memory shared by other threads doesn't work.

Right.  Which is why I'd like to move the discussion over to "How can we achieve 
this in D?"