[dmd-concurrency] Thread termination protocol (shutdown protocol evolved)

Thu Jan 21 10:11:17 PST 2010

Here is another idea for the "shutdown protocol". I'm changing the name to better reflect what the proposal is. Also take note that I've renamed the "Shutdown" exception to "Terminated".

It includes ideas from my previous proposal as well as from how Erlang handles linked processes. Linked processes in Erlang define an error handling mechanism, much like the one I'm proposing here. I was mistaken before about how it worked and what it did. This time I've integrated the concept correctly.

Thank you for reading! This might take a while. :-)

 - - -

The thread termination protocol has two goals:

* Establish a generic way of expressing when you want a thread to terminate, one that covers the majority of cases. Cases it does not support must still be manageable with user-constructed termination protocols.

* Establish a generic way to handle thrown exceptions in spawned threads.

The thread termination protocol relies on five important points:

1. When spawning a thread, the parent thread is set as the owner of the new one.

2. The owner link with the child thread can be broken by choosing another thread as the owner. Setting the owner to the main thread means that you don't want the child to be terminated until the program itself terminates.

3. When a thread terminates, it sends a Terminated exception message to each of the threads it owns.

4. When a child thread receives a Terminated exception message, the thread can handle it and even ignore it if it wants. But in the absence of corresponding message handlers and exception handlers, the thrown exception will stop the thread.

5. When a thread terminates via an exception other than Terminated, the exception is sent back as a message to the owner thread. In the absence of corresponding message handlers and exception handlers, the thrown exception will stop the owner thread and thus again send the exception as a message to the owner's owner, until it reaches the main thread (which has no owner).
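
The five points can be modelled in a few lines. What follows is a single-process Python simulation, not D and not a proposed implementation. `SimThread`, `receive`, `stop`, and the `handles` parameter are names invented here for illustration; only `Terminated` comes from the proposal.

```python
class Terminated(Exception):
    """Message asking a thread to stop (name taken from the proposal)."""


class SimThread:
    """Stand-in for a thread. There is no real scheduling here."""

    def __init__(self, name, owner=None, handles=()):
        self.name = name
        self.owner = owner              # point 1: the spawner is the owner
        self.owned = []
        self.handles = tuple(handles)   # exception types this thread handles
        self.alive = True
        self.log = []
        if owner is not None:
            owner.owned.append(self)

    def set_owner(self, new_owner):     # point 2: ownership can be moved
        if self.owner is not None:
            self.owner.owned.remove(self)
        self.owner = new_owner
        new_owner.owned.append(self)

    def receive(self, exc):
        """Deliver an exception message (points 4 and 5)."""
        if not self.alive:
            return
        if isinstance(exc, self.handles):
            self.log.append(("handled", type(exc).__name__))
            return
        self.stop(exc)                  # unhandled: the exception stops us

    def stop(self, exc=None):
        if not self.alive:
            return
        self.alive = False
        self.log.append(("stopped", type(exc).__name__ if exc else None))
        for child in list(self.owned):  # point 3: notify owned threads
            child.receive(Terminated())
        forward = exc is not None and not isinstance(exc, Terminated)
        if forward and self.owner is not None:
            self.owner.receive(exc)     # point 5: report to the owner


main = SimThread("main")
worker = SimThread("worker", owner=main)
helper = SimThread("helper", owner=worker)
worker.stop(ValueError("boom"))  # helper gets Terminated, main gets ValueError
```

In this run nobody handles anything, so the `ValueError` stops `main` as well. Passing `handles=(ValueError,)` to an owner keeps it alive.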

Here is an important thing: sending a Terminated exception must not prevent the thread from receiving more messages afterwards. If the child thread chooses to ignore the Terminated message, nothing prevents it from continuing to receive messages normally. One reason for this is that it might want to postpone termination to perform a closing handshake with whatever it is currently communicating with.
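
To make the postponed-termination case concrete, here is a small Python sketch of a receive loop that keeps accepting messages after Terminated arrives. The `run_worker` function, the `"bye-ack"` message, and the use of a deque as a mailbox are all assumptions made for the example; the proposal does not define a handshake format.

```python
from collections import deque


class Terminated(Exception):
    """Message asking a thread to stop (name taken from the proposal)."""


def run_worker(mailbox):
    """Drain the mailbox; after Terminated, keep going until the peer acks."""
    handled = []
    closing = False
    while mailbox:
        msg = mailbox.popleft()
        if isinstance(msg, Terminated):
            closing = True               # note the request, but do not stop yet
            handled.append("sent bye")   # start the closing handshake
        elif closing and msg == "bye-ack":
            handled.append("got bye-ack")
            break                        # handshake finished: now terminate
        else:
            handled.append(f"data:{msg}")  # ordinary messages still arrive
    return handled


result = run_worker(deque(["a", Terminated(), "b", "bye-ack", "c"]))
```

The message `"b"` is still processed even though it arrives after Terminated, and `"c"` is never read because the worker has stopped by then.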

Also important is that you can manually send a Terminated message to a thread at any time when you want it to terminate.

And we might want to add a Tid field to the Exception class to identify the thread it originated from.

 - - -

Now, let's see how it works with various use cases. (This first case is pretty much a repetition of the one that came along with my previous shutdown protocol proposal.)

For the file copy example with an intermediate processing step, it's a simple ownership graph:

	main -owns- read thread -owns- processing thread -owns- writer thread

When main terminates, it sends Terminated to the read thread, which ignores it because it's still reading from a file. When the read thread finishes reading, it terminates and sends Terminated to the processing thread, which receives it as its last message. When the processing thread receives Terminated, it terminates, which automatically sends a Terminated message to the writer thread. The writer thread then terminates after writing the last part. At this moment the program closes.
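
The shutdown order of this pipeline can be traced with a toy Python model. `Stage`, `finish`, and the `ignores_terminated` flag are hypothetical names for this sketch; the real read thread would simply not be listening for messages while it reads.

```python
class Terminated(Exception):
    """Message asking a thread to stop (name taken from the proposal)."""


order = []  # names of stages, in the order they terminate


class Stage:
    def __init__(self, name, owner=None, ignores_terminated=False):
        self.name = name
        self.owner = owner
        self.ignores_terminated = ignores_terminated
        self.owned = []
        self.alive = True
        if owner is not None:
            owner.owned.append(self)

    def receive(self, msg):
        if not self.alive or not isinstance(msg, Terminated):
            return
        if not self.ignores_terminated:
            self.finish()

    def finish(self):
        self.alive = False
        order.append(self.name)
        for child in self.owned:        # terminating notifies owned threads
            child.receive(Terminated())


main = Stage("main")
read = Stage("read", main, ignores_terminated=True)
processing = Stage("processing", read)
writer = Stage("writer", processing)

main.finish()             # read ignores Terminated: it is still reading
after_main = list(order)
read.finish()             # end of file reached: the rest of the chain follows
```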

What happens if the writer thread throws an exception (other than Terminated)? The exception will terminate the writer thread and be sent back as a message to the processing thread, which will terminate and send the exception to the read thread, which will terminate and send the exception to the main thread, which will terminate the program. If any thread in the middle of the chain has already terminated when the exception is thrown, the exception skips it and is sent directly to that thread's owner.

Of course, any thread in the graph might catch the exception, preventing it from percolating to other threads.
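
The three outcomes described here (uncaught, skipping a dead thread, caught midway) can be checked with a short Python sketch. `Node`, `propagate`, `chain`, and the `catches` parameter are names made up for the example and are not part of the proposed API.

```python
class Node:
    def __init__(self, name, owner=None, catches=()):
        self.name = name
        self.owner = owner
        self.catches = tuple(catches)   # exception types this thread catches
        self.alive = True


def propagate(thrower, exc):
    """Return the names of the threads stopped by exc, in order."""
    stopped = []
    node = thrower
    while node is not None:
        if node.alive:
            if node is not thrower and isinstance(exc, node.catches):
                break                   # caught: percolation stops here
            node.alive = False
            stopped.append(node.name)
        # a thread that already terminated is skipped: go to its owner
        node = node.owner
    return stopped


def chain(**read_options):
    main = Node("main")
    read = Node("read", main, **read_options)
    processing = Node("processing", read)
    writer = Node("writer", processing)
    return main, read, processing, writer


main, read, processing, writer = chain()
uncaught = propagate(writer, OSError("disk full"))

main, read, processing, writer = chain()
processing.alive = False                # processing already terminated
skipped = propagate(writer, OSError("disk full"))

main, read, processing, writer = chain(catches=(OSError,))
caught = propagate(writer, OSError("disk full"))
```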

So this simple case works well out of the box. That's because the graph is a simple tree. If you have a thread spawning a child thread only to then give it to another thread, then you'll probably want to decide yourself when you want to terminate it and who should handle exceptions. Here is how that should work:

1. Create your thread, setting ownership to the main thread.
2. Give the Tid to whoever you want.
3. ...
4. Send the thread a Terminated exception when you're done with it.

Here the owner thread just acts as a safeguard in case you forget to send a Terminated message manually. You can set the owner to any thread that lives longer than the spawned thread, not necessarily the main thread. When you know you want to terminate the thread, just send it a Terminated exception.
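
As a rough illustration of steps 1 to 4, here is a Python sketch using real threads. It assumes a `queue.Queue` as a stand-in for the Tid and does not model the ownership safeguard itself; `worker`, `inbox`, and `user_handle` are illustrative names only.

```python
import queue
import threading


class Terminated(Exception):
    """Message asking a thread to stop (name taken from the proposal)."""


def worker(inbox, results):
    while True:
        msg = inbox.get()
        if isinstance(msg, Terminated):
            return                      # obey the termination request
        results.append(msg * 2)


inbox, results = queue.Queue(), []

# Step 1: create the thread from the long-lived main thread.
t = threading.Thread(target=worker, args=(inbox, results))
t.start()

# Step 2: give the handle (here, the inbox) to whoever needs it.
user_handle = inbox

# Step 3: the user works with the thread.
for n in (1, 2, 3):
    user_handle.put(n)

# Step 4: send Terminated when done with it.
user_handle.put(Terminated())
t.join(timeout=5)
```

Under the proposal, an owner that outlives the worker would send the Terminated message itself at exit if step 4 were forgotten.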

You might want to set up a special "monitoring" thread as the owner of such child threads. This thread could catch exceptions leaking from child threads and do some error handling.
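
A monitoring owner could look roughly like this in Python. The `spawn_owned` helper below only borrows its name from the API proposed in this message; its body, the `(thread name, exception)` tuple format, and the `flaky` function are assumptions for the sketch. The monitor's role is played by the drain loop at the end.

```python
import queue
import threading


def spawn_owned(owner_inbox, fn, *args):
    """Run fn in a thread; forward any uncaught exception to the owner."""
    def body():
        try:
            fn(*args)
        except Exception as exc:
            name = threading.current_thread().name
            owner_inbox.put((name, exc))   # the exception leaks to the owner
    t = threading.Thread(target=body)
    t.start()
    return t


def flaky(n):
    if n % 2:
        raise ValueError(f"bad input {n}")


monitor_inbox = queue.Queue()
children = [spawn_owned(monitor_inbox, flaky, n) for n in range(4)]
for child in children:
    child.join()

# The monitor collects what leaked and does its error handling.
errors = []
while not monitor_inbox.empty():
    _, exc = monitor_inbox.get()
    errors.append(str(exc))
errors.sort()
```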

 - - -

For the API, I propose this:

	spawn(function, args...)
	// creates a new thread having the spawning thread as the owner.

	spawnOwned(ownerTid, function, args...)
	// creates a new thread with a specific owner.

	tid.owner = ownerTid
	// Changes the owner of a thread.
	// Note: this needs to be protected against circular ownership.

	terminate(tid);
	// Sends a Terminated exception to the thread. This only works for
	// threads listening for messages.
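
The protection against circular ownership mentioned for `tid.owner = ownerTid` could be a walk up the owner chain before accepting the change. This Python sketch assumes ownership is kept in a plain dict mapping each thread name to its owner; `would_create_cycle` and `set_owner` are hypothetical names.

```python
def would_create_cycle(owner_of, child, new_owner):
    """True if making new_owner the owner of child closes a loop."""
    node = new_owner
    while node is not None:
        if node == child:
            return True                 # child is already above new_owner
        node = owner_of.get(node)
    return False


def set_owner(owner_of, child, new_owner):
    if would_create_cycle(owner_of, child, new_owner):
        raise ValueError(f"{new_owner!r} is owned, directly or not, by {child!r}")
    owner_of[child] = new_owner


# main owns a, a owns b
owner_of = {"main": None, "a": "main", "b": "a"}

set_owner(owner_of, "b", "main")        # fine: b now hangs off main

try:
    set_owner(owner_of, "main", "b")    # rejected: b is owned by main
    rejected = False
except ValueError:
    rejected = True
```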

This differs from Erlang in only two notable ways:

1. You cannot have unlinked threads. This ensures that all threads receive a Terminated message eventually (if they don't terminate by themselves before that). This also makes sure that uncaught exceptions will always be propagated back to somewhere, right up to the main thread if you don't catch them.

2. Sending a Terminated exception is a standard way to tell a thread to just stop. I don't think there is such a thing in Erlang. Fortunately, you don't have to obey the Terminated message if you don't want to, but most likely you'll just want to postpone termination while you clean things up.

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/