Bottom Type--Type Theory

Thu Jan 17 22:13:13 UTC 2019

On Thu, Jan 17, 2019 at 09:48:52PM +0100, Johannes Loher via Digitalmars-d wrote:
[...]
> I don't actually think declaring a variable of the Bottom type is
> required to be illegal. It just needs to be impossible for it to ever
> be initialized. This can trivially be done by only allwoing it to be
> initialized by copying because whenever you have an expression of type
> Bottom to which you "initialize" a variable of type Bottom, that
> expression never returns and thus the initialization never actually
> takes place.

That's an interesting take on Bottom. I think that makes sense.  In D,
we have default initialization, so a declaration like:

	Bottom b;

implicitly means (in pseudocode):

	Bottom b = void; // (yikes, another meaning of `void`)
	b = b.init;

Since b.init does not exist, this would result in a compile error.

However, if you initialized it with an expression of type Bottom, then
it would "work":

	Bottom func() {...}
	Bottom b = func();

This is OK, because func() never returns (by definition), so the
assignment to b never takes place.  Which also means the compiler can
simply elide b from any stack-allocation code and skip emitting code for
the assignment, etc., because it is unreachable code.  All subsequent
code would also be unreachable and can be elided.

> This would still allow things like
> ```
> Bottom var = assert(0);
> ```

Yes.  Though *why* you'd want to declare a variable of type Bottom -- as
opposed to just writing the expression of type Bottom -- is a good
question that I don't have a good answer to.

> and also
> ```
> void fun(Bottom b)
> {
>     // whatever, this function can never actually be called
> }
[...]

This is an interesting one.  IOW, the parameter Bottom b is basically
equivalent to "this function is never reached".  So it can be completely
elided from codegen?  This would let you write strange things like:

	fun(assert(0));

though it does make one wonder why write `fun` at all instead of just
`assert(0)` directly.

But on second thoughts... something like this could be extremely useful
in generic code.  For example, you could have a generic function that
invokes a user callback and returns the value returned by the callback.
You could then pass in a function that returns Bottom (i.e., it never
terminates), then the generic function will automatically be inferred to
be of type Bottom, and the compiler could stop codegen as soon as it
encounters the first expression of type Bottom -- we get non-termination
analysis for free, as a natural consequence of the type system!

In fact, declaring a variable of type Bottom then becomes a synonym for
writing assert(0).  Generic code can then freely operate on any input
type T, and if T == Bottom, then the type system automatically performs
reachability analysis for us, and the codegen will only need to go as
far as the first occurrence of Bottom.  It can then just emit hlt and
exit codegen for that function.  It wouldn't matter what expression
Bottom was found in: it could be nested somewhere inside a complex
expression, and the type system automatically figures out that the type
is Bottom and therefore codegen can just stop emitting code at that
point.  There would be no need to special-case checking for assert(0) to
elide unreachable code, and no need for the template code to use static
if to check for Bottom.  It just falls out of the type system naturally.

Even better yet, this means we don't even need to make it an error to
declare and initialize a variable of type Bottom: the initialization can
just be defined to be `assert(0);`, i.e.:

	Bottom b;

becomes equivalent to (in pseudo-code):

	Bottom b = void;
	b = assert(0);

which simplifies to just:

	assert(0);

The unreachability then just naturally falls out of the type system. The
compiler can compile the function as normal, as though we were actually
initializing a variable of type Bottom, then during codegen the
optimizer sees the `assert(0);` and elides everything else that follows
it.  At runtime, this would just abort at the point where b is
initialized.

This also lets us write sanity-checking expressions such as:

	float sqrt(float x) {
		return (x >= 0.0) ? ... /* implementation here */
			: Bottom.init;
	}

This is valid since Bottom converts to any type. Then at runtime, if x <
0.0, assert(0) gets triggered.  There would be no need to write an
if-statement that explicitly calls assert(0).

Nice, well-rounded-out semantics.

T

-- 
Those who don't understand Unix are condemned to reinvent it, poorly.