[theory] What is a type?

Mon Jan 15 19:25:16 UTC 2018

On Mon, Jan 15, 2018 at 06:58:37PM +0000, Jonathan Marler via Digitalmars-d wrote:
[...]
> That being said, I don't think this model really get much closer to
> answering the original question, "what is a type?".
[...]

At the most abstract level, a type is just a set. The members of the set
represent what values that type can have. Operations on types are
therefore simply operations on sets, which lets us bring in the entire
mechanism of mathematical sets to elaborate upon the theory of types.

A boolean, for example, is simply a 2-membered set B = {true, false}. A
struct of two booleans is just the Cartesian product B x B. An array of
booleans is simply the Kleene closure of B, i.e., the union of all
Cartesian powers of B, usually denoted as B*.

This abstract view disregards how a type is actually implemented, and
merely considers its set of possible values.  Under this model, you
begin by describing the sets of possible values your program will work
with. Then when you're ready to implement it, you decide on some kind of
mapping to map the abstract sets to the physical types provided by the
hardware.

On a more practical level, the way I see a type is an *interpretation*
imposed upon a series of zero or more bytes (or, if you want to drill
down even deeper, bits).  The same binary values can represent different
types, depending on how you interpret them.  Under this model, you
proceed in the opposite direction: starting with an arbitrary sequence
of bytes, you recursively build up interpretations of it until you
arrive at the data structures that your program will operate on.

The abstract view is useful when you're working within your problem
domain: it allows you to phrase your algorithms and transformations
directly in terms of the objects and operations you wish to work on. But
it's not always so easy to map this abstract model onto the quirks of
the hardware you have to work with. It's easy to think of trivial
mappings, of course, but generally they will have pretty poor
performance because they don't take advantage of the properties of the
underlying hardware.  Finding an *efficient* mapping that doesn't also
break correctness can be quite challenging.

The practical view is useful when you're building things from ground up,
e.g., when you're building a compiler. It lets you build up types that
can take advantage of the underlying hardware's properties, and also
avoid or work around hardware quirks or weaknesses.  It also allows you
to reinterpret types creatively, that may allow you to achieve
interesting effects (e.g., the data bytes output by the compiler can be
reinterpreted as executable code by the CPU). However, types built up
this way may not map nicely into your problem domain, and that forces
your code to become convoluted in order to compensate for the impedance
mismatch between the two. Correctness may also be compromised if the
objects in your problem domain don't have a 1-to-1 fit with the hardware
types (e.g., your calculator program might operate on integers in the
mathematical sense of numbers that can be arbitrarily large, but the int
type provided by the hardware is modulo some power of 2; so when values
get too large, what your program thinks of as an integer no longer
matches what the hardware implements, resulting in overflow/underflow
bugs).

Finding the sweet spot that balances the two is non-trivial, and is the
subject of entire fields of research. :-D

T

-- 
The trouble with TCP jokes is that it's like hearing the same joke over and over.