Extended Type Design.

Sat Mar 17 14:50:51 PDT 2007

Bruno Medeiros wrote:
> What is the status of the experimental designs ...

I asked this because yesterday, out of nowhere, I started thinking about 
this problem, and as a kind of a mental exercise I came to a working 
design. It seems kinda pointless, since you already made your design, 
but I'll show this one nevertheless. Consider it a late entry to the max 
challenge :P . Still, there are some aspects presented here that I 
don't know how they would work on your planned design (like those keyed by 
the questions).
Most of the terms here are tentative, and so is the syntax. Please 
consider the syntax separately from the semantics and conceptualization. 
Errors may be present in the text. This design is presented as is, 
without any warranty of any kind. :P

CONCEPTUALIZATION

There are 3 major kinds of D entities:

Values (expressions), types, and templates. (and labels too I guess...)

Values are characterized by properties that define what one can do with 
the value. The most important of these properties is the type. D offers 
a very rich mechanism to query and manipulate types (typeof(..), is(..) 
expression, auto, type parameters, etc.), much better than any other 
statically typed language I know. But the type is not the only 
"property" of an expression. There are others, such as whether an 
expression is an lvalue or not, if it can be assigned, etc. The problem 
so far is that D does not offer a good mechanism to query and manipulate 
such "type properties".

Let's consider the following type properties:

R: is the value readable.
C: is the value readable at compile time (compile time constant).
&: is the value referenceable.
W: is the value writable. (explained later)

I can't find a good term for these "extended type properties" or 
"extended value properties" so I'll just call it QUX for now. Silly, I 
know, but whatever. And I will use "core type" for the current notion of 
type, which is what typeof(..) returns.

So, with these QUX, what are the valid combinations of them?
They are:

R
CR
R&
RW&
W&

(I'm skipping the explanation of why, since I think that's clear, see 
examples below). Furthermore, these property combinations are related in 
the following hierarchy of conversion:

   R
  /  \
CR  R&   W&
      \  /
       RW&
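
One way to read this hierarchy, sketched in present-day D: treat each 
property as a bit, and allow a conversion exactly when it only drops 
properties. The names Qux and convertsTo and the bit encoding are made up 
for this illustration and are not part of the design. The "only drops 
properties" rule is an assumption, one that happens to reproduce the 
diagram.

```d
import std.stdio : writeln;

// Hypothetical encoding: bit 0 = R, bit 1 = C, bit 2 = &, bit 3 = W.
enum Qux : uint
{
    r     = 0b0001, // R
    cr    = 0b0011, // CR
    rRef  = 0b0101, // R&
    wRef  = 0b1100, // W&
    rwRef = 0b1101, // RW&
}

// Assumed rule: `from` converts to `to` when every property that `to`
// promises is also present in `from`.
bool convertsTo(Qux from, Qux to)
{
    immutable uint f = from, t = to;
    return (f & t) == t;
}

// The edges of the diagram, checked at compile time.
static assert(convertsTo(Qux.cr, Qux.r));
static assert(convertsTo(Qux.rRef, Qux.r));
static assert(convertsTo(Qux.rwRef, Qux.rRef));
static assert(convertsTo(Qux.rwRef, Qux.wRef));

// Conversions the diagram does not allow.
static assert(!convertsTo(Qux.wRef, Qux.r));   // write-only is never readable
static assert(!convertsTo(Qux.r, Qux.cr));     // cannot gain properties
static assert(!convertsTo(Qux.rRef, Qux.wRef));

void main()
{
    // RW& reaches R by going through R& (transitivity).
    writeln("RW& -> R : ", convertsTo(Qux.rwRef, Qux.r));
}
```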

Examples of the QUX for various values in current D:

42	// CR - A literal is a constant.
var	// RW& - As in:  int var;
fvar	// R& - As in:  const fvar; fvar = 2; It's like 'final'
func()	// R - the result of a function call is read-only and not referenceable
	// W& - No example in current D for W&
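
These properties can be probed, roughly, in present-day D with 
__traits(compiles, ...) and an `auto ref` parameter. This is only a sketch 
of the idea in today's language, not the proposed mechanism. The helper 
isReferenceable is a made-up name. Note that fvar is initialized from a 
run-time value on purpose, so that it is not a compile-time constant.

```d
int func() { return 7; }

// & : `auto ref` binds lvalues by reference and rvalues by value;
// __traits(isRef, ...) reports which of the two happened.
bool isReferenceable(T)(auto ref T value)
{
    return __traits(isRef, value);
}

void main()
{
    int var = 3;            // RW&
    const int fvar = var;   // R&, fixed after its run-time initialization

    // C: only the literal can initialize a manifest constant here.
    static assert( __traits(compiles, { enum e = 42; }));
    static assert(!__traits(compiles, { enum e = var; }));
    static assert(!__traits(compiles, { enum e = fvar; }));

    // W: only the plain variable can be assigned to.
    static assert( __traits(compiles, var = 1));
    static assert(!__traits(compiles, fvar = 1));
    static assert(!__traits(compiles, func() = 1));
    static assert(!__traits(compiles, 42 = 1));

    // &: variables are referenceable, literals and call results are not.
    assert( isReferenceable(var));
    assert( isReferenceable(fvar));
    assert(!isReferenceable(func()));
    assert(!isReferenceable(42));
}
```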

Let's give some tentative keywords for the possible QUX combinations:

CR  - const
RW& - ref
W&  - wronly ref
R&  - rdonly ref
R   - rdonly

Now, recall the following: QUX describe the properties of a value. The 
first thing you may think now is that one can use QUX to declare 
variables of the same QUX type. That's not entirely accurate. For 
example you can't declare a variable of QUX R, because a var is always 
either referenceable, or a (compile-time) constant. There are no R 
variables.
Specifying a var as rdonly will create a R&. Specifying no QUX will 
create a RW&. Specifying ref in a var declaration will also create a RW& 
var, but the identity will be the same as the var in the given 
initializer. ref will preserve the "reference" (memory location) of the 
initializer. This is mentioned to clarify the declaration semantics.
Examples:

int varA = varX; 	// var is RW&
const int varB = 1;	// var is CR
rdonly int varC = 2;	// var is R&
ref int varD = &varX; // var is RW& too
wronly ref int varE = &varX; // var is W&

varA will be a copy of varX, while varD will be the same as varX (same 
identity). After definition, varA and varD will have the same QUX.

And what about the definitions of function parameters, and function 
return types?
There are some minor differences. In function return types, if no QUX is 
specified, then the QUX is rdonly (as it is currently in D). In function 
parameters, if no QUX is specified the QUX is rdonly too (this is 
different from current D, but is considered a nice improvement. Function 
arguments are almost never modified anyway).

What about composing types? That is, when one has a composite type 
(array, pointers, etc.), how does one specify QUX for each of the type 
components?

Well for this var:
   int[]* var;
then QUX are specified like this:

rdonly int[]* var; // var (the pointer) is rdonly
(rdonly int[])* var; // the pointer target (the array) is rdonly
(rdonly int)[]* var; // the members of the array are rdonly
rdonly (rdonly (rdonly int)[])* var; // All are rdonly

Note that some QUX don't make sense in certain declarations, like 
declaring an array member as ref, like this:
   (ref int)[] var;
because the members of arrays are refs already. This could be an error 
or simply ignored.

What about auto?
auto does not in any way capture the QUX, just the core type (as in 
typeof(..) ).
   rdonly var;
   auto foo = var;  // foo is not rdonly

How do we templatize and parameterize QUX?
Let's see by example, looking at previous design challenges:

The id function:

T id(expr T) (T a) {
   return a;
}

So, "expr T" denotes that T is not a normal type parameter, but an 
"extended type parameter". Besides the core type, it will also hold 
information about the QUX. id can be instanced manually or with IFTI.

The max function (challenge #3) will show more advanced scenarios of QUX 
manipulation, but let's first recall what max does.
Consider these vars:

a = 3;
b = 9;
const fvar;  // fvar is 'final'
fvar = 1;

And now some examples of max usage:

max(1, 2)	2 of QUX CR
max(a, 2)	a of QUX R
max(a, b)	b of QUX RW&
max(a, func())	a of QUX R
max(a, fvar)	a of QUX R&

As requested, max preserves the greatest common QUX information.
Here's how we define max:

maxExtType!(A,B) max(expr A :: rdonly, expr B :: rdonly) (A a, B b) {
   if(a >= b)
     return a;
   else
     return b;
}

Of note: The 'T :: U' syntax means specialize the template if T can be 
converted to U. This is a variant of the current 'T : U' syntax which 
means specialize if T is the *same* as U. In both these constructs, U 
can be a QUX, but only if T is an "extended type parameter" (a parameter 
declared with expr).

maxExtType is the key to complete the challenge. It defines the maximum 
common extended type of A and B. This is defined as:

template maxExtType(expr A, expr B) {
   static if( !is(typeof(A) == typeof(B)) ) {
     // Core types differ, so the result cannot be a ref
     alias maxCoreType!(A, B) maxExtType;
   } else {
     static if( is(A == ref) && is(B == ref))
       alias (ref typeof(A)) maxExtType;
     else static if( is(A :: rdonly ref) && is(B :: rdonly ref))
       alias (rdonly ref typeof(A)) maxExtType;
     else static if( is(A :: wronly ref) && is(B :: wronly ref) )
       alias (wronly ref typeof(A)) maxExtType;
     else static if( is(A :: rdonly) && is(B :: rdonly) )
       alias (rdonly typeof(A)) maxExtType;
     else
       static assert(false, "No common extended type for: "
         ~ A.stringof ~ " and " ~ B.stringof);
   }
}

So, as mentioned in the original challenge thread, if the core types of 
A and B are not the same, then maxExtType cannot be a ref. That's what 
the first static if checks for (note: an exception can be made for 
object types). The subsequent static ifs check for increasingly less 
restrictive common QUXs. It's possible that a common QUX does not exist, 
for example if one is R& and the other is W&.
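
The "greatest common QUX" can also be sketched in present-day D as a set 
intersection: keep the properties both arguments have, and accept the 
result only if it is one of the five valid combinations. The names Qux and 
commonQux and the bit encoding are hypothetical, and the sketch ignores 
the core-type check. It follows the table of max examples given earlier, 
so two constants yield CR.

```d
// Hypothetical encoding: bit 0 = R, bit 1 = C, bit 2 = &, bit 3 = W.
enum Qux : uint
{
    none  = 0,      // no common QUX exists
    r     = 0b0001, // R
    cr    = 0b0011, // CR
    rRef  = 0b0101, // R&
    wRef  = 0b1100, // W&
    rwRef = 0b1101, // RW&
}

// Intersect the two property sets; the result counts only if it is
// itself a valid combination.
Qux commonQux(Qux a, Qux b)
{
    immutable uint bits = cast(uint) a & cast(uint) b;
    foreach (q; [Qux.rwRef, Qux.rRef, Qux.wRef, Qux.cr, Qux.r])
        if (bits == q)
            return q;
    return Qux.none;
}

// The max examples from the table.
static assert(commonQux(Qux.cr,    Qux.cr)    == Qux.cr);    // max(1, 2)
static assert(commonQux(Qux.rwRef, Qux.cr)    == Qux.r);     // max(a, 2)
static assert(commonQux(Qux.rwRef, Qux.rwRef) == Qux.rwRef); // max(a, b)
static assert(commonQux(Qux.rwRef, Qux.r)     == Qux.r);     // max(a, func())
static assert(commonQux(Qux.rwRef, Qux.rRef)  == Qux.rRef);  // max(a, fvar)

// R& and W& share only '&', which is not a valid QUX on its own.
static assert(commonQux(Qux.rRef, Qux.wRef) == Qux.none);

void main()
{
    assert(commonQux(Qux.wRef, Qux.rwRef) == Qux.wRef);
}
```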

What about lazy?
In this design, lazy simply isn't considered as a QUX, as it simply is 
not a property of an expression. There are no lazy expressions. After a 
lazy FOO variable is created (which must be initialized), the variable 
becomes for *all effects* indistinguishable from a FOO delegate() , that 
is, a delegate returning type FOO. Thus, lazy also can't be 
parameterized/templatized.
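
To illustrate the "lazy behaves like a delegate" point with present-day D: 
each use of a lazy parameter re-evaluates the argument expression, just as 
each call of a delegate would. The function names evalTwice and 
viaDelegate are made up for this sketch.

```d
// Each use of `e` re-evaluates the caller's argument expression.
int evalTwice(lazy int e)
{
    return e + e;
}

// The same thing written with an explicit delegate.
int viaDelegate(int delegate() dg)
{
    return dg() + dg();
}

void main()
{
    int n = 0;
    assert(evalTwice(++n) == 3);  // ++n ran twice: 1 + 2
    assert(n == 2);

    int m = 0;
    assert(viaDelegate(() => ++m) == 3);
    assert(m == 2);
}
```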

IMMUTABILITY

Immutability, as in, "transitive immutability" is achieved as a type 
modifier with the keyword "immut". An immut value means that any other 
member obtained from the original value cannot be modified, and so on. 
The members of immut values are rdonly and immut. immut is not a QUX, it 
is a type modifier that modifies (and is part of) the core type. This 
means that immut appears in "typeof(..)", and consequently is also 
captured by auto. This is the only sensible behavior, since immut 
describes a property of the referenced data of that expression, and must 
be preserved upon assignments (and thus part of the core type). This is 
unlike QUX, since QUX only describe properties of the immediate-value of 
an expression, which is copied in assignments. I.e., you can assign a 
rdonly value to a non-rdonly var, but you can't assign an immut value to 
a non-immut (normal) var. This shows how immut and QUX are somewhat 
different in nature. Also, immut vars are not automatically rdonly; they 
are rdonly only if 'rdonly' is also specified.
An example:

   immut Foo[] fooar;
then:
   typeof(fooar[0]) == rdonly immut Foo;
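
As a rough analogy in present-day D, the transitive `immutable` qualifier 
behaves much like immut here: it is part of the type, it reaches the 
elements, and it cannot be dropped by assignment. This is only a sketch. 
D's immutable is not the immut of this design, and the struct Foo below is 
a placeholder. Writing immutable(Foo)[] keeps the variable itself 
reassignable, which mirrors "immut vars are not automatically rdonly".

```d
struct Foo { int x; }

void main()
{
    immutable(Foo)[] fooar = [Foo(1), Foo(2)];

    // The qualifier is part of the element type...
    static assert(is(typeof(fooar[0]) == immutable(Foo)));
    // ...so members cannot be modified,
    static assert(!__traits(compiles, fooar[0].x = 5));
    // and it cannot be dropped by assigning to an unqualified array.
    static assert(!__traits(compiles, { Foo[] m = fooar; }));

    // The variable itself is not read-only: it can be re-sliced.
    fooar = fooar[1 .. $];
    assert(fooar.length == 1 && fooar[0].x == 2);

    // By contrast, a read-only *value* copies freely into a writable var.
    const int ro = 1;
    int rw = ro;
    rw = 2;
    assert(ro == 1 && rw == 2);
}
```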

TODO
*Syntax to specify the "this" of a method as immut. Maybe do it like C++?
*A way to conveniently specify/templatize methods that are identical and 
only vary in the mutability of their types (like 'romaybe' in Javari).

The following describes a particular use case for rdonly and wronly:

VARIANT COMPOSITE TYPES.

Consider this hierarchy:

FooBar extends Foo extends Object
Xpto extends Object

Suppose we have an array of Foo:
   Foo[] fooarr;

The classic covariance problem is: is fooarr safely castable to 
Object[] ? At first sight one might think yes: since Foo is an Object, 
Foo[] is an Object[]. But that is not the case, since an Object[] 
array is an array that one can put an Xpto object into:
   (cast(Object[]) fooarr)[0] = new Xpto();
which would break type safety, since we would have an Xpto in an array of 
Foo's. What happens is that some array operations (like reads) 
remain safe, but others do not (like writes).
Java allows that cast, but has runtime checks on array member 
assignments, and throws an exception if the type safety is violated like 
in the example above.
Can a language provide (compile time) support for safe casting? With 
rdonly and wronly it can.
   We have that Foo[] cannot be cast to Object[], but it can be safely 
cast to (rdonly Object)[]. And then:

   fooarr2 = cast((rdonly Object)[]) fooarr;
   fooarr2[0] = new Xpto(); // Not allowed
   fooarr2[0].doSomething(); // Allowed

Assignments won't be allowed, but reading is allowed. Conversely, the 
array type parameter can be contravariantly cast, from Foo[] to (wronly 
FooBar)[].

   fooarr3 = cast((wronly FooBar)[]) fooarr;
   // Ok because the elements of fooarr3 are of type wronly FooBar
   fooarr3[0] = new FooBar();
   // Not allowed because you can't read fooarr3[0]
   fooarr3[0].doSomething();

This was the main motivation I saw for the use of wronly, however, this 
mechanism is quite simple (as in, simplistic) and limited. It's not as 
powerful as Java's generics, which allow a greater degree of 
functionality with lower-bounded types. As such, it may not be worth 
having wronly just because of this. Still, I guess wronly could also be 
used in place of 'out' parameters.
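
The read-only half of this idea can be tried out in present-day D, where 
const plays roughly the part of rdonly (D's const is transitive, so the 
match is not exact, and there is no counterpart to wronly). In this 
sketch, the classes and the id method are placeholders. The point is only 
that the unsafe array conversion is rejected while the read-only one is 
accepted.

```d
class Foo
{
    int id() const { return 1; }
}

class FooBar : Foo
{
    override int id() const { return 2; }
}

// The writable view is rejected, the read-only view is accepted.
static assert(!is(FooBar[] : Foo[]));
static assert( is(FooBar[] : const(Foo)[]));

void main()
{
    FooBar[] bars = [new FooBar];

    // Safe: elements can be read through `view` but never replaced.
    const(Foo)[] view = bars;
    assert(view[0].id() == 2);
    static assert(!__traits(compiles, view[0] = new Foo));
}
```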

SYNTAX AND TERMS SUBJECT TO CHANGE:

QUX - EVP (Extended Value Properties) or ETP (Extended Type Properties) 
? Or 'attributes' instead of 'properties' ? But definitely not "storage 
class", that term sucks. :o
immut  - perhaps 'immutable'?
rdonly - perhaps 'readonly' or 'final'?
wronly - perhaps 'writeonly'?
expr
"::" - Ideally, it would be better that the ":" of template 
specialization would behave the same as the ":" of the is(..) expression.

Comments are welcome.

-- 
Bruno Medeiros - MSc in CS/E student
http://www.prowiki.org/wiki4d/wiki.cgi?BrunoMedeiros#D