Could we reserve void[T] for builtin set of T ?

Fri Apr 1 01:52:40 PDT 2016

On Thursday, 31 March 2016 at 19:57:50 UTC, Walter Bright wrote:
> aa[x] = true;  // add member x
> aa[x] = false; // remove member x
> x in aa; // compile error

On Friday, 1 April 2016 at 02:36:35 UTC, Jonathan M Davis wrote:
> Still, while it's true that aa.remove is how you'd normally do 
> it, I think that Walter's suggestion of assigning true or false 
> makes by far the most sense of the ones made thus far - and you 
> could just make
>
> aa.remove(key);
>
> and
>
> aa[key] = false;
>
> equivalent for void[T] to make it more consistent.
>
> - Jonathan M Davis

The basic idea is great. But how could the details look like? 
What do we want in detail?

We want simple and suggestive operations on sets, like 
modification of single elements, iteration, union, intersection, 
(relative) complement, cartesian product, pointwise function 
application, filtering and much more. Maybe even suggestive set 
comprehension is possible.

About the special case:

We encounter (yet another) special case for something built of 
void. This is not a major deal. Everyone knows (should know) that 
void is a special case nearly everywhere it emerges:
  • void returning functions.
  • void* is much different from any other pointer.
  • void[] and void[n] are different from usual arrays.
Now we add
  • void[T] is different from K[T] (for K != void).

 From the view of a D programmer, this is not a big deal to accept.
 From the view of a D learner, this is yet another void special 
case, maybe even easier to fully understand than void*.

First of all, we do not use the term associative array for sets. 
This is wrong and confusing at least for the beginners.

We can have AA declaration syntax without AA indexing syntax. The 
set indexing syntax will be a bit different.

     void[T] s; // declare s as a set of T

     s.add(x);    // add    x of type T
     s.remove(x); // remove x of type T

     auto r1 = x  in s; // r1 is of type bool; r1 == true iff x is 
--- in s.
     auto r2 = x !in s; // r2 is of type bool; r2 == true iff x is 
not in s.

Further we allow not only for convenience
     s[x] = true;  // Same side effect of add.
     s[x] = false; // Same side effect of remove.

This is not AA indexing syntax so let's call it set indexing and 
nothing else. It looks similar to bool[T] syntax, but a void[T] 
is different from bool[T] by design and idea.

The methods add and remove return bool values that indicate the 
state being changed:
  • add    returns true iff key has not been already present.
  • remove returns true iff key has     been already present.

The opIndexAssign should return the assigned bool; everything 
else would be totally unexpected. It is a bit like assigning the 
expression
     x in s
which is an rvalue.

It is illegal to use the opIndex with parameters. The only legal 
indexing expressions will be
  • s[]: legally used to operate on the set pointwise (see later).
  • s[x] = b
  • s[x] op= b, where op is one of |, &, ^: s[x] op= b does s[x] = 
(x in s) op b.
I don't see an application of the latter, but there is no reason 
to disallow it; rather discourage it.

When assigning bool literals with set indexing syntax where the 
value of the assignment is not used, the compiler should emit a 
warning and suggest using add or remove respectively.

     bool b = expression();
     s[x] = true;         // compiler suggests using add.
     s[x] = false;        // compiler suggests using remove.
     s[x] = b;            // ok, b is not a literal.
     s[x] = expression(); // ok.
     s[x] = s[y] = true;  // ok; for s[y] = true, the value (true) 
is being used,
                          //     for s[x] we have s[y] = true as 
expression.

Known from AAs we also will have
  • sizeof
  • length
  • dup
  • rehash
  • clear
in the expected form, but we won't have
  • keys
  • values
  • byKey()
  • byValue()
That is because to be it is odd to call the elements keys. And 
what are the values then?
We don't even need these.

New/changed ones:
  • singelton (new)
  • get (known from AA, but other sematics)
For a singelton set, singelton returns the only element. 
Otherwise RangeError.
get returns a pointer to the only element of a singelton set or 
null for the empty set. If the set contains more than one 
element, RangeError. get can be useful with

     if (auto xp = s.get)
     { /+ use the unique element x = *p +/ }
     else
     { /+ empty set handling +/ }

[Aside: There is no add for AAs for a good reason.]

Iteration:

     foreach (ref x; s) // ref is optional!
     { ... }
Because the pseudo index type is void, there are no index values. 
So indexed iteration is illegal.
     foreach (i, ref x; s) // compile-time error.
     { ... }
Sets cannot be lockstepped over; that would need canoncial order. 
But it makes sense to chain sets etc.

Initialization:

The set can also be initialized by a value of T, T[] or bool[T].
     void[T] s;                       // makes empty set.
     void[T] s = x;                   // x of type T: Makes 
singleton set.
     void[T] s = [ a, b ];            // x in s iff x == a or x == 
b
     void[T] s = [ a:true, b:false ]; // x in s iff x == a.
If a set is initialized by an AA literal with only literal bools, 
the compiler should suggest using an array instead.
There is no need for a special set literal. We could allow
     void[T] s = [ a:, b: ];
The colon is from the AA literal syntax, with no value to the key 
as void has no values.
We could allow assigning any AA and taking the keys only. That 
would make the usage of bool[T] inconsistent. It is better to use
     void[T] s = aa.keys;   // if aa of type S[T]
     void[T] s = aa.values; // if aa of type T[S]

We could allow further
     s[x, y] = b; // nothing different from s[x] = s[y] = b;
and
     void[T] t
     s[t] = b; // foreach (x; t) s[x] = b;

Set operations (oh, no... maths):

     void[int] s  = [ 0, 1 ];
     void[int] t  = [ 0,    2 ];

     void[int] u  = [ 0, 1, 2 ];   // s ∪ t
     void[int] i  = [ 0 ];         // s ∩ t
     void[int] d  = [ 1, 2 ];      // s Δ t
     void[int] m1 = [ 1 ];         // s \ t
     void[int] m2 = [ 2 ];         // t \ s

     assert (s | t == u);   // union
     assert (s & t == i);   // intersection
     assert (s ^ t == d);   // symmetric difference
     assert (s - t == m1);  // relative complement
     assert (s / t == m2);  // reverse relative complement

     assert (s ~ t == u);   // union (alias); compiler should 
suggest using |.

Rationale for using bit/logic operators? Let op be one of |, & or 
^. Then
       x in (s op t) == (x in s) op (x in t)
This is very consistent with assigning bools in indexing 
expressions.

Rationale for better not using ~ for sets at all: Sets are 
different from arrays by idea.
It is fundamental for arrays that
     int[] a, b;
     assert ((a ~ b).length == a.length + b.length);
holds. For sets it does not.

Maybe this is interesting:
  • s[t] = true   does  s |= t.
  • s[t] = false  does  s -= t.

[I have thought about the disjoint union of sets. One problem is 
the two meanings of this term. There seems not to be a perfect 
solution to this issue.]

Union and intersection:

     void[void[T]] ss;       // set of sets.
     auto u = join(ss);      // u is of type void[T]
     auto i = intersect(ss); // i is of type void[T]
Also we have disjoint, that returns if either a single set of 
sets has disjoint elements or when given more than one parameter, 
tests if the given sets are disjoint.
     auto r1 = disjoint(ss); // set of sets
     void[S] s;
     void[T] t;
     // Assume the comapaison x == y is legal for S x; T y;
     auto r2 = disjoint(s, t); // some sets

Pointwise application of a function:

In math there is f[s] for { f(x) | x in s }. We cannot use this 
great notation directly. What can be done is
     auto r = pointwise(&f, s); // f.pointwise(s);
and/or
     auto r = pointwise(s, &f); // s.pointwise(f);

It is possible to distinguish sets and functions perfectly.

The option s[f] is open, but looks very odd.

Another way to go is
     struct pointwise(alias f)
     if (!is(Parameters!f == AliasSeq!()))
     {
         alias T    = Parameters!f[0];
         alias Args = Parameters!f[1 .. $];
         static opCall(T x, Args args)
         {
             return f(x, args);
         }
         static opIndex(void[T] s, Args args)
         {
             alias R = typeof(f(s.singelton));
             void[R] result;
             foreach (x; s)
                 result.add(f(s, args));
             return result;
         }
     }

     alias f = pointwise!fBase;
     f(x, args); // simple application
     f[s, args]; // pointwise application on a set.

Further we should provide a mechanism for f[[S]] = { f[s] | s in 
S }, i.e.
     alias f = pointwise!(pointwise!fBase);
     f(x, args); // simple application
     f[ss, args]; // pointwise application on a set of sets.
     // pointwise application on a set not possible via f.

The main problem is that a function can be overloaded with
     R f(void[T] s, Args)
and
     R f(     T  x, Args)
It is possible but not desirable to make pointwise detect these 
cases and forward opIndex to opCall. For this, there can be a 
separate solution.

We can have pointwise operator application rather easily.
     void[S] s;
     void[T] t;

for any overloadable unary operator op: op s[] is like { op x | x 
in s }
     alias R = typeof(op s.singelton);
     void[R] r;
     foreach (x; s)
         r.add(op x);

for any overloadable binary operator op: s[] op t[] is like { x 
op y | x in s, y in t },
     alias R = typeof(s.singelton op t.singelton);
     void[R] r;
     foreach (x; s)
     foreach (y; t)
         r.add(x op y);

These can be computed lazily under certain circumstands, e.g. if 
the sets are immutable or in
     foreach (x; s[] + t[])
     { ... }
if s and t are not changed in the foreach body.

We can allow filtering a set by condition via
     auto t = s(predicate); // typeof(t) == typeof(s)

Set comprehension:

The mathematical notation is not unambiguous and partly informal. 
The minimal support of set comprehension is
       { f(x1,..., xn) | x1 in s1,..., xn in sn, 
predicate1(x1,..., xn) ... predicatem(x1,..., xn) }
We have an expression on the left, bindings and predicates on the 
right. Maybe one way is a template function
     setComp!("f(x1, ..., xn)",
              "x1 in s1, ..., xn in sn",
              "predicate1(x1, ..., xn) ... predicatem(x1, ..., 
xn)")
with predicates optional.

One thing I have left out? Yes, the complement.
     void[T] s;
     void[T] t = ~s;
We can decide, wether the complement is referentially bound to 
the original or not. This is an issue but not the major one. 
Let's say no, use a copy and go on.
As some types are infinite, we must simulate complements. The 
easiest one is having a private bool isComplement to indicate 
wether the content semantically is the set or the complement. For 
a complement, the operations
  • x in s
  • s[x] = b
  • s | t, s & t, s ^ t, etc.
can be emulated perfectly.

The great issue is iteration over complements. One solution is 
making complements another type, because being able to iterate a 
set just sometimes is inacceptable. For some types of sets, like 
void[bool] which has the four values [ ], [ false ], [ true ], [ 
false, true ], complements are easy. This holds for all finite 
types, but is impracticable for lange ones like long. It makes 
sense to have special treatment for small enums. In addition, one 
of D's goals is to fit user defined types best into the language. 
We need an interface for user defined types that tell that the 
complement operation on sets of them is a meaningful thing and 
not just a coated set.