Could we reserve void[T] for builtin set of T ?
Q. Schroll via Digitalmars-d
digitalmars-d at puremagic.com
Fri Apr 1 01:52:40 PDT 2016
On Thursday, 31 March 2016 at 19:57:50 UTC, Walter Bright wrote:
> aa[x] = true; // add member x
> aa[x] = false; // remove member x
> x in aa; // compile error
On Friday, 1 April 2016 at 02:36:35 UTC, Jonathan M Davis wrote:
> Still, while it's true that aa.remove is how you'd normally do
> it, I think that Walter's suggestion of assigning true or false
> makes by far the most sense of the ones made thus far - and you
> could just make
>
> aa.remove(key);
>
> and
>
> aa[key] = false;
>
> equivalent for void[T] to make it more consistent.
>
> - Jonathan M Davis
The basic idea is great. But how could the details look like?
What do we want in detail?
We want simple and suggestive operations on sets, like
modification of single elements, iteration, union, intersection,
(relative) complement, cartesian product, pointwise function
application, filtering and much more. Maybe even suggestive set
comprehension is possible.
About the special case:
We encounter (yet another) special case for something built of
void. This is not a major deal. Everyone knows (should know) that
void is a special case nearly everywhere it emerges:
• void returning functions.
• void* is much different from any other pointer.
• void[] and void[n] are different from usual arrays.
Now we add
• void[T] is different from K[T] (for K != void).
From the view of a D programmer, this is not a big deal to accept.
From the view of a D learner, this is yet another void special
case, maybe even easier to fully understand than void*.
First of all, we do not use the term associative array for sets.
This is wrong and confusing at least for the beginners.
We can have AA declaration syntax without AA indexing syntax. The
set indexing syntax will be a bit different.
void[T] s; // declare s as a set of T
s.add(x); // add x of type T
s.remove(x); // remove x of type T
auto r1 = x in s; // r1 is of type bool; r1 == true iff x is
--- in s.
auto r2 = x !in s; // r2 is of type bool; r2 == true iff x is
not in s.
Further we allow not only for convenience
s[x] = true; // Same side effect of add.
s[x] = false; // Same side effect of remove.
This is not AA indexing syntax so let's call it set indexing and
nothing else. It looks similar to bool[T] syntax, but a void[T]
is different from bool[T] by design and idea.
The methods add and remove return bool values that indicate the
state being changed:
• add returns true iff key has not been already present.
• remove returns true iff key has been already present.
The opIndexAssign should return the assigned bool; everything
else would be totally unexpected. It is a bit like assigning the
expression
x in s
which is an rvalue.
It is illegal to use the opIndex with parameters. The only legal
indexing expressions will be
• s[]: legally used to operate on the set pointwise (see later).
• s[x] = b
• s[x] op= b, where op is one of |, &, ^: s[x] op= b does s[x] =
(x in s) op b.
I don't see an application of the latter, but there is no reason
to disallow it; rather discourage it.
When assigning bool literals with set indexing syntax where the
value of the assignment is not used, the compiler should emit a
warning and suggest using add or remove respectively.
bool b = expression();
s[x] = true; // compiler suggests using add.
s[x] = false; // compiler suggests using remove.
s[x] = b; // ok, b is not a literal.
s[x] = expression(); // ok.
s[x] = s[y] = true; // ok; for s[y] = true, the value (true)
is being used,
// for s[x] we have s[y] = true as
expression.
Known from AAs we also will have
• sizeof
• length
• dup
• rehash
• clear
in the expected form, but we won't have
• keys
• values
• byKey()
• byValue()
That is because to be it is odd to call the elements keys. And
what are the values then?
We don't even need these.
New/changed ones:
• singelton (new)
• get (known from AA, but other sematics)
For a singelton set, singelton returns the only element.
Otherwise RangeError.
get returns a pointer to the only element of a singelton set or
null for the empty set. If the set contains more than one
element, RangeError. get can be useful with
if (auto xp = s.get)
{ /+ use the unique element x = *p +/ }
else
{ /+ empty set handling +/ }
[Aside: There is no add for AAs for a good reason.]
Iteration:
foreach (ref x; s) // ref is optional!
{ ... }
Because the pseudo index type is void, there are no index values.
So indexed iteration is illegal.
foreach (i, ref x; s) // compile-time error.
{ ... }
Sets cannot be lockstepped over; that would need canoncial order.
But it makes sense to chain sets etc.
Initialization:
The set can also be initialized by a value of T, T[] or bool[T].
void[T] s; // makes empty set.
void[T] s = x; // x of type T: Makes
singleton set.
void[T] s = [ a, b ]; // x in s iff x == a or x ==
b
void[T] s = [ a:true, b:false ]; // x in s iff x == a.
If a set is initialized by an AA literal with only literal bools,
the compiler should suggest using an array instead.
There is no need for a special set literal. We could allow
void[T] s = [ a:, b: ];
The colon is from the AA literal syntax, with no value to the key
as void has no values.
We could allow assigning any AA and taking the keys only. That
would make the usage of bool[T] inconsistent. It is better to use
void[T] s = aa.keys; // if aa of type S[T]
void[T] s = aa.values; // if aa of type T[S]
We could allow further
s[x, y] = b; // nothing different from s[x] = s[y] = b;
and
void[T] t
s[t] = b; // foreach (x; t) s[x] = b;
Set operations (oh, no... maths):
void[int] s = [ 0, 1 ];
void[int] t = [ 0, 2 ];
void[int] u = [ 0, 1, 2 ]; // s ∪ t
void[int] i = [ 0 ]; // s ∩ t
void[int] d = [ 1, 2 ]; // s Δ t
void[int] m1 = [ 1 ]; // s \ t
void[int] m2 = [ 2 ]; // t \ s
assert (s | t == u); // union
assert (s & t == i); // intersection
assert (s ^ t == d); // symmetric difference
assert (s - t == m1); // relative complement
assert (s / t == m2); // reverse relative complement
assert (s ~ t == u); // union (alias); compiler should
suggest using |.
Rationale for using bit/logic operators? Let op be one of |, & or
^. Then
x in (s op t) == (x in s) op (x in t)
This is very consistent with assigning bools in indexing
expressions.
Rationale for better not using ~ for sets at all: Sets are
different from arrays by idea.
It is fundamental for arrays that
int[] a, b;
assert ((a ~ b).length == a.length + b.length);
holds. For sets it does not.
Maybe this is interesting:
• s[t] = true does s |= t.
• s[t] = false does s -= t.
[I have thought about the disjoint union of sets. One problem is
the two meanings of this term. There seems not to be a perfect
solution to this issue.]
Union and intersection:
void[void[T]] ss; // set of sets.
auto u = join(ss); // u is of type void[T]
auto i = intersect(ss); // i is of type void[T]
Also we have disjoint, that returns if either a single set of
sets has disjoint elements or when given more than one parameter,
tests if the given sets are disjoint.
auto r1 = disjoint(ss); // set of sets
void[S] s;
void[T] t;
// Assume the comapaison x == y is legal for S x; T y;
auto r2 = disjoint(s, t); // some sets
Pointwise application of a function:
In math there is f[s] for { f(x) | x in s }. We cannot use this
great notation directly. What can be done is
auto r = pointwise(&f, s); // f.pointwise(s);
and/or
auto r = pointwise(s, &f); // s.pointwise(f);
It is possible to distinguish sets and functions perfectly.
The option s[f] is open, but looks very odd.
Another way to go is
struct pointwise(alias f)
if (!is(Parameters!f == AliasSeq!()))
{
alias T = Parameters!f[0];
alias Args = Parameters!f[1 .. $];
static opCall(T x, Args args)
{
return f(x, args);
}
static opIndex(void[T] s, Args args)
{
alias R = typeof(f(s.singelton));
void[R] result;
foreach (x; s)
result.add(f(s, args));
return result;
}
}
alias f = pointwise!fBase;
f(x, args); // simple application
f[s, args]; // pointwise application on a set.
Further we should provide a mechanism for f[[S]] = { f[s] | s in
S }, i.e.
alias f = pointwise!(pointwise!fBase);
f(x, args); // simple application
f[ss, args]; // pointwise application on a set of sets.
// pointwise application on a set not possible via f.
The main problem is that a function can be overloaded with
R f(void[T] s, Args)
and
R f( T x, Args)
It is possible but not desirable to make pointwise detect these
cases and forward opIndex to opCall. For this, there can be a
separate solution.
We can have pointwise operator application rather easily.
void[S] s;
void[T] t;
for any overloadable unary operator op: op s[] is like { op x | x
in s }
alias R = typeof(op s.singelton);
void[R] r;
foreach (x; s)
r.add(op x);
for any overloadable binary operator op: s[] op t[] is like { x
op y | x in s, y in t },
alias R = typeof(s.singelton op t.singelton);
void[R] r;
foreach (x; s)
foreach (y; t)
r.add(x op y);
These can be computed lazily under certain circumstands, e.g. if
the sets are immutable or in
foreach (x; s[] + t[])
{ ... }
if s and t are not changed in the foreach body.
We can allow filtering a set by condition via
auto t = s(predicate); // typeof(t) == typeof(s)
Set comprehension:
The mathematical notation is not unambiguous and partly informal.
The minimal support of set comprehension is
{ f(x1,..., xn) | x1 in s1,..., xn in sn,
predicate1(x1,..., xn) ... predicatem(x1,..., xn) }
We have an expression on the left, bindings and predicates on the
right. Maybe one way is a template function
setComp!("f(x1, ..., xn)",
"x1 in s1, ..., xn in sn",
"predicate1(x1, ..., xn) ... predicatem(x1, ...,
xn)")
with predicates optional.
One thing I have left out? Yes, the complement.
void[T] s;
void[T] t = ~s;
We can decide, wether the complement is referentially bound to
the original or not. This is an issue but not the major one.
Let's say no, use a copy and go on.
As some types are infinite, we must simulate complements. The
easiest one is having a private bool isComplement to indicate
wether the content semantically is the set or the complement. For
a complement, the operations
• x in s
• s[x] = b
• s | t, s & t, s ^ t, etc.
can be emulated perfectly.
The great issue is iteration over complements. One solution is
making complements another type, because being able to iterate a
set just sometimes is inacceptable. For some types of sets, like
void[bool] which has the four values [ ], [ false ], [ true ], [
false, true ], complements are easy. This holds for all finite
types, but is impracticable for lange ones like long. It makes
sense to have special treatment for small enums. In addition, one
of D's goals is to fit user defined types best into the language.
We need an interface for user defined types that tell that the
complement operation on sets of them is a meaningful thing and
not just a coated set.
More information about the Digitalmars-d
mailing list