templating opEquals/opCmp (e.g. for DSL/expression templates)

Wed Feb 13 18:42:43 UTC 2019

On Wed, Feb 13, 2019 at 01:50:21AM +0000, Nicholas Wilson via Digitalmars-d wrote:
> On Wednesday, 13 February 2019 at 00:56:48 UTC, H. S. Teoh wrote:
> > Frankly, I think it's a very good thing that in D comparison
> > operators are confined to opEquals/opCmp.
> 
> So do I. Many more objects have partial ordering than arithmetic,
> having opCmp under opBinary would be annoying.

Actually, contrary to what Andrei has claimed in the past, opCmp (as
currently implemented) does NOT allow for a consistent definition of a
partial ordering.  I know because I've tried to do it with an integer
set type.  The problem is that when opCmp returns 0, it's ambiguous
whether it means "incomparable" or "equal".  This makes it impossible to
make <= equivalent to the is-subset predicate. For example, here's a
prospective implementation:

	struct IntSet {
		Impl impl;
		int opCmp(in IntSet b) {
			bool isSubset = impl.isSubsetOf(b.impl);
			bool isSuperset = b.impl.isSubsetOf(impl);

			if (isSubset && isSuperset)
				return 0;
			else if (isSubset)
				return -1;
			else if (isSuperset)
				return 1;
			else
				return 0; // incomparable
		}
		bool opEquals(in IntSet b) {
			return impl.isSubsetOf(b.impl) &&
				b.impl.isSubsetOf(impl);
		}
	}

Efficiency concerns aside (should not need to compute two-way subset
relation just to determine <=, for example), this looks good, right?

Nope.  Suppose s is a subset of t.  Then opCmp would return -1, and the
predicate s <= t would be true, because s <= t lowers to:

	s.opCmp(t) <= 0

Now suppose s and t are disjoint (i.e., incomparable). According to the
spec, opCmp should return 0 in this case.  But then:

	assert(!(s <= t));

fails, because opCmp returns 0.  So we cannot distinguish between s
being a subset of t vs. s and t being incomparable by using the <=
operator.  Similarly for >=.  Therefore, s <= t cannot represent the
is-subset operation.

We can try to redefine opCmp to return something different from what's
outlined above, but we'd run into other problems.

tl;dr: opCmp does NOT support partial orderings in any real sense (you
have to accept that <= and >= are true for incomparable elements!). Only
linear orderings work correctly in all cases.

> > If anything, I would vote for enforcing opEquals to return bool and
> > bool only.
> 
> That would be a backwards incompatible change, like it or not.

I know.  That's why I haven't proposed it yet. :-D

> > For example, recently I came up with a C++ monstrosity where the
> > lines:
> > 
> >         fun<A, B>(a, b);
> >         gun<T, U>(a, b);
> > 
> > have wildly-different, completely unrelated *parse trees*, as an
> > illustration of how unreadable C++ can become.
[...]
> Thank goodness we use ! for template and don't have `,` available for
> overloading!

Yeah, I was very happy when we finally deprecated the evil comma
operator. Good riddance.

> > Operator overloading should be reserved for objects that behave like
> > arithmetical entities in some way.
> 
> Like symbolic math.

I know what you're trying to drive at, but there's a difference
arithmetic in code vs. arithmetic in math. An expression in code is
generally assumed to execute when control flow reaches that line of
code. It's unexpected behaviour for what looks like an assignment of the
result of an expression to instead assign the expression itself as
something to be evaluated later.

> > And especially comparison operators should not have any other
> > meaning than the standard meaning.  It should be illegal to make ==
> > and <= mean something completely unrelated to each other.
> 
> They do already mean something completely different, <= is an
> ordering, == is equality. Yes it would be bad for (a <= b) == (a == b)
> to be false.  I'm sure you could already achieve that outcome, would
> you though? Of course not, it'd be stupid.

What I meant was that == and <= should behave consistently with each
other.  Ideally both <= and == should be handled by the same function.
Then consistency would be guaranteed. But for practical reasons we
separate them, one reason being that often equality is much cheaper to
compute than linear ordering.

[...]
> > The usual invocation of such a DSL as a compile-time argument, say
> > something like this:
> > 
> > 	myDsl!'a <= b'
> > 
> > contains one often overlooked, but very important element: the
> > identifier `myDsl`, that sets it apart from other uses of `<=`, and
> > clearly identifies which interpretation should be ascribed to the
> > symbols found in the template argument.
> 
> Its also much uglier and does not commute with things that use <= i.e.
> generic functions.

Ugliness is debatable; I find that the identifier serves as a good
marker to indicate departure from usual semantics.

As for not commuting with generic functions, do you have a specific
example in mind?  Because all the obvious (to me) examples I can think
of are cases where I think would be better off *not* working with
generic code, because the generic code expects one set of semantics but
overloading opCmp/opEquals to return arbitrary objects produce a
different set of semantics.

[...]
> > Similarly, it should not be the case that:
> > 
> > 	auto x = a <= b;
> > 
> > evaluates a comparison expression and assigns a boolean value to x,
> > whereas:
> > 
> > 	auto y = p <= q;
> > 
> > creates an expression object capturing p and q, that needs to be
> > called later before it yields a boolean value.
> 
> No. auto y = p <= q; should not e.g. open a socket (you could probably
> do that already with an impure opCmp). Being able to capture the
> expression `p <= q` is the _entire point_ of the proposal.

And that's the point I disagree on.  The current expectation of a line
of code like `auto y = p <= q;` is that typeof(y) == bool.  For it to
return a type other than bool is unexpected behaviour, and leads to
subtle differences in semantics that will likely result in bugs.

Explicitly marking it, e.g., as `auto y = symbolic!"p <= q";` makes it
much more obvious what's really going on with the code.  Yes it's ugly,
but at the same time this ugliness is IMO necessary warning for the
would-be code maintainer to understand that the semantics depart from
the usual expectations.

[...]
> > P.S. And as a bonus, a string DSL gives you the freedom to employ
> > operators not found among the built-in D operators, for example:
> > 
> > 	auto result = eval!`(f∘g)(√x ± √y)`;
> > 
> > And if you feel the usual strings literals are too cumbersome to use
> > for long expressions, there's always the under-used token strings to
> > the rescue:
> > 
> > 	auto result = eval!q{
> > 		( (f ∘ g)(√(x + y) ± √(x - y)) ) / |x|·|y| +
> > 		2.0 * ∫ f(x)·dx
> > 	};
> > 
> > The `eval` tells the reader that something special is happening
> > here, and also provides a name by which the reader can search for
> > the definition of the template that processes this expression, and
> > thereby learn what it means.
> > 
> > Without this little identifier `eval`, it would be anyone's guess as
> > to what the code is trying to do.
> 
> I would expect it the compute the tuple
> (f(g(sqrt(x+y)) + sqrt(x-y)/(abs(x).dot(abs(y)) + 2*integrate(f),
> (f(g(sqrt(x+y)) - sqrt(x+y)/(abs(x).dot(abs(y)) + 2*integrate(f)
> 
> (you missed the bounds on the integral and x is ambiguous in the integral)
> 
> What else would I think it would do? If that guess is wrong then the person
> has abused the operators, if its correct that thats a win. I'd bet money you
> could do just that in Julia. I'm not suggesting we go that far but they
> would definitely consider that a feature.
[...]

My point was that there are no such operators as ∘ or ± in D, so even if
you had free-for-all operator overloading you couldn't implement such an
expression.  Using a DSL frees you from that constraint.

And because there are no such operators in D, they have no standard
meaning defined by the language, so different authors likely have
different definitions for them. So even in the case where you *could*
add arbitrary operators to the language, you'll end up with a mess as
soon as your code imports two libraries that happen to overload some of
the same custom operators -- it would be hard to tell which set of
semantics apply to the operators in any given piece of code; you'd have
to study the context to be sure.  Having the `eval` or whatever else
prefixed identifier in front of the DSL resolves that question
instantly.

T

-- 
In a world without fences, who needs Windows and Gates? -- Christian Surchi