B Revzin - if const expr isn't broken (was Re: My Meeting C++ Keynote video is now available)
H. S. Teoh
hsteoh at quickfur.ath.cx
Thu Jan 17 19:31:24 UTC 2019
On Thu, Jan 17, 2019 at 06:03:07PM +0000, Paul Backus via Digitalmars-d-announce wrote:
Haha, seems D did better than C++ in this respect, but not quite at the
level of Haskell.
The C++ example of a template that takes templates and arguments and
declares another template is a perfect example of why C++ template
syntax is utterly horrible for doing these sorts of things.
Coming back to the D example at the end, I totally agree with the
sentiment that D templates, in spite of their significant improvements
over C++ syntax, ultimately still follow the same recursive model. Yes,
you can use CTFE to achieve the same thing at runtime, but it's not the
same thing, and CTFE cannot manipulate template argument lists (aka
AliasSeq aka whatever it is you call them). This lack of symmetry
percolates down the entire template system, leading to the necessity of
the hack that Bartosz refers to.
Had template argument lists / AliasSeq been symmetric w.r.t. runtime
list manipulation, we would've been able to write a foreach loop that
manipulates the AliasSeq in the most readable way without needing to
resort to hacks or recursive templates.
Lately, as I've been pondering over these fundamental language design
issues, I've become more and more convinced that symmetry is the way to
go. And by symmetry, I mean the mathematical sense of being "the same
under some given mapping (i.e., transformation or substitution)".
Why is C++ template syntax such a mess to work with? Because it's a
separate set of syntax and idioms grafted onto the original core
language with little or no symmetry between them. Where the core
language uses < and > as comparison operators, template syntax uses <
and > as delimiters. This asymmetry leads to all kinds of nastiness,
like earlier versions of C++ being unable to parse
`templateA<templateB<int>>` properly (the >> was wrongly lexed as a
right-shift operator), so an intervening space was required to work
around the asymmetry (C++11 finally special-cased this in the grammar).
This is just one trivial example.
A more fundamental example, which also afflicts D, is that the template
instantiation mechanism is inherently recursive rather than iterative,
so when you need to write a loop, you have to paraphrase it as a
recursive template. This is asymmetric with the runtime part of the
language, where constructs like `foreach` are readily available to
express the desired semantics.
On a different note, the same principle of symmetry applies to built-in
types vs. user-defined types. In TDPL Andrei alludes to programmers
disliking built-in types having "magic" behaviour that's different from
user-defined types. Why the dislike? Because of asymmetry. Built-in
types have special behaviour that cannot be duplicated by user-defined
types, so when you want the special behaviour but built-in types don't
quite meet your needs, you find yourself without any recourse. It is
frustrating because the reasoning goes "if built-in type X can have
magic behaviour B, why can't user-defined type Y have behaviour B too?"
The desire for behaviour B to be possible both for built-in types and
user-defined types stems from the desire for symmetry.
Why is alias this so powerful? Because it lets a new type Y behave as
if it were an existing type X -- it's symmetry. Similarly, the Liskov
Substitution Principle is essentially a statement of symmetry in the
universe of OO polymorphism.
Why is the Unix "everything is a file" abstraction so useful? Because of
symmetry: whether it's a physical file, a network socket, or a pipe, it
exposes the same API. Code that works with the data doesn't have to care
about what kind of object it is; it can simply use the API that is
symmetric across different types of objects.
Similarly, why are D ranges so powerful? Because they make containers,
data sources, data generators, etc., symmetric under the range API
operations. It allows code to be decoupled from the details of the
concrete types, and focus directly on the problem domain.
Why does the GC simplify many programming tasks so much? Because it
makes every memory-allocated object symmetric w.r.t. memory management:
you stop worrying about whether something is stack-allocated or
heap-allocated, whether it has cycles, or whether somebody else still
holds a reference to it -- you focus on the problem domain and let the
GC do its job.
At a higher level: in the old days, programming languages used to
distinguish between functions and procedures (and maybe some languages
still do, but they seem rare these days). But eventually this
distinction was ditched in favor of things like returning `void` (C,
C++, Java, D), or some other equivalent construct. Why? So that instead
of having two similar but asymmetric units of code encapsulation,
everything is just a "function" (it just so happens some functions don't
return a meaningful value). IOW, introduce symmetry, and get rid of the
special case.
On the flip side, when there is lack of symmetry, many problems arise.
I already mentioned C++ template syntax and how it's asymmetric w.r.t.
the imperative part of the language. The recursive nature of templates
in both C++ and D, as opposed to iterative runtime constructs like
foreach, is another example of asymmetry that leads to ugly / convoluted
code where the corresponding runtime code has no such issue.
Auto-decoding is another prime example of asymmetry causing problems:
all other arrays are ranges of their element type, but narrow strings
are not. While a narrow string still supports the range API (and is
symmetric in that sense with other ranges), this internal asymmetry
causes all sorts of trouble: performance problems, tons of special-case
code (just look at the Phobos range algorithms and see how many special
cases pertain to narrow strings -- and how many bugs they caused in the
interim), and the difference between .count and .indexOf (it's the
reason for the very existence of .indexOf, where .count would have
sufficed had narrow strings been symmetric in range element type w.r.t.
other arrays).
There are also APIs that are hard to use because they are asymmetric
with other APIs you happen to be using. For example, if a hypothetical
D library sported an API that used .next and .empty instead of the
standard .empty, .front, .popFront, then the asymmetry with the range
API common throughout D code would cause all sorts of impedance mismatch
problems. And one would naturally gravitate towards writing a wrapper
around this incompatible API that replicated the range API. IOW, you
fix the problem by *introducing symmetry* where there was asymmetry.
A more minor point is the overloading of keywords to mean different
things, like `void` meaning "returns no value" in one context, yet in a
different context `void*` means "can point to anything". Or the various
overloaded meanings of `static` in D, which is a rat's nest of strange
exceptions and ad hoc semantics that have no real pattern, you just have
to separately learn what it means in each context.
Also, the asymmetry of function attributes (there is `pure` but no
`impure`; there's `nothrow` but no `throwing`; you write @safe with a @,
but pure doesn't have one). A mostly minor syntactical point, but it
seems to keep coming up every time somebody new encounters them, and the
complaints seem disproportionate to the actual importance of the issue.
Why? Asymmetry. And in some cases, it leads to not-so-trivial issues
(which I won't repeat here). And the oft-proposed solution? Introduce
symmetry.
And operator overloading. In spite of all the problems associated with
operator overloading, it's still practically a necessity when you're
implementing arithmetical types. A matrix library in C requires very
ugly syntax to work with; in C++ and D, we naturally gravitate towards
operator overloading. Why? Because we desire symmetry with built-in
arithmetic types. Sure I could write:
result = prod(matrix1, sub(matrix2, matrix3));
but I would much rather write:
result = matrix1 * (matrix2 - matrix3);
And why UFCS? Because of symmetry: every subsequent operation on a
range can be written as a syntactically top-level function call, rather
than with the asymmetry of nesting every operation inside the previous
one:
result = take(map!fun(filter!pred(range)), 10);
becomes the symmetric, left-to-right chain:
result = range.filter!pred.map!fun.take(10);
I could go on and on. But this principle of symmetry seems to me to
underlie many fundamental language design issues. There seems to be a
general trend where the more symmetry there is, the smoother a language
feature will be and the fewer problems it will cause; whereas the less
symmetry there is, the more trouble, ugly workarounds, tendency towards
bugs, or just general unhappiness there will be. I don't know if it's
possible for a programming language to be *completely* symmetrical --
maybe Lisp and Haskell come pretty close -- but the further you deviate
from symmetry, the more troubles you can expect.
As a corollary, with every new language change, it would be worthwhile
to evaluate how it affects the overall symmetry of the language. Does it
add more symmetry, or does it introduce asymmetry (syntactic, semantic,
etc.) w.r.t. existing features? It seems to me that people rarely
consider this aspect of language design, esp. when they are trying to
fix what they perceive to be an important oversight in the language, but
IMO the issue of symmetry is certainly worth careful consideration
alongside the other usual factors.
English has the lovely word "defenestrate", meaning "to execute by throwing someone out a window", or more recently "to remove Windows from a computer and replace it with something useful". :-) -- John Cowan