DMD 0.177 release [Length in slice expressions]
Andrei Alexandrescu (See Website For Email)
SeeWebsiteForEmail at erdani.org
Thu Dec 21 02:18:33 PST 2006
Derek Parnell wrote:
> On Wed, 20 Dec 2006 06:24:28 -0800, Andrei Alexandrescu (See Website For
> Email) wrote:
>
>
>> A small book could be written on just how bad language design is using
>> "length" and "$" to capture slice size inside a slice expression. I
>> managed to write two lengthy emails to Walter about them, and just
>> barely got started.
>
> Please share your thoughts here if you can too.
Gladly; I dug through my email, so let me share a couple of excerpts.
---------
int length = 5;
int[] a = new int[length * 2];
int[] b = a[length .. length * 2];
int[] c = a[length - 1 .. (b[0 .. length])[0]];
In each of its uses, length has different semantics. The behavior is
well-defined for all cases, but nonintuitive and about as pleasant as
nails on a blackboard.
Now D has a compile-time option to ban the "length" name in scopes in
which the slice operator is used. That would render the example above
illegal. There is also a rule that identifiers in nested scopes cannot
mask one another. So length will be banned from *any* scope that nests a
scope using a slice:
int length;
if (a) {
    foreach (b; c) {
        while (d) {
            switch (e) {
                case f: g = h[0 .. length - 1];
                ...
            }
        }
    }
}
This code will not compile. Worse, it *will* compile until you add the
slice operation. Combining the two rules and taking them to their
logical conclusion, any code using "length" is fragile because there's
always a risk that somebody might insert a slice, rendering the entire
function uncompilable. In effect, "length" has become a keyword
introduced through the back door. Books will advise users never to use
it even when it works, coding standards will ban it, language lawyers
will use it to disparage D, and users of other languages will smile
condescendingly and stay with their languages.
There are a few ways out of it. "length" could actually be made a
keyword. But even that one isn't very uniform, and steals yet another
good identifier name.
Another way out of it is to ban "length" but stick with "$". But "$" has
another bunch of problems. It's a special character used only once, and
only in a very particular situation. There is no general concept
standing behind its usage: it sticks out like a sore thumb. "$" isn't
the last index in an array. It's that only when used inside a slice, and
refers only to the innermost index of the array. Quite a waste of a
special character out there, and to little usefulness.
But if we made "$" into an operator yielding the length of _any_ array,
which could refer to the length of _the left-hand side_ array if we so
want, then all of a sudden it becomes useful in a myriad of situations:
int i = a[$ - 1]; // get last element
int j = a[$b - 1]; // get a's element at position b.length - 1
if (a[$ - 1] == x) { ... }
if ($a > 0) { ... }
if ($a == $b) { ... }
swap(a[0], a[$ - 1]); // swap first and last element
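
For comparison, here is a minimal sketch of how the lines above can be spelled without the proposed unary form, assuming a D compiler where "$" is accepted only inside brackets and a Phobos that provides std.algorithm.swap. The helper names lastOf and atLastIndexOf are hypothetical and exist only for illustration; each unary use such as $b is written out as b.length.

```d
import std.algorithm : swap;

// Hypothetical helper: what the post writes as a[$ - 1].
int lastOf(int[] a)
{
    return a[$ - 1];          // inside the brackets, $ is a.length
}

// Hypothetical helper: what the post proposes to write as a[$b - 1].
int atLastIndexOf(int[] a, int[] b)
{
    return a[b.length - 1];   // no unary $, so b.length is spelled out
}

void main()
{
    int[] a = [10, 20, 30, 40];
    int[] b = [1, 2];

    assert(lastOf(a) == 40);
    assert(atLastIndexOf(a, b) == 20);   // b.length - 1 == 1
    assert(a.length > 0);                // proposed: $a > 0
    assert(a.length != b.length);        // proposed test: $a == $b

    swap(a[0], a[$ - 1]);                // swap first and last element
    assert(a == [40, 20, 30, 10]);
}
```

Both helpers assume non-empty arrays; with an empty array the index expression would go out of bounds.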
---------------
Grammar for nullary/unary $:
---------------
I think I nailed down the way the count operator $ can work in a manner
that's terse, expressive, and safe.
My basic goal is to enable the operator $ to be unary (applying to an
array) to return its size, and also nullary (applying to nothing) to
implicitly mean "fetch the size of the innermost array in the
expression". So this code should work:
int[] foo;
foo[$ - 1]; // refers to foo's last element
foo[$foo - 1]; // same
int[][] bar;
bar[foo[$ - 1]]; // refers to bar indexed with foo's last element
bar[foo[$bar]]; // refers to bar indexed with foo's element at $bar
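
The "innermost array" rule for the nullary form can be checked against how "$" already behaves inside nested brackets. The sketch below assumes a D compiler that accepts "$" only inside brackets; the helper name pickRow and the sample data are hypothetical, and the proposed unary $bar is written as bar.length.

```d
// Hypothetical helper: index bar with foo's last element.
int[] pickRow(int[][] bar, int[] foo)
{
    // The bare $ sits inside foo's brackets, so it means foo.length.
    return bar[foo[$ - 1]];
}

void main()
{
    int[] foo = [2, 0, 1];
    int[][] bar = [[10], [20, 21], [30, 31, 32]];

    assert(foo[$ - 1] == 1);                 // foo's last element
    assert(pickRow(bar, foo) == [20, 21]);   // bar[1]

    // Directly inside bar's brackets, $ means bar.length instead.
    assert(bar[$ - 1] == [30, 31, 32]);

    // The proposed foo[$bar - 1] has to name bar.length explicitly.
    assert(foo[bar.length - 1] == 1);        // bar.length - 1 == 2
}
```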
To insert my operator $ within D's grammar, go to the grammar page:
http://www.digitalmars.com/d/expression.html#UnaryExpression and scroll
down to UnaryExpression. There, add the following rules:
UnaryExpression:
    PostfixExpression
    & UnaryExpression
    ... etc. etc. ...
    $ Identifier
    $ PostfixExpression . Identifier
    $ PostfixExpression ( )
    $ PostfixExpression ( ArgumentList )
    $ IndexExpression
    $ SliceExpression
    $ ArrayLiteral
    $ ( Expression )
Now a unary expression can be the $ operator followed by an identifier,
a member access, a function call, an array access, or a slice expression
(awesome! pick the size of the slice!), a literal array (for
conformity), or a parenthesized expression. Perfect!
But we haven't yet filled the role of $ as a nullary operator. To do so,
let's go in the grammar to
http://www.digitalmars.com/d/expression.html#PrimaryExpression and
append one more rule to the PrimaryExpression rule:
PrimaryExpression:
    Identifier
    .Identifier
    ... etc. etc. ...
    $
Now the grammar is unambiguous and will properly distinguish unary and
nullary uses of the $ operator.
This is more elegant than the current crap with "$" and "length" popping
up. Besides, you can now use $ in many more places than inside []s.
However, the grammar size does increase quite a bit, which is more fuss
than I hoped for just one operator.
A simpler grammar would have been to simply allow:
UnaryExpression:
    PostfixExpression
    & UnaryExpression
    ... etc. etc. ...
    $ PostfixExpression
But this would have been ambiguous. If the compiler sees "$-1", then the
bad grammar says that's a unary use of $ because -1 is a
PostfixExpression. But that's not what we wanted! We wanted $ to be
nullary. That's why I needed to put all the cases in UnaryExpression.
Andrei