DMD 0.177 release [Length in slice expressions]

Thu Dec 21 02:18:33 PST 2006

Derek Parnell wrote:
> On Wed, 20 Dec 2006 06:24:28 -0800, Andrei Alexandrescu (See Website For
> Email) wrote:
> 
>  
>> A small book could be written on just how bad language design is using 
>> "length" and "$" to capture slice size inside a slice expression. I 
>> managed to write two lengthy emails to Walter about them, and just 
>> barely got started. 
> 
> Please share your thoughts here if you can too.

Gladly; I dug through my email, so let me share a couple of excerpts.

---------

int length = 5;
int[] a = new int[length * 2];
int[] b = a[length .. length * 2];
int[] c = a[length - 1 .. (b[0 .. length])[0]];

In each of its uses, length has a different semantics. The behavior is 
well-defined for all cases, but nonintuitive and about as pleasant as 
nails on the blackboard.

Now D has a compile-time option to ban the "length" name in scopes in 
which the slice operator is used. That would render the example above 
illegal. There is also a rule that identifiers in nested scopes cannot 
mask one another. So length will be banned from *any* scope that nests a 
scope using a slice:

int length;
if (a) {
   foreach (b; c) {
     while (d) {
       switch (e) {
         case f: g = h[0 .. length - 1];
         ...
       }
     }
   }
}

This code will not compile. Worse, it *will* compile until you add the 
slice operation. Combining the two rules and taking them to their 
logical conclusion, any code using "length" is frail because there's 
always a risk that somebody might insert a slice, rendering the entire 
function uncompilable. What happened is that now "length" has become a 
backdoor-introduced keyword. Books will advise users to never use it 
even when it works, coding standards will ban it, language lawyers will 
use it to detract from D, and users of other languages will smile 
condescendingly and stay with their languages.

There are a few ways out of it. "length" could actually be made a 
keyword. But even that isn't very uniform, and it steals yet another 
good identifier name.

Another way out of it is to ban "length" but stick with "$". But "$" has 
another bunch of problems. It's a special character used only once, and 
only in a very particular situation. There is no general concept 
standing behind its usage: it sticks out like a sore thumb. "$" isn't 
the last index in an array. It's that only when used inside a slice, and 
it refers only to the innermost array being indexed. Quite a waste of a 
special character, and of little use.

But if we made "$" into an operator yielding the size of _any_ array, 
one that refers to the size of _the left-hand side_ array when used 
without an operand, then all of a sudden it becomes useful in a myriad 
of situations:

int i = a[$ - 1]; // get last element
int j = a[$b - 1]; // get a's element at position b.length - 1
if (a[$ - 1] == x) { ... }
if ($a > 0) { ... }
if ($a == $b) { ... }
swap(a[0], a[$ - 1]); // swap first and last element
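
For comparison, here is a minimal sketch of what the proposed forms would stand for, spelled out with the .length property that arrays already have. The helper names (lastElement, elementAtSizeOf, sameSize) are made up for illustration and are not part of the proposal.

```d
// Hypothetical helpers: each one spells out a proposed "$" form
// using the existing .length property.

// Proposed: a[$ - 1]   (nullary $: size of the array being indexed)
int lastElement(int[] a)
{
    return a[a.length - 1];
}

// Proposed: a[$b - 1]  (unary $: size of an explicitly named array)
int elementAtSizeOf(int[] a, int[] b)
{
    return a[b.length - 1];
}

// Proposed: $a == $b
bool sameSize(int[] a, int[] b)
{
    return a.length == b.length;
}

unittest
{
    int[] a = [10, 20, 30];
    int[] b = [1, 2];
    assert(lastElement(a) == 30);        // a[3 - 1]
    assert(elementAtSizeOf(a, b) == 20); // a[2 - 1]
    assert(!sameSize(a, b));             // 3 != 2
}
```

The unary form mostly saves typing ".length"; the nullary form is the one that needs a rule for picking the array it refers to.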

---------------

Grammar for nullary/unary $:

---------------

I think I nailed down the way the count operator $ can work in a manner 
that's terse, expressive, and safe.

My basic goal is to enable the operator $ to be unary (applying to an 
array) to return its size, and also nullary (applying to nothing) to 
implicitly mean "fetch the size of the innermost array in the 
expression". So this code should work:

int[] foo;
foo[$ - 1]; // refers to foo's last element
foo[$foo - 1]; // same
int[][] bar;
bar[foo[$ - 1]]; // refers to bar indexed with foo's last element
bar[foo[$bar]]; // refers to bar indexed with foo's element at $bar
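
A small sketch of the "innermost array" rule for nested indexing, assuming a compiler that already accepts the nullary "$" inside brackets. The unary form does not exist, so it is written with .length here, and the function names are hypothetical.

```d
// Nullary $ binds to the innermost array being indexed: here that is
// foo, not bar, so this returns bar indexed with foo's last element.
int[] rowAtFooLast(int[][] bar, int[] foo)
{
    return bar[foo[$ - 1]];
}

// What the proposed bar[foo[$bar - 1]] would mean, written with .length:
// foo is indexed using bar's size, then bar is indexed with the result.
int[] rowAtFooSizeOfBar(int[][] bar, int[] foo)
{
    return bar[foo[bar.length - 1]];
}

unittest
{
    int[] foo = [0, 1, 0];
    int[][] bar = [[10], [20]];
    assert(rowAtFooLast(bar, foo) == [10]);      // foo[2] == 0, so bar[0]
    assert(rowAtFooSizeOfBar(bar, foo) == [20]); // foo[1] == 1, so bar[1]
}
```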

To insert my operator $ within D's grammar, go to the grammar page: 
http://www.digitalmars.com/d/expression.html#UnaryExpression and scroll 
down to UnaryExpression. There, add the following rules:

UnaryExpression:
     PostfixExpression
     & UnaryExpression
     ... etc. etc. ...
     $ Identifier
     $ PostfixExpression . Identifier
     $ PostfixExpression ( )
     $ PostfixExpression ( ArgumentList )
     $ IndexExpression
     $ SliceExpression
     $ ArrayLiteral
     $ ( Expression )

Now a unary expression can be the $ operator followed by an identifier, 
a member access, a function call, an array access, a slice expression 
(awesome! pick the size of the slice!), a literal array (for 
conformity), or a parenthesized expression. Perfect!

But we haven't yet filled the role of $ as a nullary operator. To do so, 
let's go in the grammar to 
http://www.digitalmars.com/d/expression.html#PrimaryExpression and 
append one more rule to the PrimaryExpression rule:

PrimaryExpression:
     Identifier
     .Identifier
     ... etc. etc. ...
     $

Now the grammar is unambiguous and will properly distinguish unary and 
nullary uses of the $ operator.

This is more elegant than the current crap with "$" and "length" popping 
up. Besides, you can now use $ in many more places than inside []s. 
However, the grammar size does increase quite a bit, which is more fuss 
than I hoped for just one operator.

A simpler grammar would have been to simply allow:

UnaryExpression:
     PostfixExpression
     & UnaryExpression
     ... etc. etc. ...
     $ PostfixExpression

But this would have been ambiguous. If the compiler sees "$-1", then the 
bad grammar says that's a unary use of $ because -1 is a 
PostfixExpression. But that's not what we wanted! We wanted $ to be 
nullary. That's why I needed to put all the cases in UnaryExpression.

Andrei