DMD 0.177 release [Length in slice expressions]

Wed Dec 20 19:52:44 PST 2006

Andrei Alexandrescu (See Website For Email) wrote:
> Don Clugston wrote:
>> Andrei Alexandrescu (See Website For Email) wrote:
>>> Similarly, let's say that a group of revolutionaries convinces Walter 
>>> (as I understand happened in case of using "length" and "$" inside 
>>> slice expressions, which is a shame and an absolute disaster that 
>>> must be undone at all costs) to implement "auto"
>>
>> This off-hand remark worries me. I presume that you mean being able to 
>> reference the length of a string, from inside the slice? (rather than 
>> simply the notation).
>> And the problem being that it requires a sliceable entity to know its 
>> length? Or is the problem more serious than that?
>> It's worrying because any change would break an enormous amount of code.
> 
> It would indeed break an enormous amount of code, but "all costs" 
> includes "enormous costs". :o) A reasonable migration path is to 
> deprecate them soon and make them illegal over the course of one year.
> 
> A small book could be written on just how bad language design is using 
> "length" and "$" to capture slice size inside a slice expression. I 
> managed to write two lengthy emails to Walter about them, and just 
> barely got started. Long story short, "length" introduces a keyword 
> through the back door, effectively making any use of "length" anywhere 
> unrecommended and highly fragile. 

That hadn't occurred to me, but you're right.  I never use length in 
that context precisely because it does look like it could be a local 
identifier, whereas I know it'll be clear it's not if I use $.  Also 
"length" is just too long to be of much use to me as a shortcut.  If I'm 
going to be that verbose I might as well type out the whole 
"varname.length".

> Using "$" is a waste of symbolic real 
> estate to serve a narrow purpose; the semantics isn't naturally 
> generalized to its logical conclusion; 

I do use this one, but I agree.  It is unnecessarily special cased for 
built-in array types.  For user-defined types, in 'myvar[0..$]' the $ 
does not expand to 'myvar.length' as one would naturally expect it to. 
Or any sort of opLength() call.  It's just a syntax error.

> and the choice of symbol itself 
> as a reminiscent of Perl's regexp is at best dubious ("#" would have 
> been vastly better as it has count connotation in natural language, and 
> making it into an operator would have fixed the generalization issue). 

I think you'll have to admit that's just your personal taste there. 
Using $ to indicate 'end' is a regexp thing, but regexps go way beyond 
Perl.

I don't really care what it is as long as there's a terse way to 
specify 'the end' in an indexing expression.

> As things stand now, the rules governing the popping up of "length" and 
> "$" constitute a sudden boo-boo on an otherwise carefully designed 
> expression landscape.

After trying to write a multi-dimensional array class, my opinion is 
that D slice support could use some upgrades overall.  What I'd like to see:

--MultiRange Slice--
* A way to have multiple ranges in a slice, and a mix of slice and 
non-slice indices:
     A[i..j, k..m]
     A[i..j, p, k..m]

   I'm not saying built-in arrays like int[] should allow the above 
expressions, but user types should at least be allowed to have such 
opSlice methods.  The problem is that opSlice has to look like 
opSlice(T1 lo, T2 hi) right now: two parameters (or zero), representing 
the values on either side of a single '..' token.  The arguments can be 
of any type, but you can't have more than two.

One possible solution is to turn a single i..j into a single int[2] 
argument (or a mytype[2], for the general case).  But that means one 
won't be able to distinguish A[[1,3]] from A[1..3].  It also means more 
interesting extensions to slice syntax, like adding a stepsize on a 
range, will be ruled out.

Another solution is a built-in slice type.  Ranges like a..b would get 
converted to slice instances automatically.  It would basically be a 
struct with two ints in the simplest case, but to support user types as 
indexes it would need to be template-like, i.e. slice!(type).  A slice 
would look basically like
     struct slice(T=int) { T lo,hi; }
It could also have a .step property.  With the above, lo and hi would 
have to be of the same type, but really it makes sense to let them 
differ, so slice!(T1,T2).  For a range with stepsize, 
slice!(Tlo,Thi,Tstep).

To make writing opSlice methods sane, a single number like the p above 
should be converted to a slice also.  So all arguments passed to opSlice 
would be of type slice, and in the simple case of integer indices, it 
would just be:
     Type opSlice(slice s) { return x[s.lo..s.hi]; }
since integers would be the default types for slice.
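
For comparison, Python already works roughly this way: a range arrives in user code as a slice object with lo/hi/step fields, and several comma-separated ranges arrive together as a tuple.  Here is a rough sketch of the "convert everything to a slice" idea, in Python rather than D.  The Grid class and its fields are made up for illustration, and the conversion only handles non-negative integer indices.

```python
class Grid:
    """Toy 2-D container, invented for this example."""

    def __init__(self, rows):
        self.rows = [list(r) for r in rows]

    def __getitem__(self, key):
        # g[i:j, p] arrives as one tuple; g[i:j] arrives as a bare slice.
        if not isinstance(key, tuple):
            key = (key,)
        # Pad missing trailing dimensions with "take everything".
        key = key + (slice(None),) * (2 - len(key))
        # Turn plain (non-negative) integers into one-element slices so the
        # rest of the method only ever deals with slice objects.
        rs, cs = (k if isinstance(k, slice) else slice(k, k + 1) for k in key)
        return [row[cs] for row in self.rows[rs]]


g = Grid([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(g[0:2, 1:3])   # two ranges: [[2, 3], [5, 6]]
print(g[1, 0:2])     # index mixed with a range: [[4, 5]]
```

Because a slice is its own type, a range like 1:3 can't be confused with a two-element list such as [1, 3] passed as an index.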

--User Definable '$'--
* A way to specify 'the end' in user types.  In the general case the 
meaning of '$' in a slice cannot be known (because any type can be used 
as an index), nor can it be simply substituted with something like a 
.length property, because it may depend on context.  Consider a 
multi-dimensional array class --

      A[0..$,3..$]

The first $ means one thing, and the second one means another.

One solution - make an opLength that gets called with the parameter 
number in which the $ appears.  [My hypothesis is that the param# is the 
only context that ever matters in determining the meaning of $.]  So for 
the expression above, int opLength(int i) would get called twice, once 
with i==0 and once with i==1.  opLength can be made to return any type 
if the user just wants it to get 'passed through' to the opSlice call.  
If you don't need the context you can define it as opLength().
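
To make the per-parameter idea concrete, here is a minimal sketch in Python.  Python has no '$', so END below is a hypothetical sentinel standing in for it, and Matrix2D, op_length and resolve are made-up names for illustration only.  The point is just that each END is resolved using nothing but the position of the argument it appears in.

```python
class End:
    """Hypothetical stand-in for '$'. END - n means 'n before the end'."""

    def __init__(self, offset=0):
        self.offset = offset

    def __sub__(self, n):
        return End(self.offset - n)


END = End()


class Matrix2D:
    """Shape-only toy class; it exists just to resolve END per dimension."""

    def __init__(self, nrows, ncols):
        self.shape = (nrows, ncols)

    def op_length(self, dim):
        # Plays the role of the proposed opLength(int i).
        return self.shape[dim]

    def resolve(self, *ranges):
        # Each (lo, hi) pair is resolved using its own position in the
        # argument list, which is the only context consulted.
        def fix(value, dim):
            if isinstance(value, End):
                return self.op_length(dim) + value.offset
            return value

        return [(fix(lo, dim), fix(hi, dim)) for dim, (lo, hi) in enumerate(ranges)]


a = Matrix2D(4, 7)
print(a.resolve((0, END), (3, END)))            # [(0, 4), (3, 7)]
print(a.resolve((END - 1, END), (0, END - 2)))  # [(3, 4), (0, 5)]
```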

--Step sizes--
This is a handy feature of Python slices.  The general syntax for a 
slice in Python is lo:hi:step, meaning go from 'lo' to 'hi', stepping by 
'step' at a time.  Any of the 3 components can be left out:
     lo:hi     means step=1
     lo::2     means go from lo to the end, stepping by 2
     :hi       means 0 to hi
Negative steps are also allowed:
     hi:lo:-1  means go backwards from hi to lo
     ::-1      means go backwards from the last to the first element
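
A quick check of those forms on a ten-element Python list (the variable name xs is arbitrary).  Note that the upper bound is exclusive, so the backwards slice stops one short of 'lo':

```python
xs = list(range(10))   # [0, 1, 2, ..., 9]

print(xs[2:6])      # [2, 3, 4, 5]
print(xs[2::2])     # [2, 4, 6, 8]
print(xs[:3])       # [0, 1, 2]
print(xs[6:2:-1])   # [6, 5, 4, 3]
print(xs[::-1])     # [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```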

D syntax could be something like lo..hi:step.  I like the omission part 
of Python's syntax.  If D had that then most uses of $ would go away 
since we'd have A[3..] as an alternative to A[3..$].
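
Python shows how an omitted bound can do the job of $.  A slice object stores the missing component as None, and its indices() method fills in concrete values once the length is known.  This is only an illustration of the omission idea, not a claim about how D would implement it:

```python
n = 10  # length of the sequence being sliced

# indices(n) returns the concrete (start, stop, step) for a sequence of length n.
print(slice(3, None).indices(n))         # (3, 10, 1)
print(slice(None, None, 2).indices(n))   # (0, 10, 2)
print(slice(None, None, -1).indices(n))  # (9, -1, -1)
```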

--bb