[Submission] D Slices

Andrei Alexandrescu SeeWebsiteForEmail at erdani.org
Tue May 31 10:57:14 PDT 2011


On 5/31/11 11:16 AM, eles wrote:
>> I agree that 1-based indexing works pretty well with closed-right
>> intervals; forgot to mention that in my first response. Even with
> such
>> an approach, certain things are less obvious, e.g. the number of
>> elements in an interval a..b is b-a+1, not b-a.
>
> Yes, but in C too, when going from a[0] to a[N-1], people know there
> are... (N-1)-0+1 elements (so, b-a+1). It is the same.
>
> Now, why:
>
> for(iterator from a[0] to a[N-1]){ //etc. }
> //let use the above notation for for(i=0; i<=N-1; i++)
>
> is acceptable, but sudden is no more acceptable to write
>
> a[for(iterator from 0 to N-1)]
>
> and one must use
>
> a[for(iterator from 0 to N)]
>
> in order to achieve exactly the same?
>
> The last two expressions are just mental placeholders for a[0..N-1]
> and for a[0..N] respectively.
>
>
>> All in all D doesn't attempt to break new ground with open-right
>> intervals. It would be gratuitously and jarringly different from
> all of
>> its major influencers. Though I understand the choice looks odd
> coming
>> from Matlab, the same could be said about a lot of other languages.
>
> I don't see that ground. Maybe I simply lack information. Can you
> help?

As I mentioned, the issues involved are of increasing subtlety. As you 
wrote, C programmers iterate upward like this:

for (int i = 0; i < N; i++)

However if N is the length of an array, that has unsigned type size_t 
and therefore i should follow suit:

for (size_t i = 0; i < N; i++)

This is _not_ equivalent with:

for (size_t i = 0; i <= N - 1; i++)

because at N == 0, the unsigned expression N - 1 wraps around to the 
largest value of size_t, so iteration will take a very long time (and 
go to all kinds of seedy places).

But often C programmers iterate with pointers between given limits:

for (int* p = a; p < a + N; p++)

It would appear that that does take care of the N == 0 case, so the loop 
should be equivalent with:

for (int* p = a; p <= a + N - 1; p++)

Alas, it's not. There is a SPECIAL RULE in the C programming language 
that says pointers EXACTLY ONE past the end of an array are ALWAYS 
comparable for (in)equality and ordering with pointers inside the array. 
There is no such rule for pointers one BEFORE the array.

Therefore, C programmers who use pointers and lengths or pairs of 
pointers ALWAYS need to use open-right intervals because otherwise they 
would be unable to iterate with well-defined code. Essentially C 
legalizes open-right and only open-right intervals inside the language. 
C++ carried that rule through.

C++ programmers who use iterators actually write loops a bit 
differently. To go from a to b they write:

for (SomeIterator it = a; it != b; ++it)

They use "!=" instead of "<" because there are iterators that compare 
for inequality but not for ordering. Furthermore, they use ++it instead 
of it++ because the latter may be more expensive. A conscientious C++ 
programmer wants to write code that makes the weakest assumptions about 
the types involved.

This setup has as a direct consequence the fact that ranges expressed as 
pairs of iterators MUST use the right-open convention. Otherwise it 
would be impossible to express an empty range, and people would have a 
very hard time iterating even a non-empty one as they'd have to write:

// Assume non-empty closed range
for (SomeIterator it = a; ; ++it)
{
     ... code ...
     if (it == b) break;
}

or something similar.

The STL also carefully defines the behavior of iterators one past the 
valid range, a move with many nice consequences, including consistency 
with pointers and the ease of expressing an empty range as one with a == 
b. Really, if there were any doubt that right-open is a good choice, the 
STL blew it off its hinges.

That's not to say right-open ranges are always best. One example where a 
closed range is necessary is expressing a total range, i.e. one that 
covers every value of a type. Consider e.g. a random function:

uint random(uint min, uint max);

In this case it's perfectly normal for random to return a number in 
[min, max]. If random were defined as:

uint random(uint min, uint one_after_max);

it would be impossible (or highly unintuitive) to generate a random 
number that has uint.max as its largest value, because one_after_max 
would have to be uint.max + 1, which does not fit in a uint.

It could be said that closed intervals make it difficult to express 
empty ranges, whereas right-open intervals make it difficult to express 
total ranges. It turns out the former are a common case whereas the 
latter is a rarely-encountered one. That's why we use closed intervals 
in std.random and case statements but open intervals everywhere else.

At this point, right-open is so embedded in the ethos of D that it would 
be pretty much impossible to introduce an exception for slices without 
confusing a lot of people, and for hardly any good reason.



Andrei
