[Submission] D Slices
Andrei Alexandrescu
SeeWebsiteForEmail at erdani.org
Tue May 31 10:57:14 PDT 2011
On 5/31/11 11:16 AM, eles wrote:
>> I agree that 1-based indexing works pretty well with closed-right
>> intervals; forgot to mention that in my first response. Even with
>> such an approach, certain things are less obvious, e.g. the number
>> of elements in an interval a..b is b-a+1, not b-a.
>
> Yes, but in C too, when going from a[0] to a[N-1], people know there
> are... (N-1)-0+1 elements (so, b-a+1). It is the same.
>
> Now, why:
>
> for(iterator from a[0] to a[N-1]){ //etc. }
> //let use the above notation for for(i=0; i<=N-1; i++)
>
> is acceptable, but suddenly it is no longer acceptable to write
>
> a[for(iterator from 0 to N-1)]
>
> and one must use
>
> a[for(iterator from 0 to N)]
>
> in order to achieve exactly the same?
>
> The last two expressions are just mental placeholders for a[0..N-1]
> and for a[0..N] respectively.
>
>
>> All in all D doesn't attempt to break new ground with open-right
>> intervals. It would be gratuitously and jarringly different from
>> all of its major influencers. Though I understand the choice looks
>> odd coming from Matlab, the same could be said about a lot of other
>> languages.
>
> I don't see that ground. Maybe I simply lack information. Can you
> help?
As I mentioned, the issues involved are of increasing subtlety. As you
wrote, C programmers iterate upward like this:
for (int i = 0; i < N; i++)
However, if N is the length of an array, it has the unsigned type
size_t, and therefore i should follow suit:
for (size_t i = 0; i < N; i++)
This is _not_ equivalent to:
for (size_t i = 0; i <= N - 1; i++)
because when N == 0, the unsigned expression N - 1 wraps around to the
maximum value of size_t, so iteration will take a very long time (and
go to all kinds of seedy places).
But often C programmers iterate with pointers between given limits:
for (int* p = a; p < a + N; p++)
It would appear that this does take care of the N == 0 case, so the loop
should be equivalent to:
for (int* p = a; p <= a + N - 1; p++)
Alas, it's not. There is a SPECIAL RULE in the C programming language
that says a pointer EXACTLY ONE past the end of an array is ALWAYS
comparable for (in)equality and ordering with pointers inside the array.
There is no such rule for a pointer one BEFORE the array, so when
N == 0 even computing a + N - 1 (i.e., a - 1) is undefined behavior.
Therefore, C programmers who use pointers and lengths or pairs of
pointers ALWAYS need to use open-right intervals because otherwise they
would be unable to iterate with well-defined code. Essentially C
legalizes open-right and only open-right intervals inside the language.
C++ carried that rule through.
C++ programmers who use iterators actually write loops a bit
differently. To go from a to b they write:
for (SomeIterator it = a; it != b; ++it)
They use "!=" instead of "<" because there are iterators that compare
for inequality but not for ordering. Furthermore, they use ++it instead
of it++ because the latter may be more expensive. A conscientious C++
programmer wants to write code that makes the weakest assumptions about
the types involved.
This setup has as a direct consequence the fact that ranges expressed as
pairs of iterators MUST use the right-open convention. Otherwise it
would be impossible to express an empty range, and people would have a
very hard time iterating even a non-empty one as they'd have to write:
// Assume non-empty closed range
for (SomeIterator it = a; ; ++it)
{
... code ...
if (it == b) break;
}
or something similar.
The STL also carefully defines the behavior of iterators ONE past the
valid range, a move with many nice consequences including consistency
with pointers and the ease of expressing an empty range as one with a ==
b. Really, if there were any doubt that right-open is a good choice, the
STL blew it off its hinges.
That's not to say right-open ranges are always best. One example where a
closed range is necessary is with expressing total closed sets. Consider
e.g. a random function:
uint random(uint min, uint max);
In this case it's perfectly normal for random to return a number in
[min, max]. If random were defined as:
uint random(uint min, uint one_after_max);
it would be impossible (or highly unintuitive) to generate a random
number that has uint.max as its largest value.
It could be said that closed intervals make it difficult to express
empty ranges, whereas right-open intervals make it difficult to express
total ranges. It turns out the former are a common case whereas the
latter is a rarely-encountered one. That's why we use closed intervals
in std.random and case statements but open intervals everywhere else.
At this point, right-open is so embedded in the ethos of D that it would
be pretty much impossible to introduce an exception for slices without
confusing a lot of people -- all for hardly one good reason.
Andrei
More information about the Digitalmars-d
mailing list