Range objects and fancy indexing -- Re: Removing elements from dynamic arrays?

Bill Baxter wbaxter at gmail.com
Wed Nov 1 05:59:56 PST 2006


Fredrik Olsson wrote:
> Chris Nicholson-Sauls wrote:
> <snip>
> 
>> # import cashew.utils.array;
>> #
>> # int[] foo = ... ;
>> # foo.remove(3U);
>> # foo.removeRange(2U, 7U);
>> # foo.removeSub(foo[1 .. 5]);
>> etc.
>>
> 
> I like these.
> 
> 
> And I yet again see a situation where range and set types are useful. If 
> a range, or a set, were a basic type (just like an integer, or an 
> imaginary float), then removing a range, or a set of elements, would be 
> more intuitive. And the slice operation would not be an array hack, but 
> a more general expression reusable everywhere.

Agreed.  I'd like to see 4..6 return some sort of range struct/object 
too.  Something as simple as
struct range {
    int lo;
    int hi;
}
That plus a few methods would pretty much do it.  Give it an opApply and 
an opApplyReverse and you've got foreach(i, 0..10) working too.  (Or 
rather than opApply/opApplyReverse, you might want to make it another 
special case in the compiler, like arrays, because the general foreach 
mechanism is too inefficient.)
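For concreteness, here's a rough sketch of what I mean -- the name, the 
exclusive upper bound, and the method set are just my own illustration, 
not a worked-out proposal:

```d
// Illustrative sketch: a range struct made foreach-able via
// opApply/opApplyReverse.  Upper bound is exclusive, like today's
// slice syntax.
struct Range
{
    int lo;
    int hi;

    // forward iteration: visits lo, lo+1, ..., hi-1
    int opApply(int delegate(ref int) dg)
    {
        for (int i = lo; i < hi; i++)
            if (int r = dg(i)) return r;
        return 0;
    }

    // reverse iteration: visits hi-1 down to lo
    int opApplyReverse(int delegate(ref int) dg)
    {
        for (int i = hi - 1; i >= lo; i--)
            if (int r = dg(i)) return r;
        return 0;
    }
}

// usage:
// foreach (i; Range(0, 10)) { ... }          // 0, 1, ..., 9
// foreach_reverse (i; Range(0, 10)) { ... }  // 9, 8, ..., 0
```

The delegate-based opApply is exactly the "general foreach mechanism" I 
mentioned, which is why a compiler special case would still be wanted 
for speed.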

> Imagine we have: int foo[] = [1,2,3,4,5,6];
> // Each example is now assumed unrelated to all others!
> foo.remove(1); // foo == [1, 3, 4, 5, 6]
> foo.remove(1..3); // foo == [1, 5, 6]
> foo.remove(<1, 4..5>); // foo == [1, 3, 4]
> 
> or a "slice" using a set:
> auto bar = foo[<1, 4..5>]; // bar == [2, 5, 6]
> Naturally this would require a range's upper bound to be inclusive, or 
> else sets would behave unintuitively. That is, today's 
> foo[5..5] would be a slice with one element, not empty as it is now.

Not sure I agree there, or with what seems to be your assumption that 
sets and ranges should be the same concept.  If sets were a built-in 
type I would expect them to be able to represent sets of anything, not 
just sets of integers.

So, limiting the discussion to just ranges: including the upper bound 
vs. not including it is pretty much a wash to me.  There are cases where 
each one looks good or bad, and perhaps Ruby's solution of two 
constructs is the best bet -- 4..5 being inclusive and 4...5 being 
exclusive.  Use whichever works best for your case.  Or not.  I can 
already hear the cries of "it looks too similar! there'll be bugs!". 
Whatever.  It seems to work for Ruby.

> Today passing around a range can only be done by passing around two 
> values, or a struct. The same could be said about passing around 
> imaginary numbers. Why imaginary numbers are valuable enough to warrant 
> built-in types, while ranges are not, is beyond me. I use ranges way 
> more than imaginary numbers.

Good point.  That's probably true for most non-sci folk and even a good 
chunk of the sci folk.  Actually, if the range object were extended to 
allow any type of bounds (e.g. floats, doubles), it could maybe be 
used for interval arithmetic.  Interval arithmetic is pretty useful for 
continuous collision detection algorithms, which are hot these days, and 
many other useful geometric/numerical/mathy things.  Just take the range 
above and parameterize it with one type:
struct range(T)
{
    T lo;
    T hi;
}
Maybe there'd need to be a lot more to it to be useful for interval 
arithmetic, though.  I haven't worked with interval arithmetic much 
myself, just heard talks from folks who used it.
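From what I remember of those talks, the basic operations at least are 
simple to sketch.  Something like this (the name Interval and the D1 
operator-overload names are my own illustration; a real library would 
also worry about rounding modes, division, and empty intervals):

```d
// Hedged sketch: minimal interval arithmetic on a parameterized
// lo/hi pair, like the range(T) struct above.
struct Interval(T)
{
    T lo, hi;

    // [a,b] + [c,d] = [a+c, b+d]
    Interval opAdd(Interval b)
    {
        return Interval(lo + b.lo, hi + b.hi);
    }

    // [a,b] * [c,d] = [min, max] over the four endpoint products,
    // since either factor may be negative.
    Interval opMul(Interval b)
    {
        T p1 = lo * b.lo, p2 = lo * b.hi, p3 = hi * b.lo, p4 = hi * b.hi;
        T mn = p1, mx = p1;
        if (p2 < mn) mn = p2;  if (p2 > mx) mx = p2;
        if (p3 < mn) mn = p3;  if (p3 > mx) mx = p3;
        if (p4 < mn) mn = p4;  if (p4 > mx) mx = p4;
        return Interval(mn, mx);
    }
}
```

The point being that the same lo/hi struct serves both as an index range 
and as an arithmetic interval once it's parameterized on the bound type.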

> Same goes for sets. Sure, bit-shifted enumerations and boolean logic do 
> the same trick. But adding sets as a built-in type would allow for far 
> more expressive and intuitive code. (Just how would you do a foreach 
> over some "or'ed" enumeration constants?)

By sets, once again, you apparently mean only sets of small integers. 
It seems like you could define a struct that hides the bit shifting from 
you and can be foreach'ed without too much trouble.  You just need an 
appropriate opApply.  (But again, it won't be as efficient as a 
hand-coded for-loop unless it becomes a special case recognized by the 
compiler -- are we sensing a trend here yet?)
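To show what I mean, a minimal sketch (the names are hypothetical, and 
it only handles members 0..31):

```d
// Illustrative small-integer set backed by a uint bitmask, with
// opApply so foreach visits each member without explicit shifting.
struct BitSet
{
    uint bits;

    void add(int i)      { bits |= 1u << i; }
    void remove(int i)   { bits &= ~(1u << i); }
    bool contains(int i) { return (bits & (1u << i)) != 0; }

    // visits each set member in ascending order
    int opApply(int delegate(ref int) dg)
    {
        for (int i = 0; i < 32; i++)
            if (bits & (1u << i))
                if (int r = dg(i)) return r;
        return 0;
    }
}

// usage:
// BitSet s;  s.add(1);  s.add(4);  s.add(5);
// foreach (i; s) { ... }  // visits 1, 4, 5
```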

--

Finally, on a related topic, I'd like to say that the current 
restriction of opIndex and opSlice methods to only take integers seems 
unnecessary and overly draconian.  Is there some technical reason for 
it?  If opIndex/opSlice could take any kind of object, then it would be 
possible today to make things like a range object that can be used to 
slice or index a user-defined array.  This kind of thing is useful when 
you want to create a Matlab- or Numpy-like array package (cf. Norbert 
Nemec's N-d array proposal for D).  It's very useful, for instance, to 
be able to have a method like "find" that returns an index object that 
can be used to index another array.  In Numpy this sort of thing is 
referred to as "fancy indexing".  Numpy doesn't try to be too clever 
about compressing the representation of the index set returned.  It's 
just another array containing the index of every element that passed the 
'find' test.  Not sure why it doesn't try to be smarter.  Probably 
because the logic to do that compression and decompression ends up being 
slower than just enumerating everything in many common cases.

Note that if 3..5 returned a range object, then there would be no real 
need for the opSlice methods.  They would be replaced by opIndex 
overloaded for the range type, e.g. opIndex(range r).
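Assuming opIndex overloading on arbitrary types were allowed, the whole 
idea might look like this (all names here are hypothetical, just to 
make the point concrete):

```d
// Illustrative user-defined array sliceable by a range object and
// "fancy"-indexable by an array of indices, Numpy-style.
struct Range { int lo, hi; }

struct MyArray
{
    int[] data;

    // ordinary element access
    int opIndex(int i) { return data[i]; }

    // slicing by range object -- what opSlice(lo, hi) does today
    int[] opIndex(Range r) { return data[r.lo .. r.hi]; }

    // fancy indexing: gather the elements at the given positions
    int[] opIndex(int[] idx)
    {
        int[] result = new int[idx.length];
        foreach (k, i; idx)
            result[k] = data[i];
        return result;
    }
}

// usage:
// auto a = MyArray([10, 20, 30, 40]);
// a[Range(1, 3)]    -- slice, like a[1..3]
// a[[0, 3]]         -- the kind of index a "find" could return
```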



--bb



More information about the Digitalmars-d-learn mailing list