Range objects and fancy indexing -- Re: Removing elements from dynamic arrays?
Bill Baxter
wbaxter at gmail.com
Wed Nov 1 05:59:56 PST 2006
Fredrik Olsson wrote:
> Chris Nicholson-Sauls skrev:
> <snip>
>
>> # import cashew.utils.array;
>> #
>> # int[] foo = ... ;
>> # foo.remove(3U);
>> # foo.removeRange(2U, 7U);
>> # foo.removeSub(foo[1 .. 5]);
>> etc.
>>
>
> I like these.
>
>
> And I yet again see a situation where range and set types are useful. If
> a range, or a set, were a basic type (just like an integer, or imaginary
> float) then removing a range, or a set of elements, would be more
> intuitive. And the slice operation would not be an array hack, but a
> more general expression reusable everywhere.
Agreed. I'd like to see 4..6 return some sort of range struct/object
too. Something as simple as

struct range {
    int lo;
    int hi;
}

That plus a few methods would pretty much do it. Give it an opApply and
an opApplyReverse and you've got foreach(i, 0..10) working too. (Or
rather than opApply/opApplyReverse, you might want to make another
special case for it in the compiler, like for arrays, because the
general foreach mechanism is too inefficient.)
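For illustration, here's a rough sketch of that struct extended with opApply/opApplyReverse (the details are invented for this post, not an existing library):

```d
// Sketch: the "range" struct from above, extended with opApply and
// opApplyReverse so it can be driven by foreach / foreach_reverse.
struct range {
    int lo;
    int hi;  // exclusive upper bound, matching today's slice semantics

    int opApply(int delegate(ref int) dg) {
        for (int i = lo; i < hi; i++) {
            int result = dg(i);
            if (result) return result;
        }
        return 0;
    }

    int opApplyReverse(int delegate(ref int) dg) {
        for (int i = hi - 1; i >= lo; i--) {
            int result = dg(i);
            if (result) return result;
        }
        return 0;
    }
}
```

Used as, say:

    range r;
    r.lo = 0; r.hi = 10;
    foreach (i; r) { /* i runs 0 through 9 */ }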
> Imagine we have: int foo[] = [1,2,3,4,5,6];
> // Each example is now assumed unrelated to all others!
> foo.remove(1); // foo == [1, 3, 4, 5, 6]
> foo.remove(1..3); // foo == [1, 5, 6]
> foo.remove(<1, 4..5>); // foo == [1, 3, 4]
>
> or a "slice" using a set:
> auto bar = foo[<1, 4..5>]; // bar == [2, 5, 6]
> Naturally this would require a range's upper bound to be inclusive, or
> else sets would behave unintuitively. That is, today's
> foo[5..5] would be a slice with one element, not empty as it is now.
Not sure I agree there, or with what seems to be your assumption that
sets and ranges should be the same concept. If sets were a built-in
type I would expect them to be able to represent sets of anything, not
just sets of integers.
So, limiting the discussion to just ranges, including the upper bound vs
not including the upper bound is pretty much a wash to me. There are
cases where each one looks good or bad, and perhaps Ruby's solution of
two constructs is the best bet -- 4..5 being inclusive and 4...5 being
exclusive. Use whichever works best for your case. Or not. I can
already hear the cries of "it looks too similar! there'll be bugs!".
Whatever. It seems to work for Ruby.
> Today passing around a range can only be done by passing around two
> values, or a struct. The same could be said about passing around
> imaginary numbers. Why imaginary numbers are valuable enough to warrant
> built-in types, while ranges are not, is beyond me. I use ranges way more
> than imaginary numbers.
Good point. That's probably true for most non-sci folk and even a good
chunk of the sci folk. Actually, if the range object were extended to
allow any type of bounds (e.g. floats, doubles), it could maybe be
used for interval arithmetic. Interval arithmetic is pretty useful for
continuous collision detection algorithms that are hot these days, and
many other useful geometric/numerical/mathy things. Just take the range
above and parameterize it with one type:
struct range(T)
{
    T lo;
    T hi;
}
Maybe there'd need to be a lot more to it to be useful for interval
arithmetic, though. I haven't worked with interval arithmetic much
myself, just heard talks from folks who have used it.
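For what it's worth, a minimal sketch of how addition and multiplication could be defined on such a parameterized type -- these are the textbook interval-arithmetic rules, and the code itself is only illustrative:

```d
// Sketch: interval arithmetic on a parameterized range-like type.
// [a,b] + [c,d] = [a+c, b+d]
// [a,b] * [c,d] = [min of the four products, max of the four products]
struct Interval(T) {
    T lo;
    T hi;

    Interval opAdd(Interval rhs) {
        Interval result;
        result.lo = lo + rhs.lo;
        result.hi = hi + rhs.hi;
        return result;
    }

    Interval opMul(Interval rhs) {
        T[4] p;
        p[0] = lo * rhs.lo; p[1] = lo * rhs.hi;
        p[2] = hi * rhs.lo; p[3] = hi * rhs.hi;
        Interval result;
        result.lo = p[0];
        result.hi = p[0];
        foreach (x; p) {
            if (x < result.lo) result.lo = x;
            if (x > result.hi) result.hi = x;
        }
        return result;
    }
}
```

A real interval type would also need to worry about rounding direction on the bounds, which is where most of the subtlety in interval arithmetic lives.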
> Same goes for sets. Sure, bitshifted enumerations and boolean logic do
> the same trick. But adding it as a built-in type would allow for far
> more expressive and intuitive code. (Just how would you do a foreach over
> some "ored" enumeration constants?)
By sets, once again, you apparently mean only sets of small integers.
It seems like you could define a struct that hides the bitshifting from
you and can be foreach'ed without too much trouble. You just need an
appropriate opApply. (But again it won't be as efficient as a handcoded
for-loop unless it becomes a special case recognized by the compiler --
are we sensing a trend here yet?)
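Something along these lines, say -- SmallSet and its methods are invented names, just to show the shape of it:

```d
// Sketch: a set of small integers stored as bits in a uint, with the
// bitshifting hidden behind methods and an opApply for foreach.
struct SmallSet {
    uint bits;

    void add(int i)      { bits |= 1u << i; }
    void remove(int i)   { bits &= ~(1u << i); }
    bool contains(int i) { return (bits & (1u << i)) != 0; }

    // foreach (i; s) visits each member that is present, in order.
    int opApply(int delegate(ref int) dg) {
        for (int i = 0; i < 32; i++) {
            if (bits & (1u << i)) {
                int result = dg(i);
                if (result) return result;
            }
        }
        return 0;
    }
}
```

That answers the "foreach over ored enumeration constants" question for enums whose values are distinct bit flags, at least, though only for sets of up to 32 small integers.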
--
Finally on a related topic, I'd like to say that the current restriction
on opIndex and opSlice methods to only take integers seems unnecessary
and overly draconian. Is there some technical reason for it? If
opIndex/opSlice could take any kind of object then it would be possible
today to make things like a range object that can be used to slice or
index a user defined-array. This kind of thing is useful when you want
to create a Matlab or Numpy-like array package (c.f. Norbert Nemec's N-d
array proposal for D). It's very useful for instance to be able to have
a method like "find" that returns an index object that can be used to
index another array. In Numpy this sort of thing is refered to as
"fancy indexing". Numpy doesn't try to be too clever about compressing
the representation of the index set returned. It's just another array
containing the index of every element that passed the 'find' test. Not
sure why it doesn't try to be smarter. Probably because the logic to do
that compression and decompression ends up being slower than just
enumerating everything in many common cases.
Note that if 3..5 returned a range object, then there would be no real
need for the opSlice methods. They would be replaced by opIndex
overloaded for the range type, e.g. opIndex(range r).
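To illustrate, here's roughly what a user-defined array with Numpy-style index-array indexing could look like, if opIndex were allowed to take arbitrary argument types. MyArray and find are invented names for this sketch:

```d
// Sketch: "fancy indexing" -- indexing one array with an array of indices,
// as Numpy does with the result of find().
struct MyArray {
    double[] data;

    // Hypothetical opIndex taking an index array; opIndex(range r)
    // would be the analogous overload replacing opSlice.
    MyArray opIndex(size_t[] idx) {
        MyArray result;
        result.data = new double[idx.length];
        foreach (j, i; idx)
            result.data[j] = data[i];
        return result;
    }
}

// find: the indices of every element that passes the test, stored as a
// plain index array with no attempt at compressing the representation.
size_t[] find(double[] a, bool delegate(double) test) {
    size_t[] idx;
    foreach (i, x; a)
        if (test(x)) idx ~= i;
    return idx;
}
```

So `a[find(a.data, someTest)]` would pull out just the elements passing the test, which is exactly the pattern the current integer-only restriction rules out.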
--bb