Range length property
Jonathan M Davis
newsgroup.d at jmdavisprog.com
Tue Apr 10 22:38:02 UTC 2018
On Tuesday, April 10, 2018 22:07:40 Cym13 via Digitalmars-d-learn wrote:
> On Tuesday, 10 April 2018 at 20:08:14 UTC, Jonathan M Davis wrote:
> > On Tuesday, April 10, 2018 19:47:10 Nordlöw via
> >
> > Digitalmars-d-learn wrote:
> >> On Tuesday, 10 April 2018 at 14:34:40 UTC, Adam D. Ruppe wrote:
> >> > On Tuesday, 10 April 2018 at 14:25:52 UTC, Nordlöw wrote:
> >> >> Should ranges always provide a length property?
> >> >
> >> > No.
> >> >
> >> >> If so, in which cases is a length property an advantage or
> >> >> a requirement?
> >> >
> >> > Just provide it whenever it is cheap to do so. If you need
> >> > to do complex calculations or especially loop over contents
> >> > to figure out the length, do NOT provide it.
> >> >
> >> > But if it is as simple as returning some value, provide it
> >> > and algorithms can take advantage of it for optimizations
> >> > etc. as needed.
> >>
> >> I'm thinking of my own container Hashmap having its range
> >> ByKeyValue requiring one extra word of memory to store the
> >> iteration count which, in turn, can be used to calculate the
> >> length of the remaining range. Is this motivated?
> >
> > That would depend entirely on what you're trying to do, but in
> > general, if a range has length, then some algorithms will be
> > more efficient, and some algorithms do require length. So, if
> > you can provide length, then the range will be more useful,
> > just like a bidirectional range can be more useful than a
> > forward range or a random-access range can be more useful than
> > either. However, if you're not doing anything that ever
> > benefits from it having length, then it doesn't buy you
> > anything. So, it ultimately depends on what you're doing. In a
> > general purpose library, I'd say that it should have length if
> > it can do so in O(1), but if it's just for you, then it may or
> > may not be worth it.
> >
> > The other thing to consider is what happens when the container
> > is mutated. I don't think that ranges necessarily behave all
> > that well when an underlying container is mutated, but it is
> > something that has to be considered when dealing with a range
> > over a container. Even if mutating the underlying container
> > doesn't necessarily invalidate a range, maintaining the length
> > in the manner that you're suggesting probably makes it so that
> > it would be invalidated in more cases, since if any elements
> > are added or removed in the portion that was already popped off
> > the range, then the iteration count couldn't be used to
> > calculate the length in the same way anymore. Now, with a hash
> > map, the range is probably fully invalidated when anything gets
> > added or removed anyway, since that probably screws with the
> > order of the elements in the range, but how the range is going
> > to behave when the underlying container is mutated and how
> > having the length property does or doesn't affect that is
> > something that you'll need to consider.
> >
> > - Jonathan M Davis
>
> I find that discussion very interesting as I had never considered
> that because of design by introspection having a costly length
> method would lead to unexpected calls by generic algorithms
> making it a disadventage if present.
>
> On the other hand I don't think the end user should have to
> scratch his head to find the length of a range, especially if
> it's not trivial to get (say, O(log n) kind of case). Therefore
> exposing a method in any case seems the best from an API
> perspective.
>
> But to avoid the performance issues mentionned earlier it means
> it should bear a different name (get/setLength comes to mind). I
> believe this is the same kind of issue that lead to having "in"
> for associative arrays but not regular ones. However this also
> leads to less coherent APIs in contradiction with the principle
> of least surprise.
>
> In retrospect since only "unexpected" calls to such methods cause
> the issue I wonder if it wouldn't be best to have an UDA saying
> "Hey, please, this method is costly, if you're a generic template
> performing introspection you should probably not call me". And
> writing that Andrei's work on complexity annotations comes to
> mind. Anyway, I don't think the user should use different names
> just to alleviate an issue on the library side but the
> alternative would be costly to put in place...
>
> Any thoughts?
In general, if you care about efficient code, it becomes critical that
anything that's going to be used in generic code has well-known Big-O
complexity, which is why C++ cares about that sort of thing with its
containers and why Andrei specifically put the Big-O complexity of all of
the generic container operations in std.container. And that extends to
ranges. If ranges are implemented without any care about the algorithmic
complexity of the operations, then performance is ultimately going to tank.
And in the case of ranges, it's really not all that hard to understand the
algorithmic complexity, because pretty much all of the range API functions
are supposed to be O(1). Certainly, having any of them be O(n) would be a
disaster.
Ranges can expose other functions if they want to, but they're not part of
the range API and thus would not be useful in any code that wasn't written
specifically for that range type. And honestly, at some point, it makes more
sense to just allocate a dynamic array and get the full range API than to
try and find ways to have inefficient versions of any of the functions in
the range API.
In the case of length, we have walkLength. So, anything that can't have an
efficient length can have its length obtained using walkLength, and any code
that doesn't care whether getting the length is efficient can call
walkLength rather than ever dealing with length, since walkLength will use
length if it's present. So, I don't think that it makes any sense to try and
implement alternate function names for length. That problem has already been
solved.
As for in, canFind and find solve the problem.
- Jonathan M Davis
More information about the Digitalmars-d-learn
mailing list