[Issue 13936] groupBy must be redone

via Digitalmars-d-bugs digitalmars-d-bugs at puremagic.com
Wed Jan 7 19:48:18 PST 2015


https://issues.dlang.org/show_bug.cgi?id=13936

--- Comment #7 from hsteoh at quickfur.ath.cx ---
I don't understand your beef against isEquivRelation. It's there precisely to
give users the choice of having maximal expressive power at a slight
performance cost, or being restricted to only equivalence relations but have
better performance.

As for only iterating things once, a thought occurred to me today that instead
of creating a new subrange every time the outer .front is called, we can simply
keep *both* inner and outer ranges in the same struct, and .front simply
returns a reference to the inner range. Multiple calls to .front returns (a
reference to) the same inner range, so popping any of them will pop all of them
at the same time. The inner range is then only valid until the next call to
popFront() -- i.e., groupBy will return a transient range. This will give you
maximum performance.

However, it forces the inner range to be non-forward -- there is no way to
.save it without making a copy of both the outer and inner ranges, and once you
.save it, it detaches from the outer range and no longer influences it when
popFront is called. There is no way to make .save work without completely
breaking the idea of single traversal, because if I use .save to make n copies
of an inner range, then which copy's popFront will influence the outer range?
And what happens if I call popFront on the outer range and then go back to a
.save'd copy of the inner range afterwards? There are endless nasty corner
cases that make this extremely messy.

So there you have it: a fast solution. But it comes at the cost of messy
semantics (transient outer range, inner ranges have reference semantics and
cannot be .save'd). The transience of the outer range is quite a major concern
IMO, because many algorithms in Phobos do not deal with them correctly. Expect
all the same problems with byLine to turn up all over again.

Honestly, I'd rather make this single-pass implementation a *different*
function (or use a different compile-time argument) that is opt-in rather than
opt-out. D's motto after all is "correct by default, fast if you ask for it".
Naïve use cases of groupBy should return the "safest" behaviour -- i.e., the
current behaviour where nothing is transient and .save has the usual semantics.
If people find that the performance is too slow, recommend groupByFast (or
equivalent), with the caveat that they will have to deal with the more
unconventional semantics in their code.

--


More information about the Digitalmars-d-bugs mailing list