topN using a heap

Andrei Alexandrescu via Digitalmars-d digitalmars-d at puremagic.com
Mon Jan 18 16:11:40 PST 2016


On 1/18/16 6:51 PM, Ilya wrote:
> On Monday, 18 January 2016 at 23:49:36 UTC, Andrei Alexandrescu wrote:
>> On 1/18/16 6:44 PM, Ilya wrote:
>>> On Monday, 18 January 2016 at 23:27:19 UTC, Ivan Kazmenko wrote:
>>>> On Monday, 18 January 2016 at 23:18:03 UTC, Ilya wrote:
>>>>> A RNGs don't improve worst case. It only changes an permutation for
>>>>> worst case. --Ilya
>>>>
>>>> Still, use of RNG makes it impossible to construct the worst case
>>>> beforehand, once and for all.  In that sense, this is a regression.
>>>
>>> No, it is definitely possible, because RNGs are Pseudo-RNGs. --Ilya
>>
>> unpredictableSeed uses the system clock as a source of randomness, so
>> we're good there. -- Andrei
>
> Would this work for pure functions? --Ilya

Of course not. I think this back-and-forth takes away from the gist of 
things. So let me summarize what has happened:

1. topN was reportedly slow. It was using a random pivot. I made it use 
getPivot (deterministic) instead of a random pivot in 
https://github.com/D-Programming-Language/phobos/pull/3921. getPivot is 
also what sort uses.

2. Now that both functions use getPivot, I set out to improve it in 
https://github.com/D-Programming-Language/phobos/pull/3922. The problem 
is I couldn't make getPivot impure; pure functions already call sort, so 
making it impure would have been a regression.

3. So I made getPivot use just a bit of randomness taken from the length 
of the range.

4. sort was and is attackable before all of these changes

5. So now we have pure topN and sort (subject to the range offering pure 
primitives) but they are both attackable.

6. PRNGs don't have any other downside than making functions impure.

The way I see it we have these solutions:

(a) make topN still use a random pivot. That way there's no regression. 
Then proceed and make sort() avoid quadratic cases in other ways, e.g. 
switch to heapsort if performance degrades.

(b) Improve sort() first, then apply a similar strategy to improving 
topN. Do not use RNGs at all.

(c) Seek randomness in other places, e.g. address of elements, data 
hashes etc. Come with a solution that may still be attacked in narrow 
cases but make that unlikely enough to be a real concern.


Andrei



More information about the Digitalmars-d mailing list