topN using a heap

Ilya via Digitalmars-d digitalmars-d at puremagic.com
Mon Jan 18 16:21:21 PST 2016


On Tuesday, 19 January 2016 at 00:11:40 UTC, Andrei Alexandrescu 
wrote:
> On 1/18/16 6:51 PM, Ilya wrote:
>> On Monday, 18 January 2016 at 23:49:36 UTC, Andrei 
>> Alexandrescu wrote:
>>> On 1/18/16 6:44 PM, Ilya wrote:
>>>> On Monday, 18 January 2016 at 23:27:19 UTC, Ivan Kazmenko 
>>>> wrote:
>>>>> On Monday, 18 January 2016 at 23:18:03 UTC, Ilya wrote:
>>>>>> A RNGs don't improve worst case. It only changes an 
>>>>>> permutation for
>>>>>> worst case. --Ilya
>>>>>
>>>>> Still, use of RNG makes it impossible to construct the 
>>>>> worst case
>>>>> beforehand, once and for all.  In that sense, this is a 
>>>>> regression.
>>>>
>>>> No, it is definitely possible, because RNGs are Pseudo-RNGs. 
>>>> --Ilya
>>>
>>> unpredictableSeed uses the system clock as a source of 
>>> randomness, so
>>> we're good there. -- Andrei
>>
>> Would this work for pure functions? --Ilya
>
> Of course not. I think this back-and-forth takes away from the 
> gist of things. So let me summarize what has happened:
>
> 1. topN was reportedly slow. It was using a random pivot. I 
> made it use getPivot (deterministic) instead of a random pivot 
> in https://github.com/D-Programming-Language/phobos/pull/3921. 
> getPivot is also what sort uses.
>
> 2. Now that both functions use getPivot, I set out to improve 
> it in 
> https://github.com/D-Programming-Language/phobos/pull/3922. The 
> problem is I couldn't make getPivot impure; pure functions 
> already call sort, so making it impure would have been a 
> regression.
>
> 3. So I made getPivot use just a bit of randomness taken from 
> the length of the range.
>
> 4. sort was and is attackable before all of these changes
>
> 5. So now we have pure topN and sort (subject to the range 
> offering pure primitives) but they are both attackable.
>
> 6. PRNGs don't have any other downside than making functions 
> impure.
>
> The way I see it we have these solutions:
>
> (a) make topN still use a random pivot. That way there's no 
> regression. Then proceed and make sort() avoid quadratic cases 
> in other ways, e.g. switch to heapsort if performance degrades.
>
> (b) Improve sort() first, then apply a similar strategy to 
> improving topN. Do not use RNGs at all.
>
> (c) Seek randomness in other places, e.g. address of elements, 
> data hashes etc. Come with a solution that may still be 
> attacked in narrow cases but make that unlikely enough to be a 
> real concern.
>
>
> Andrei

(a) Hope no
(b) Yes
(c) Memory addresses may not work with user defined ranges and 
hashes could be slow.

(d) Make PRNGs optional. --Ilya


More information about the Digitalmars-d mailing list