topN using a heap
Ilya via Digitalmars-d
digitalmars-d at puremagic.com
Mon Jan 18 16:21:21 PST 2016
On Tuesday, 19 January 2016 at 00:11:40 UTC, Andrei Alexandrescu
wrote:
> On 1/18/16 6:51 PM, Ilya wrote:
>> On Monday, 18 January 2016 at 23:49:36 UTC, Andrei
>> Alexandrescu wrote:
>>> On 1/18/16 6:44 PM, Ilya wrote:
>>>> On Monday, 18 January 2016 at 23:27:19 UTC, Ivan Kazmenko
>>>> wrote:
>>>>> On Monday, 18 January 2016 at 23:18:03 UTC, Ilya wrote:
>>>>>> A RNGs don't improve worst case. It only changes an
>>>>>> permutation for
>>>>>> worst case. --Ilya
>>>>>
>>>>> Still, use of RNG makes it impossible to construct the
>>>>> worst case
>>>>> beforehand, once and for all. In that sense, this is a
>>>>> regression.
>>>>
>>>> No, it is definitely possible, because RNGs are Pseudo-RNGs.
>>>> --Ilya
>>>
>>> unpredictableSeed uses the system clock as a source of
>>> randomness, so
>>> we're good there. -- Andrei
>>
>> Would this work for pure functions? --Ilya
>
> Of course not. I think this back-and-forth takes away from the
> gist of things. So let me summarize what has happened:
>
> 1. topN was reportedly slow. It was using a random pivot. I
> made it use getPivot (deterministic) instead of a random pivot
> in https://github.com/D-Programming-Language/phobos/pull/3921.
> getPivot is also what sort uses.
>
> 2. Now that both functions use getPivot, I set out to improve
> it in
> https://github.com/D-Programming-Language/phobos/pull/3922. The
> problem is I couldn't make getPivot impure; pure functions
> already call sort, so making it impure would have been a
> regression.
>
> 3. So I made getPivot use just a bit of randomness taken from
> the length of the range.
>
> 4. sort was and is attackable before all of these changes
>
> 5. So now we have pure topN and sort (subject to the range
> offering pure primitives) but they are both attackable.
>
> 6. PRNGs don't have any other downside than making functions
> impure.
>
> The way I see it we have these solutions:
>
> (a) make topN still use a random pivot. That way there's no
> regression. Then proceed and make sort() avoid quadratic cases
> in other ways, e.g. switch to heapsort if performance degrades.
>
> (b) Improve sort() first, then apply a similar strategy to
> improving topN. Do not use RNGs at all.
>
> (c) Seek randomness in other places, e.g. address of elements,
> data hashes etc. Come with a solution that may still be
> attacked in narrow cases but make that unlikely enough to be a
> real concern.
>
>
> Andrei
(a) Hope no
(b) Yes
(c) Memory addresses may not work with user defined ranges and
hashes could be slow.
(d) Make PRNGs optional. --Ilya
More information about the Digitalmars-d
mailing list