Optimizing Java using D

Xinok via Digitalmars-d digitalmars-d at puremagic.com
Sun Jun 22 10:07:32 PDT 2014


On Sunday, 22 June 2014 at 15:51:13 UTC, Andrei Alexandrescu 
wrote:
> On 6/22/14, 7:47 AM, Xinok wrote:
>> On Sunday, 22 June 2014 at 08:08:44 UTC, Andrei Alexandrescu 
>> wrote:
>>> On 6/21/14, 8:28 AM, Xinok wrote:
>>>> That has since been fixed. I implemented heapsort as a 
>>>> fallback
>>>> algorithm, effectively making it an introsort.
>>>
>>> Wait, when did that happen? I think a better solution would 
>>> be to take
>>> median of more than 3 elements. -- Andrei
>>
>> https://github.com/D-Programming-Language/phobos/pull/1735
>
> Thanks. I wonder if it's possible to factor some of the heap 
> sort code with the BinaryHeap code; they look very similar.

That's a possibility, and BinaryHeap could benefit from it. My 
heapsort is over twice as fast and performs almost half the 
number of comparisons.

>> I did experiment with taking a median of five. There wasn't 
>> much benefit
>> to doing so and it added a considerable overhead. I decided it 
>> best to
>> leave quicksort as it is. If anybody else can come up with 
>> something
>> better, they're free to do a pull request.
>
> I was thinking of sorting in place a stride (every k elements 
> in the original array) first; then the median is the middle of 
> the stride. The length of the stride would be proportional to 
> log(n).
>
> Anyhow, I see you have a test with this array:
>
> foreach(k; 0..arr.length) arr[k] = k;
> swapRanges(arr[0..$/2], arr[$/2..$]);
>
> and you count the comparisons. Nice! We should add arrays with 
> other statistics, too (e.g.: presorted, random uniform, random 
> Gaussian). Also we should count the number of swaps, the other 
> primitive operation in sorting.

The only reason I added that test is to ensure it's falling back 
to heapsort properly. My intention was not to collect statistics; 
I simply couldn't think of a simpler or more direct way of 
performing this test.

>> The thing about introsort is that it guarantees non-quadratic 
>> time with
>> almost zero overhead, something taking the median of N 
>> elements cannot
>> guarantee... unless you take the median of hundreds of 
>> elements, which
>> would be incredibly complex and slow.
>
> http://dlang.org/phobos/std_algorithm.html#topN is linear.

Unfortunately not. I managed to trigger quadratic time using the 
same test case as above. If you're not aware, topN simply works 
by recursively partitioning the range until the Nth element is in 
place (in essence, it performs a partial quick sort). The author 
of this implementation made the poor decision of simply choosing 
the last element as the pivot, not even a median of three.

http://dpaste.dzfl.pl/f30fd2539f66


More information about the Digitalmars-d mailing list