Go and generic programming on reddit, also touches on D

Timon Gehr timon.gehr at gmx.ch
Tue Sep 20 11:01:05 PDT 2011


On 09/20/2011 08:18 AM, Robert Jacques wrote:
> On Mon, 19 Sep 2011 12:12:04 -0400, Andrei Alexandrescu
> <SeeWebsiteForEmail at erdani.org> wrote:
>
>> On 9/19/11 10:46 AM, Robert Jacques wrote:
>>> So, on balance, I'd say the two pointers representation is categorically
>>> worse than the fat pointer representation.
>>
>> Benchmark. A few of your assumptions don't hold.
>>
>> Andrei
>>
>
> So I ran Timon Gehr's benchmark on my system 10 times, and although the
> Old method performed the best, the times showed a scheduling jitter of
> about 0.2 s, which is an order of magnitude larger than the differences
> between the runs. I've had some experience with removing jitter from
> benchmarks, so I redid it. For those who don't want to read through all
> the results and code snippets below, here's the executive summary:
> there's no measurable difference between the optimum popFront + empty
> implementations as laid out by Timon Gehr and myself. However, when you
> consider the current slice based implementation of popFront (i.e.
> a[1..$]), things change drastically. For an int[], ptr+ptr was 3x slower
> and for an int[3][] it was 8x slower.
>
>
> Timon Gehr's benchmark:
>
> Old        New        Diff
> 4.64242    4.58722     0.0551916
> 4.60659    4.5482      0.058386
> 4.53478    4.51753     0.0172519
> 4.54561    4.51867     0.026935
> 4.46662    4.4733     -0.00668139
> 4.47204    4.46893     0.00310762
> 4.52798    4.54021    -0.0122326
> 4.5685     4.57667    -0.00816801
> 4.50138    4.49186     0.00951325
> 4.49764    4.49422     0.00342912
>
> --------------------------
>
> import std.datetime, std.stdio, std.algorithm;
>
> int[1_000_000] a;
>
> void f1(){
>     for(auto ptr=a.ptr, len=a.length+1; len; ++ptr, --len) {}
> }
> void f2(){
>     for(auto b=a.ptr, e=a.ptr+a.length; b<e; b++){}
> }
>
> void main(){
>     writeln("Always remember to run benchmarks on a single core. Pause");
>     readln();
>     double min_f1 = int.max;
>     double min_f2 = int.max;
>     foreach(i;0..10_000) {
>         auto b=benchmark!(f1,f2)(3);
>         min_f1 = min(min_f1, b[0].to!("seconds",double));
>         min_f2 = min(min_f2, b[1].to!("seconds",double));
>     }
>     writeln("Old\tNew\tDiff");
>     writeln(min_f1,"\t",min_f2,"\t",(min_f2-min_f1)/(min_f1)*100.0,"%");
> }
>
> which gives
>
> Old           New           Diff
> 0.00127244    0.00127244    0%
>
> on my system, which is pretty much as expected. Now on to the more
> critical test.
>
> import std.datetime, std.stdio, std.algorithm, std.range, std.array;
>
> alias int[3] T;
> T[1_000_000] a;
>
> void f1(){
>     auto a2 = a[];
>     while(!a2.empty) { a2.popFront(); }
> }
> void f2(){
>     auto ptrFront = a.ptr;
>     auto ptrBack = a.ptr + a.length;
>     auto i = 1;
>     while(!(ptrFront >= ptrBack)) {
>         auto length = (ptrBack - ptrFront) / T.sizeof;
>         auto begin = ptrFront + T.sizeof * i;
>         auto end = ptrFront + T.sizeof * length;
>         if(end - begin >= 0 && ptrBack - end >= 0) {
>             ptrFront = begin;
>             ptrBack = end;
>         }
>     }
> }
>
> void main(){
>     writeln("Always remember to run benchmarks on a single core. Pause");
>     readln();
>     double min_f1 = int.max;
>     double min_f2 = int.max;
>     foreach(i;0..10_000) {
>         auto b=benchmark!(f1,f2)(3);
>         min_f1 = min(min_f1, b[0].to!("seconds",double));
>         min_f2 = min(min_f2, b[1].to!("seconds",double));
>     }
>     writeln("Old\tNew\tDiff");
>     writeln(min_f1,"\t",min_f2,"\t",(min_f2-min_f1)/(min_f1)*100.0,"%");
> }
>
> Old           New          Diff
> 0.00869062    0.0118318    36.145%
>
> Which makes sense given the division. Switching T to an int gives
>
> Old           New           Diff
> 0.00868968    0.00502586    -42.1629%
>
> Ah... perhaps I should also inline f1:
>
> void f1(){
>     // auto a2 = a[];
>     // while(!a2.empty) { a2.popFront(); }
>
>     auto i = 1;
>     auto length = a.length;
>     auto ptr = a.ptr;
>     while(!(length == 0)) {
>         auto end = length;
>         auto begin = i;
>         if(end - begin >= 0 && length - end >= 0) {
>             ptr = ptr + T.sizeof * begin;
>             length = end - begin;
>         }
>     }
> }
>
> which results in:
>
> Old           New           Diff
> 0.00127244    0.00502912    295.233%
>
> And switching T back to int[3] gives:
>
> Old           New       Diff
> 0.00127244    0.01192    836.78%
>
>
>

Thank you for making this more meaningful! I assumed the standard 
library benchmark function would take care of those things. Should it?