[Issue 3738] MinstdRand0 and MinstdRand very poor performance
d-bugmail at puremagic.com
d-bugmail at puremagic.com
Sun Jan 24 12:03:21 PST 2010
http://d.puremagic.com/issues/show_bug.cgi?id=3738
--- Comment #5 from Witold Baryluk <baryluk at smp.if.uj.edu.pl> 2010-01-24 12:03:20 PST ---
All my tests pases on your version.
10_000_000_000 (more than all possibilities) passes (for MinstdRand0 and
MinstdRand, it is possible that for other a it will fail, but i tested and done
some calculation. and it should work for all a,c<m.
I also tested versions with slightly changed constants (like 30 instad of 31,
or int.max-1, int.max-2, int.max-3, int.max+1 in multiple comibinations an
allone), and all them fail.
Shouldn't const uint y by immucatable y, just like in first static if?
I think that our pretty optimised line, can be further optimalised: (1 branch
sub)
- _x = (y >= int.max) ? ((y + 1) & int.max) : y;
+ _x = (y >= int.max) ? (y - int.max) : y;
:)
This gives:
MinstdRand0:
standard with tricks (1 branch sub): 15.25
inline with tricks (1 branch sub): 13.53 s
MinstdRand:
standard with tricks (1 branch sub): 15.26
inline with tricks (1 branch sub): 13.52 s
It doesn't give much difference (maybe 2%). But i thinkg it is better to have
one substraction, than one add and one binary and.
I found few interesting materials about this m=2^^31-1 operation.
One is http://www.firstpr.com.au/dsp/rand31/
I tested this Carta's versions: But they are only relevant for a=16807 (out
Park-Miller's StdminRand0 generator), so pretty limited:
Carta1 (original):
uint lo = 16807*(_x & 0xFFFFu);
immutable uint hi = 16807*(_x >> 16);
lo += (hi & 0x7FFF) << 16;
lo += (hi >> 15);
_x = (lo > int.max) ? (lo - int.max) : lo;
Carta3:
uint lo = 16807*(_x & 0xFFFFu);
immutable uint hi = 16807*(_x >> 16);
lo += ((hi & 0x7FFF) << 16) + (hi >> 15);
_x = (lo > int.max) ? (lo - int.max) : lo;
Carta2:
immutable uint
hi = 16807*(_x >> 16),
lo 16807*(_x & 0xFFFFu) + ((hi & 0x7FFF) << 16) + (hi >> 15);
_x = (lo > int.max) ? (lo - int.max) : lo;
Last lines can also be changed to branchles:
_x = (lo > int.max) ? (lo - int.max) : lo;
Timings:
Carta code1: 22.30 s
Carta code1 branchless: 18.42
Carta code3: 21.84 s
Carta code3 branchless: 18.19 s
Carta code2: 23.84 s
Carta code2 branchless: 19.84 s
So it is slower in all cases, and it is only for speciall cases (c=0, a <
2^^15). Of course answers are still correct.
Performing uint*uint+uint -> ulong is just faster than their tricks, and more
general, and can be reused in many other places.
--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
More information about the Digitalmars-d-bugs
mailing list