[Issue 4717] std.bitmanip.BitArray changes
d-bugmail at puremagic.com
d-bugmail at puremagic.com
Wed Sep 22 13:11:24 PDT 2010
http://d.puremagic.com/issues/show_bug.cgi?id=4717
--- Comment #10 from Don <clugdbug at yahoo.com.au> 2010-09-22 13:10:44 PDT ---
(In reply to comment #9)
> See also:
> http://www.strchr.com/crc32_popcnt
> http://wm.ite.pl/articles/sse-popcount.html
Yes, I saw those.
I made a simple 256-entry table lookup implementation (below, not optimised for
size) which runs at 5 cycles for 4 bytes. It'd be painful to beat that for
general-purpose 32 bit code (because AMD 32bit processors don't support SSE2).
Cache misses will kill you, though, unless the array is quite long.
I include my code here anyway, for future reference.
For 64 bits, SWAR on SSE2 is a clear winner.
----------
const(uint[256]) makepopcountlookup(){
uint [256] result;
for (int i = 0; i<= 0xFF; ++i)
{
result[i] = (i&1) + ((i&2)>>1) + ((i&4)>>2) + ((i&8)>>3) +
((i&16)>>4) + ((i&32)>>5) + ((i&64)>>6) + ((i&128)>>7);
}
return result;
}
__gshared uint[256] POPCOUNT_LOOKUP_TABLE = makepopcountlookup();
/*
A lookup table is normally a bad way to do popcount since it risks a cache
miss.
But 1K table is not so terrible, and we're dealing with a large source array.
The address of the lookup table is passed as a parameter to avoid PIC
problems.
*/
int popcountArray(uint[] src, uint *lookuptable = &POPCOUNT_LOOKUP_TABLE[0])
{
enum { LASTPARAM = 4*4 } // 3* pushes + return address.
// TIMING: Core2: 12uops, 5.0 cycles/uint
// It's entirely limited by the 5 loads.
asm {
naked;
push ESI;
push EDI;
push EBX;
mov EDI, EAX; // EDI = lookup table.
mov ECX, [ESP + LASTPARAM + 0*4]; // src.length;
mov ESI, [ESP + LASTPARAM + 1*4]; // src.ptr
xor EAX, EAX;
lea ESI, [ESI + 4*ECX]; // ESI = end of src
neg ECX; // count UP to zero.
mov EBX, [ESI + 4*ECX];
xor EDX, EDX;
add ECX, 1;
jz onlyone;
L1:
add EAX, [EDI + EDX * 4];
movzx EDX, BL;
add EAX, [EDI + EDX * 4];
movzx EDX, BH;
shr EBX, 16;
add EAX, [EDI + EDX * 4];
movzx EDX, BH;
add EAX, [EDI + EDX * 4];
movzx EDX, BL;
mov EBX, [ESI + 4*ECX];
add ECX, 1;
jnz L1;
onlyone:
add EAX, [EDI + EDX * 4];
movzx EDX, BL;
add EAX, [EDI + EDX * 4];
movzx EDX, BH;
shr EBX, 16;
add EAX, [EDI + EDX * 4];
movzx EDX, BH;
add EAX, [EDI + EDX * 4];
movzx EDX, BL;
add EAX, [EDI + EDX * 4];
pop EBX;
pop EDI;
pop ESI;
ret 2*4;
}
}
--
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
More information about the Digitalmars-d-bugs
mailing list