Optimization problem: bulk Boolean operations on vectors
hardreset via Digitalmars-d
digitalmars-d at puremagic.com
Fri Dec 23 17:06:54 PST 2016
On Friday, 23 December 2016 at 22:11:31 UTC, Walter Bright wrote:
> On 12/23/2016 10:03 AM, hardreset wrote:
>
> For this D code:
>
> enum SIZE = 100000000;
>
> void foo(int* a, int* b) {
>     int* atop = a + 1000;
>     ptrdiff_t offset = b - a;
>     for (; a < atop; ++a)
>         *a &= *(a + offset);
> }
>
> The following asm is generated by DMD:
>
>         push    EBX
>         mov     EBX,8[ESP]
>         sub     EAX,EBX
>         push    ESI
>         cdq
>         and     EDX,3
>         add     EAX,EDX
>         sar     EAX,2
>         lea     ECX,0FA0h[EBX]
>         mov     ESI,EAX
>         cmp     EBX,ECX
>         jae     L2C
> L20:    mov     EDX,[ESI*4][EBX]
>         and     [EBX],EDX
>         add     EBX,4
>         cmp     EBX,ECX
>         jb      L20
> L2C:    pop     ESI
>         pop     EBX
>         ret     4
>
> The inner loop is 5 instructions, whereas the one you wrote is
> 7 instructions (I didn't benchmark it). With some more source
> code manipulation the divide can be eliminated, but that is
> irrelevant to the inner loop.

I patched up the prolog code and timed it, and it came out
identical to my asm. I tried the ptrdiff C-like code and that
still comes out 20% slower here. I'm compiling with:

rdmd test.d -O -release -inline

Am I missing something? How do I get the asm output?
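
A minimal sketch of how the two loop styles might be timed against each other, assuming a Phobos version that provides std.datetime.stopwatch. The names andPtrdiff and andIndexed, the repetition count, and the fill values are illustrative and do not come from the thread; the indexed loop is only a hypothetical baseline, not the hand-written asm discussed above.

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writefln;

// Pointer-difference loop in the style of the quoted foo(),
// with the element count passed in instead of hard-coded.
void andPtrdiff(int* a, int* b, size_t n)
{
    int* atop = a + n;
    ptrdiff_t offset = b - a;   // distance from a to b, in elements
    for (; a < atop; ++a)
        *a &= *(a + offset);
}

// Plain indexed loop (assumed baseline for comparison).
void andIndexed(int[] a, const(int)[] b)
{
    foreach (i; 0 .. a.length)
        a[i] &= b[i];
}

void main()
{
    enum n = 1000;        // same element count as the quoted loop
    enum reps = 100_000;  // arbitrary repetition count

    auto a = new int[n];
    auto b = new int[n];
    a[] = 0x0F0F;
    b[] = 0x00FF;

    // AND is idempotent, so repeating the call leaves a unchanged
    // after the first pass; only the elapsed time is of interest.
    auto sw1 = StopWatch(AutoStart.yes);
    foreach (_; 0 .. reps)
        andPtrdiff(a.ptr, b.ptr, n);
    auto t1 = sw1.peek();

    auto sw2 = StopWatch(AutoStart.yes);
    foreach (_; 0 .. reps)
        andIndexed(a, b);
    auto t2 = sw2.peek();

    assert(a[0] == 0x000F);
    writefln("ptrdiff: %s usecs, indexed: %s usecs",
             t1.total!"usecs", t2.total!"usecs");
}
```

Note that rdmd treats arguments placed after the source file as arguments to the compiled program, so compiler flags would go before it, e.g. rdmd -O -release -inline bench.d (the file name bench.d is hypothetical).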