Fast 2D matrix of bits

Josh Simmons simmons.44 at gmail.com
Tue Sep 20 02:05:02 PDT 2011


On Tue, Sep 20, 2011 at 6:36 PM, Josh Simmons <simmons.44 at gmail.com> wrote:
> On Tue, Sep 20, 2011 at 5:22 PM, bearophile <bearophileHUGS at lycos.com> wrote:
>>
>> My version with bsr is faster.
>>
>> Bye,
>> bearophile
>>
>
> Is that science or guessing?
>
> My horribly unscientific test shows the opposite to be true; I'm
> looking over the assembly output to see if there's an extraneous
> factor.
>

Ah: when the one I gave was slower, it wasn't being unrolled by gcc;
when yours was slower, my trivial loop was being vectorised. When
both are compiled the same way, the results are the same and favour
neither.

Cool.

I do like that the propagate-right version can be vectorised though.

