optimized array operations
Eugene Pelekhay
pelekhay at gmail.com
Fri Sep 26 01:45:38 PDT 2008
Jb Wrote:
> If you are doing unaligned memory accesses it's actually faster to do this:
>
> MOVLPS XMM0,[address]
> MOVHPS XMM0,[address+8]
>
> Than it is to do
>
> MOVUPS XMM0,[address]
>
> The reason being that (on almost all but the very latest chips) SSE ops are
> actually split into 2 64-bit ops. So the former code actually works out a lot
> faster.
>
> Also, unaligned loads are a whole lot quicker than unaligned stores. 2 or 3
> times faster IIRC. So the best method is to bend over backwards to get your
> writes aligned.
>
Thanks, I'll check this way too.
Meanwhile, can anybody test the new version on other systems? I implemented the operations for the unaligned case with x87 instructions, and my benchmark shows that it runs much slower than the SSE2 version. This means that either Don's theory is wrong, or I have an unusual Pentium-M, or my x87 code is bad.
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arrayOfDouble.d
Type: application/octet-stream
Size: 15512 bytes
Desc: not available
URL: <http://lists.puremagic.com/pipermail/digitalmars-d-announce/attachments/20080926/4454844e/attachment.obj>