optimized array operations

Eugene Pelekhay pelekhay at gmail.com
Fri Sep 26 01:45:38 PDT 2008


Jb Wrote:

> If you are doing unaligned memory accesses, it's actually faster to do this:
> 
> MOVLPS    XMM0,[address]
> MOVHPS   XMM0,[address+8]
> 
> Than it is to do
> 
> MOVUPS  XMM0,[address]
> 
> The reason is that on almost all chips except the very latest, SSE ops are 
> actually split into two 64-bit ops. So the former code works out a lot 
> faster.
> 
> Also, unaligned loads are a whole lot quicker than unaligned stores: 2 or 3 
> times faster, IIRC. So the best method is to bend over backwards to get your 
> writes aligned.
> 

Thanks, I'll check this approach too.
Meanwhile, can anybody test the new version on other systems? I implemented the operations for the unaligned case with x87 instructions, and my benchmark shows that it runs much slower than the SSE2 version. This means that either Don's theory is wrong, or I have an unusual Pentium-M, or my x87 code is bad.
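
For anyone who wants to try the two quoted suggestions outside of inline asm, here is a minimal sketch in C with SSE intrinsics. It is not the code from the attachment. The function names (load_split, load_unaligned, add_floats) are made up for illustration, it works on floats to match the quoted MOVLPS/MOVHPS/MOVUPS instructions, and it assumes an x86 target with SSE enabled. The compiler is free to choose different instructions than the ones named in the comments, so check the generated assembly before benchmarking.

```c
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics; assumes an x86 target with SSE */

/* Split load: fetch 128 bits as two 64-bit halves.
   Intended to map to MOVLPS + MOVHPS. */
static __m128 load_split(const float *p)
{
    __m128 v = _mm_setzero_ps();
    v = _mm_loadl_pi(v, (const __m64 *)p);        /* low two floats  */
    v = _mm_loadh_pi(v, (const __m64 *)(p + 2));  /* high two floats */
    return v;
}

/* Single unaligned 128-bit load. Intended to map to MOVUPS. */
static __m128 load_unaligned(const float *p)
{
    return _mm_loadu_ps(p);
}

/* dst[i] = a[i] + b[i] for i in [0, n).
   Hypothetical helper: scalar elements are peeled off until dst is
   16-byte aligned, so every vector store is an aligned store.
   The loads from a and b may be unaligned and use the split form. */
static void add_floats(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;

    /* Scalar prologue: advance until dst + i sits on a 16-byte boundary. */
    while (i < n && ((uintptr_t)(dst + i) & 15) != 0) {
        dst[i] = a[i] + b[i];
        ++i;
    }

    /* Vector body: possibly unaligned loads, aligned stores. */
    for (; i + 4 <= n; i += 4) {
        __m128 va = load_split(a + i);
        __m128 vb = load_split(b + i);
        _mm_store_ps(dst + i, _mm_add_ps(va, vb));
    }

    /* Scalar epilogue for the remaining 0..3 elements. */
    for (; i < n; ++i)
        dst[i] = a[i] + b[i];
}
```

To compare the two load strategies, swap load_split for load_unaligned in the vector body and time both variants on the same data. Both functions return the same values, so only the speed should differ.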
-------------- next part --------------
A non-text attachment was scrubbed...
Name: arrayOfDouble.d
Type: application/octet-stream
Size: 15512 bytes
Desc: not available
URL: <http://lists.puremagic.com/pipermail/digitalmars-d-announce/attachments/20080926/4454844e/attachment.obj>

