[phobos] byte alignment for arrays

Mon Jun 28 14:03:04 PDT 2010

Hmmm.  The natural thing would be to have some type to describe these 128-bit values (akin to __m128 in gcc, Intel and MS compilers) and use sizeof on that.  I don't see that D has any MMX/SSE intrinsics, so I don't know if there is a standard type.  If you don't have such a thing defined by the compiler, I'd be tempted to define it, based on which version of the compiler will compile this code (i.e. 32- or 64-bit dmd).  Then you can use that in your sizeof.  Maybe you'll get lucky, and that will become standard :)

Jason
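
A minimal sketch of the idea above, written in C since that is where `__m128` lives. The names `vec128` and `ARRAY_PAD` are hypothetical, introduced only for illustration; nothing here is an existing D or dmd type.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* size_t */

/* Hypothetical stand-in for a 128-bit SSE value (akin to __m128). */
typedef struct vec128 {
    unsigned char bytes[16];
} vec128;

/* The size is 16 bytes regardless of whether the target is 32- or 64-bit. */
static_assert(sizeof(vec128) == 16, "vec128 must be exactly 128 bits");

/* Padding reserved at the start of an array block, derived from the type
   rather than from a multiple of size_t. */
enum { ARRAY_PAD = sizeof(vec128) };
```

Deriving the padding from the type keeps the "why 16?" visible in the code without tying it to the pointer width.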

----- Original Message ----
> From: Steve Schveighoffer <schveiguy at yahoo.com>
> To: Discuss the phobos library for D <phobos at puremagic.com>
> Sent: Mon, June 28, 2010 1:35:59 PM
> Subject: Re: [phobos] byte alignment for arrays
> 
> Thanks, this information helps a lot!
>
> I will make the change to 16-byte aligned.  I'm already using 8 bytes for a 4 byte length.  Using 16 bytes isn't much different, especially when the block size is 4096+ bytes.
>
> One final question -- I currently use sizeof(size_t) * 2, which could now be sizeof(size_t) * 4, but of course, this changes to 32 bytes on 64-bit dmd.  Would it make sense to just use 16 instead of some multiple of size_t?
>
> -Steve
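
The arithmetic in that question can be sketched as follows. The word size is passed in as a parameter so both targets can be compared on one machine; `pad_from_word_size` and `FIXED_PAD` are hypothetical names, not anything from the runtime.

```c
#include <stddef.h>

/* Padding when expressed as a multiple of the word size, in bytes.
   word_size stands in for sizeof(size_t): 4 on 32-bit, 8 on 64-bit. */
size_t pad_from_word_size(size_t word_size, size_t multiple)
{
    return word_size * multiple;
}

/* Padding as a plain constant: the same on every target. */
enum { FIXED_PAD = 16 };
```

With a multiple of 4, the padding is 16 bytes on a 32-bit build and 32 bytes on a 64-bit build. With a multiple of 2 it is 8 and 16 respectively, so no single multiple of size_t gives 16 on both.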

----- Original Message ----
> From: Jason Spencer <spencer8 at sbcglobal.net>
> To: Discuss the phobos library for D <phobos at puremagic.com>
> Sent: Mon, June 28, 2010 4:09:01 PM
> Subject: Re: [phobos] byte alignment for arrays
> 
> Sorry, I forgot to address the every-other-one concern.
>
> The MMX registers are 64 bits wide, so they hold only one 8-byte value (the size of a double) at a time.  Those instructions only require 8-byte aligned memory.  The SSE instructions use 128-bit registers, so they take 2 doubles at a time.  As long as the first one is 16-byte aligned, you can iterate through in 16-byte (128-bit) chunks, and you'll be good.  That's why element 0 should be 128-bit aligned.
>
> If it's not, the processor will either have an alignment fault (if the instruction requires alignment) or will do a bunch of split loads across cache lines, which kill performance.
>
> One other thought:  If you wanted to be tricky, you could do a general, 4-byte-aligned allocation and, based on the address you get, assign your storage pointer to the next 128-bit-aligned address.  But you're offloading lots of housekeeping to run time.  Again, maybe tolerable for just these large arrays.  But it starts to add a lot of corner cases.  Walter might have some good suggestions here.
>
> Jason
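
The "tricky" option above might look roughly like this in C. It is only a sketch under assumptions: `align_up`, `aligned_block`, and `alloc_aligned16` are hypothetical names, and `malloc` stands in for whatever general allocator is actually used.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Round addr up to the next multiple of align (a power of two). */
uintptr_t align_up(uintptr_t addr, uintptr_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + align - 1) & ~(align - 1);
}

/* The run-time housekeeping: both pointers have to be remembered. */
typedef struct aligned_block {
    void *raw;      /* what the allocator returned; pass this to free() */
    void *storage;  /* first 16-byte-aligned address inside the block */
} aligned_block;

/* Over-allocate by 15 bytes so an aligned address always fits. */
aligned_block alloc_aligned16(size_t nbytes)
{
    aligned_block b = { NULL, NULL };
    b.raw = malloc(nbytes + 15);
    if (b.raw != NULL)
        b.storage = (void *)align_up((uintptr_t)b.raw, 16);
    return b;
}
```

The extra `raw` pointer is the kind of corner case mentioned above: freeing `storage` instead of `raw` would be a bug, and every block wastes up to 15 bytes.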

----- Original Message ----
> From: Steve Schveighoffer
>
> A question then --  let's say you have an array of doubles, which are 8 bytes wide, and you want to use these SSE instructions.  Even if the first one is aligned on a 16-byte boundary, wouldn't every other double be misaligned?
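
The address arithmetic behind this question can be sketched in C. The helper names `element_addr` and `is_aligned16` are hypothetical, and the 8-byte element size is taken from the message above.

```c
#include <stddef.h>
#include <stdint.h>

/* Element width in bytes, as stated in the thread for double. */
enum { DOUBLE_SIZE = 8 };

/* Address of element i in an array of doubles starting at base. */
uintptr_t element_addr(uintptr_t base, size_t i)
{
    return base + (uintptr_t)i * DOUBLE_SIZE;
}

/* Nonzero if addr is a multiple of 16 bytes. */
int is_aligned16(uintptr_t addr)
{
    return (addr % 16) == 0;
}
```

With a 16-byte-aligned base, the even-indexed elements sit on 16-byte boundaries and the odd-indexed ones do not. A 128-bit load that starts at an even index covers elements 2k and 2k+1 together, so the odd element's own address is never used as a load address.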
_______________________________________________
phobos mailing list
phobos at puremagic.com
http://lists.puremagic.com/mailman/listinfo/phobos
