Does dmd have SSE intrinsics?
Robert Jacques
sandford at jhu.edu
Mon Sep 21 18:58:26 PDT 2009
On Mon, 21 Sep 2009 18:32:50 -0400, Jeremie Pelletier <jeremiep at gmail.com>
wrote:
> bearophile wrote:
>> Don:
>>> (1) They don't take advantage of fixed-length arrays. In particular,
>>> operations on float[4] should be a single SSE instruction (no function
>>> call, no loop, nothing). This will make a huge difference to game and
>>> graphics programmers, I believe.
>> [...]
>>> It's issue (1) which is the killer.
>> In my answer I have forgotten to say another small thing.
>> The std.gc.malloc() of D returns pointers aligned to 16 bytes (but I
>> may like to add a second argument to such GC malloc, to specify the
>> alignment, this can be used to save some memory when the alignment
>> isn't necessary), while I think the std.c.stdlib.malloc() doesn't give
>> pointers aligned to 16 bytes.
>> In the following code if you want to implement the last line with one
>> vector instruction then a and b arrays have to be aligned to 16 bytes.
>> I think that currently LDC doesn't align a and b to 16 bytes.
>> float[4] a = [1.f, 2., 3., 4.];
>> float[4] b[] = 10f;
>> float[4] c[] = a[] + b[];
>> So you may need a syntax like the following, that's not handy:
>> align(16) float[4] a = [1.f, 2., 3., 4.];
>> align(16) float[4] b[] = 10f;
>> align(16) float[4] c[] = a[] + b[];
>> A possible solution is to automatically align to 16 (by default, but
>> it can be changed to save stack space in specific situations) all
>> static arrays allocated on the stack too :-)
>> A note: in future probably CPU vector instructions will relax their
>> alignment requirements... it's already happening.
>> Bye,
>> bearophile
>
> That 16-byte alignment is a restriction of the current usage of bit
> fields. Since every bit in the field indexes a single 16-byte block, a
> simple shift 4 bits to the right translates a pointer into its index in
> the bit field. You could align on 4-byte boundaries but at the cost of
> doubling the size of bit fields, and possibly having slower collection
> runs.
>
> Doesn't SSE have aligned and unaligned versions of its move
> instructions? like MOVAPS and MOVUPS.
Yes, but the unaligned version is slower, even for aligned data.
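
To make the two forms concrete, here is a minimal sketch in C rather than D, since C exposes the SSE intrinsics directly. The function names add4_aligned and add4_unaligned are made up for illustration. _mm_load_ps/_mm_store_ps typically compile to MOVAPS and _mm_loadu_ps/_mm_storeu_ps to MOVUPS, though the compiler is free to pick other encodings.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>  /* SSE intrinsics; x86 / x86-64 only */

/* Hypothetical helper: add two 4-float vectors using aligned moves.
   All three pointers must be 16-byte aligned; an aligned move on a
   misaligned address raises a fault at run time. */
void add4_aligned(const float *a, const float *b, float *out)
{
    /* Debug-only check of the alignment precondition. */
    assert(((uintptr_t)a & 15) == 0);
    assert(((uintptr_t)b & 15) == 0);
    assert(((uintptr_t)out & 15) == 0);

    __m128 va = _mm_load_ps(a);             /* aligned load  */
    __m128 vb = _mm_load_ps(b);
    _mm_store_ps(out, _mm_add_ps(va, vb));  /* aligned store */
}

/* Hypothetical helper: the same addition using unaligned moves.
   Accepts any address, aligned or not. */
void add4_unaligned(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);             /* unaligned load  */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));  /* unaligned store */
}
```

Both functions compute the same result; the only difference is the alignment requirement on the pointers, which is why the compiler has to know (or be told) that a float[4] is 16-byte aligned before it can pick the aligned form.
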
Also, another issue for game/graphics/robotics programmers is the ability to
return fixed-length arrays from functions, though struct wrappers
mitigate this.
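
For anyone unfamiliar with the struct-wrapper trick, a minimal sketch in C, which has the same restriction on returning bare arrays by value. Vec4 and vec4_add are hypothetical names, not anything from Phobos or an existing library.

```c
/* Hypothetical wrapper: a function cannot return a bare float[4]
   by value, but it can return a struct that contains one. */
typedef struct {
    float v[4];
} Vec4;

/* Element-wise addition, returning the result by value. */
Vec4 vec4_add(Vec4 a, Vec4 b)
{
    Vec4 r;
    for (int i = 0; i < 4; ++i)
        r.v[i] = a.v[i] + b.v[i];
    return r;
}
```

A caller would write something like `Vec4 c = vec4_add(a, b);` and read the components through `c.v[i]`. The extra `.v` at every use is the cost of the workaround.
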