__restrict, architecture intrinsics vs asm, consoles, and other

Don nospam at nospam.com
Fri Sep 23 21:41:09 PDT 2011


On 22.09.2011 20:19, Marco Leise wrote:
> On 22.09.2011, 19:26, Peter Alexander
> <peter.alexander.au at gmail.com> wrote:
>
>> On 22/09/11 7:39 AM, Don wrote:
>>> On 22.09.2011 05:24, a wrote:
>>>> How would one do something like this without intrinsics (the code is
>>>> C++ using GCC vector extensions):
>>>
>>> [snip]
>>> At present, you can't do it without ultimately resorting to inline asm.
>>> But what we've done is to move SIMD into the machine model: the D
>>> machine model assumes that float[4] + float[4] is a more efficient
>>> operation than the equivalent element-by-element loop.
>>> Currently, only arithmetic operations are implemented, and on DMD at
>>> least, they're still not proper intrinsics. So in the long term it'll be
>>> possible to do it directly, but not yet.
>>>
>>> At various times, several of us have implemented 'swizzle' using CTFE,
>>> giving you a syntax like:
>>>
>>> float[4] x, y;
>>> x[] = y[].swizzle!"cdcd"();
>>> // x[0]=y[2], x[1]=y[3], x[2]=y[2], x[3]=y[3]
>>>
>>> which compiles to a single shufps instruction.
>>
>> How can it compile into a single shufps? x and y would need to already
>> be in vector registers, and unless I've missed something, they won't
>> be. You'll need instructions to load them into registers (using the
>> slow movups, because 16-byte alignment isn't guaranteed), then do the
>> shufps, then store the result back out again.
>>
>> This is too slow for performance-critical code.
>>
>> Keeping values in XMM registers from creation, and passing and
>> returning them in XMM registers to and from functions, is a key
>> requirement for this sort of code. If you have to keep moving data in
>> and out of memory, you lose all the performance benefit.
>
> I thought about this. Either write long functions so you don't have to
> load and unload often, or just make the functions assume that the
> parameters are in registers without an explicit declaration.

Yeah, at the moment you have to work at a higher level; you can't just
do a single instruction on its own.
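A minimal D sketch of the two features discussed in this thread, under stated assumptions. Array operations such as c[] = a[] + b[] are standard D. The swizzle template below is a hypothetical helper written for illustration only; it is not the CTFE implementation mentioned above and not part of Phobos. scaleAdd is likewise a made-up name, showing the "higher level" style: one array expression over a whole buffer rather than one 4-float operation at a time. Whether a given compiler turns either into SSE instructions such as shufps is not guaranteed.

```d
import std.stdio : writeln;

// Hypothetical helper, not part of Phobos: returns a float[4] whose elements
// are picked from v by a four-letter pattern, where 'a' selects v[0],
// 'b' selects v[1], 'c' selects v[2] and 'd' selects v[3].
// The pattern is validated and the loop unrolled at compile time; whether
// the result becomes a single shufps is up to the compiler.
float[4] swizzle(string pattern)(const(float)[] v)
    if (pattern.length == 4)
{
    assert(v.length >= 4, "swizzle needs at least four elements");
    float[4] r;
    static foreach (i, c; pattern)
    {
        static assert(c >= 'a' && c <= 'd', "swizzle letters must be a..d");
        r[i] = v[c - 'a'];
    }
    return r;
}

// Hypothetical example of working "at a higher level": one array operation
// over a whole buffer instead of one 4-float operation at a time.
// dst and src must have the same length (checked at run time).
void scaleAdd(float[] dst, const(float)[] src, float k)
{
    dst[] += src[] * k;
}

void main()
{
    // Built-in array operation on two float[4] values.
    float[4] a = [1, 2, 3, 4];
    float[4] b = [10, 20, 30, 40];
    float[4] c;
    c[] = a[] + b[];
    assert(c == [11.0f, 22, 33, 44]);

    // The thread's example: x[0]=y[2], x[1]=y[3], x[2]=y[2], x[3]=y[3].
    float[4] y = [0, 1, 2, 3];
    float[4] x = y[].swizzle!"cdcd"();
    assert(x == [2.0f, 3, 2, 3]);

    // Batch update of eight floats with a single array expression.
    float[] dst = [1, 1, 1, 1, 1, 1, 1, 1];
    float[] src = [0, 1, 2, 3, 4, 5, 6, 7];
    scaleAdd(dst, src, 2);
    assert(dst == [1.0f, 3, 5, 7, 9, 11, 13, 15]);

    writeln(c, " ", x, " ", dst);
}
```

The swizzle uses static foreach, which needs DMD 2.076 or later; the underlying point stands either way: the pattern is fixed at compile time, so the generated code contains no runtime indexing logic, but loads and stores around it are still the compiler's choice.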



More information about the Digitalmars-d mailing list