core.simd woes

Manu turkeyman at gmail.com
Tue Oct 2 06:36:43 PDT 2012


On 2 October 2012 13:49, jerro <a at a.com> wrote:

>
> I don't think it is possible to think of all usages of this, but for every
> simd instruction there are valid usages. At least for writing pfft, I found
> shuffling two vectors very useful. For example, I needed a function that
> takes a small, square, power of two number of elements stored in vectors
> and bit-reverses them - it rearranges them so that you can calculate the new
> index of each element by reversing bits of the old index (for 16 elements
> using 4 element vectors this can actually be done using std.simd.transpose,
> but for AVX it was more efficient to make this function work on 64
> elements). There are other places in pfft where I need to select elements
> from two vectors (for example, the platform specific code for AVX is here:
> https://github.com/jerro/pfft/blob/sine-transform/pfft/avx_float.d#L141).
>
> I don't think these are the kind of things that should be implemented in
> std.simd. If you wanted to implement all such operations (for example bit
> reversing a small array) that somebody may find useful at some time,
> std.simd would need to be huge, and most of it would never be used.
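
For reference, the bit-reversal reordering described in the quote above can be written as a plain scalar sketch in D. This is only a model of the index mapping, not the vectorized pfft code; the names `reverseBits` and `bitReversePermute` are made up for illustration.

```d
// Scalar reference for a bit-reversal permutation.
// Hypothetical names; this is not taken from pfft or std.simd.

/// Reverse the lowest `bits` bits of `index`.
size_t reverseBits(size_t index, uint bits) pure nothrow @nogc @safe
{
    size_t result = 0;
    foreach (i; 0 .. bits)
    {
        result = (result << 1) | (index & 1); // append the lowest bit of index
        index >>= 1;
    }
    return result;
}

/// Return a copy of `src` in which element i has moved to index
/// reverseBits(i, log2(N)). N must be a power of two.
T[N] bitReversePermute(T, size_t N)(T[N] src)
    if (N > 0 && (N & (N - 1)) == 0)
{
    // Compute log2(N) by counting how many times N can be halved.
    uint bits = 0;
    for (size_t n = N; n > 1; n >>= 1)
        ++bits;

    T[N] dst;
    foreach (i; 0 .. N)
        dst[reverseBits(i, bits)] = src[i];
    return dst;
}
```

A SIMD version would express the same mapping with shuffles and transposes over whole vectors, which is where the two-vector selects come in.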


I was referring purely to your 2-vector swizzle idea (or useful high-level
ideas in general). Not to hyper-context-specific functions :P


>> I can imagine, I'll have a go at it... it's something I considered, but not
>> all architectures can do it efficiently.
>> That said, a most-efficient implementation would probably still be useful
>> on all architectures, but for cross platform code, I usually prefer to
>> encourage people taking another approach rather than supply a function
>> that is not particularly portable (or not efficient when ported).
>>
>
> One way to do it would be to do the following for every set of selected
> indices: go through all the two element one instruction operations, and
> check if any of them does exactly what you need, and use it if it does.
> Otherwise do something that will always work although it may not always be
> optimal. One option would be to use swizzle on both vectors to get each of
> the elements to their final index and then blend the two vectors together.
> For sse 1, 2 and 3 you would need to use xorps to blend them, so I guess
> this is one more place where you would need vector literals.
>
> Someone who knows which two element shuffling operations the platform
> supports could still write optimal platform specific (but portable across
> compilers) code this way and for others this would still be useful to some
> degree (the documentation should mention that it may not be very efficient,
> though). But I think that it would be better to have platform specific APIs
> for platform specific code, as I said earlier in this thread.
>

Yeah, I have some ideas. Some permutations are obvious, the worst-case
fallback is also obvious, but there are a lot of semi-efficient in-between
cases which could take a while to identify and test. It'll be a massive
block of static-if code to be sure ;)
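
To make the shape of that concrete, here is a rough scalar model of a two-vector shuffle built from `static if` cases: two easy cases, then the swizzle-both-and-blend fallback jerro describes. Everything here is hypothetical (`Vec4`, `swizzle`, `blend`, `shuffle2` are illustrative names operating on plain `float[4]`, not the std.simd API), and a real version would map each branch to actual instructions.

```d
// Scalar model of a two-vector shuffle over 4-lane vectors.
// Indices 0..3 select from `a`, indices 4..7 select from `b`.
// All names are hypothetical; this is not the std.simd API.

alias Vec4 = float[4];

enum bool inRange(int i) = i >= 0 && i < 8;

/// One-vector swizzle: result lane k takes v[ik].
Vec4 swizzle(int i0, int i1, int i2, int i3)(Vec4 v)
{
    Vec4 r = [v[i0], v[i1], v[i2], v[i3]];
    return r;
}

/// Per-lane select: lane k comes from `b` where mk is true, otherwise from `a`.
Vec4 blend(bool m0, bool m1, bool m2, bool m3)(Vec4 a, Vec4 b)
{
    Vec4 r = [m0 ? b[0] : a[0], m1 ? b[1] : a[1],
              m2 ? b[2] : a[2], m3 ? b[3] : a[3]];
    return r;
}

Vec4 shuffle2(int i0, int i1, int i2, int i3)(Vec4 a, Vec4 b)
{
    static assert(inRange!i0 && inRange!i1 && inRange!i2 && inRange!i3,
                  "lane indices must be in 0..7");

    static if (i0 < 4 && i1 < 4 && i2 < 4 && i3 < 4)
    {
        // Obvious case: every lane comes from `a`, so one swizzle is enough.
        return swizzle!(i0, i1, i2, i3)(a);
    }
    else static if (i0 >= 4 && i1 >= 4 && i2 >= 4 && i3 >= 4)
    {
        // Obvious case: every lane comes from `b`.
        return swizzle!(i0 - 4, i1 - 4, i2 - 4, i3 - 4)(b);
    }
    else
    {
        // The semi-efficient single-instruction patterns would be tested
        // for here, one static-if branch each.
        // Worst-case fallback: move every element to its final lane in
        // both vectors, then pick per lane.
        Vec4 sa = swizzle!(i0 & 3, i1 & 3, i2 & 3, i3 & 3)(a);
        Vec4 sb = swizzle!(i0 & 3, i1 & 3, i2 & 3, i3 & 3)(b);
        return blend!(i0 >= 4, i1 >= 4, i2 >= 4, i3 >= 4)(sa, sb);
    }
}
```

Since the indices are template arguments, only the selected branch is compiled for any given instantiation, e.g. `shuffle2!(0, 5, 2, 7)(a, b)` takes lanes 0 and 2 from `a` and lanes 1 and 3 from `b`.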


>>> Unfortunately I can't, at least not a clean one. Using string mixins would
>>> be one way but I think no one wants that kind of API in Druntime or
>>> Phobos.
>>>
>>
>>
>> Yeah, absolutely not.
>> This is possibly the most compelling motivation behind a __forceinline
>> mechanism that I've seen come up... ;)
>>
>>>> I'm already unhappy that std.simd produces redundant function calls.
>>>>
>>>> <rant> please please please can haz __forceinline! </rant>
>>>>
>>> I agree that we need that.
>>>
>>>
>> Huzzah! :)
>>
>
> Walter opposes this, right? I wonder how we could convince him.
>

I just don't think he's seen solid undeniable cases where it's necessary.


> There's one more thing that I wanted to ask you. If I were to add LDC
> support to std.simd, should I just add version(LDC) blocks to all the
> functions? Sounds like a lot of duplicated code...
>

Go for it. And yeah, just add another version(). I don't think it can be
done without blatant duplication. Certainly not without __forceinline
anyway, and even then I'd be apprehensive about trusting the code-gen of
intrinsics wrapped in inline wrappers.
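
As a rough picture of what that duplication looks like, here is a minimal sketch of one function with a `version()` block per compiler. The function name `addLanes` and the use of plain `float[4]` are assumptions for illustration only; each branch uses the same scalar placeholder where the compiler-specific intrinsic would actually go.

```d
// Hypothetical per-compiler layout for one operation.
// Every body is a scalar placeholder standing in for that compiler's
// own intrinsic; only the version() structure is the point here.
float[4] addLanes(float[4] a, float[4] b)
{
    float[4] r;
    version (LDC)
    {
        // LDC-specific intrinsic would go here.
        r[] = a[] + b[];
    }
    else version (GNU)
    {
        // GDC-specific builtin would go here.
        r[] = a[] + b[];
    }
    else version (DigitalMars)
    {
        // DMD core.simd code would go here.
        r[] = a[] + b[];
    }
    else
    {
        // Portable fallback for any other compiler.
        r[] = a[] + b[];
    }
    return r;
}
```

Multiply that by every function and every architecture and you get the bloat in question.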

That file will most likely become a nightmarish bloated mess... but that's
the point of libraries ;) .. It's better that all that horrible munging for
different architectures/compilers is put in one place and tested
thoroughly than to not provide it and allow an infinite variety of
different implementations to appear.

What we may want to do in the future is to split the different
compilers/architectures into readable sub-modules, and public import the
appropriate one based on version logic from std.simd... but I wouldn't want
to do that until the API has stabilised.