Any usable SIMD implementation?

Johan Engelen via Digitalmars-d digitalmars-d at puremagic.com
Thu Apr 7 08:10:21 PDT 2016


On Thursday, 7 April 2016 at 14:46:06 UTC, Johannes Pfau wrote:
> On Thu, 07 Apr 2016 13:27:05 +0000,
> Johan Engelen <j at j.nl> wrote:
>
>> On Thursday, 7 April 2016 at 11:25:47 UTC, Johannes Pfau wrote:
>> > On Thu, 07 Apr 2016 10:52:42 +0000,
>> > Kai Nacke <kai at redstar.de> wrote:
>> > 
>> >> glibc has a special mechanism for resolving the called 
>> >> function during loading. See the section on the GNU 
>> >> Indirect Function Mechanism here: 
>> >> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W51a7ffcf4dfd_4b40_9d82_446ebc23c550/page/Optimized%20Libraries
>> >> 
>> >> Would be awesome to have something similar in 
>> >> druntime/Phobos.
>> >> 
>> >> Regards,
>> >> Kai
>> >
>> > Available in GCC as the 'ifunc' attribute: 
>> > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
>> >
>> > What do you mean by 'something similar in druntime/phobos'? 
>> > A platform-independent (slightly slower) variant?
>> >
>> > http://dpaste.dzfl.pl/0aa81325a26a
>> 
>> I thought that the ifunc mechanism means an indirect call 
>> (i.e. a function pointer is set at the start of the program)? 
>> That would be the same as what you are doing, with no 
>> performance difference.
>> 
>> https://gcc.gnu.org/wiki/FunctionMultiVersioning
>> "To keep the cost of dispatching low, the IFUNC mechanism is 
>> used for dispatching. This makes the call to the dispatcher a 
>> one-time thing during startup and a call to a function version 
>> is a single jump ** indirect ** instruction." (emphasis mine)
>
> The simple variant I've posted needs an additional branch on 
> every function call. If we instead initialize the function 
> pointer in a shared static ctor there's indeed no performance 
> difference.

Yep, exactly.
For @target multiversioned functions, I thought one would want to 
create one static ctor that calls cpuid once and sets all 
function pointers of that module.
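
A minimal sketch of that per-module setup, assuming druntime's 
core.cpuid for the feature checks. The names sumBaseline, sumSSE42, 
sumAVX2 and sum are hypothetical, and the two "optimized" versions 
are stand-ins that just call the baseline. With real @target support 
the compiler would generate both the variants and the constructor.

```d
import core.cpuid : avx2, sse42;

// Hypothetical per-feature implementations of one operation.
int sumBaseline(const(int)[] a)
{
    int s = 0;
    foreach (x; a)
        s += x;
    return s;
}

// Stand-ins: real versions would use SSE4.2 / AVX2 code paths.
int sumSSE42(const(int)[] a) { return sumBaseline(a); }
int sumAVX2(const(int)[] a)  { return sumBaseline(a); }

// One process-wide function pointer per multiversioned function.
__gshared int function(const(int)[]) sum;

// Runs once per process, before main(): query the CPU features once
// and set every function pointer of this module.
shared static this()
{
    if (avx2)
        sum = &sumAVX2;
    else if (sse42)
        sum = &sumSSE42;
    else
        sum = &sumBaseline;
}
```

After startup, a call such as sum([1, 2, 3]) is a single indirect 
call through the pointer, with no feature check on the call path.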

>> (does `&foo` return `impl`?)
>
> No, &foo will return the address of the wrapper function. I'm 
> not sure if we can solve this. IIRC we can't overload &.

OK. Well, @target multiversioning would need compiler support 
anyway, so it would be easy for the compiler to do something 
slightly different for `&foo` when foo is a multiversioned function.
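
To make the `&foo` issue concrete, here is a rough sketch of what a 
library-only wrapper with lazy initialization might look like. This 
is not the code from the dpaste link, only an assumed shape; 
fooBaseline, fooFast and impl are hypothetical names, and the lazy 
initialization is not synchronized between threads.

```d
import core.cpuid : sse42;

// Hypothetical implementations; both return the same result.
int fooBaseline(int x) { return x + 1; }
int fooFast(int x)     { return x + 1; } // stand-in for a SIMD version

// Selected implementation; null until the first call to foo.
__gshared int function(int) impl;

// Wrapper that user code calls and takes the address of.
int foo(int x)
{
    // This check is the additional branch paid on every call.
    if (impl is null)
        impl = sse42 ? &fooFast : &fooBaseline;
    return impl(x);
}
```

Here `&foo` is the address of the wrapper, while impl holds the 
address of the selected version, so the two never compare equal. 
With compiler support, `&foo` could instead evaluate to the selected 
version directly.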

This should be fairly easy to implement in LDC, with some smarts 
needed in ordering and selecting the best function version.


