Any usable SIMD implementation?
Johan Engelen via Digitalmars-d
digitalmars-d at puremagic.com
Thu Apr 7 08:10:21 PDT 2016
On Thursday, 7 April 2016 at 14:46:06 UTC, Johannes Pfau wrote:
> On Thu, 07 Apr 2016 13:27:05 +0000,
> Johan Engelen <j at j.nl> wrote:
>
>> On Thursday, 7 April 2016 at 11:25:47 UTC, Johannes Pfau wrote:
>> > On Thu, 07 Apr 2016 10:52:42 +0000,
>> > Kai Nacke <kai at redstar.de> wrote:
>> >
>> >> glibc has a special mechanism for resolving the called
>> >> function during loading. See the section on the GNU
>> >> Indirect Function Mechanism here:
>> >> https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/W51a7ffcf4dfd_4b40_9d82_446ebc23c550/page/Optimized%20Libraries
>> >>
>> >> Would be awesome to have something similar in
>> >> druntime/Phobos.
>> >>
>> >> Regards,
>> >> Kai
>> >
>> > Available in GCC as the 'ifunc' attribute:
>> > https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#Common-Function-Attributes
>> >
>> > What do you mean by 'something similar in druntime/phobos'?
>> > A platform-independent (slightly slower) variant?
>> >
>> > http://dpaste.dzfl.pl/0aa81325a26a
>>
>> I thought that the ifunc mechanism means an indirect call
>> (i.e. a function ptr is set at the start of the program)?
>> That would be the same as what you are doing, with no
>> performance difference.
>>
>> https://gcc.gnu.org/wiki/FunctionMultiVersioning
>> "To keep the cost of dispatching low, the IFUNC mechanism is
>> used for dispatching. This makes the call to the dispatcher a
>> one-time thing during startup and a call to a function version
>> is a single jump ** indirect ** instruction." (emphasis mine)
>
> The simple variant I've posted needs an additional branch on
> every function call. If we instead initialize the function
> pointer in a shared static ctor there's indeed no performance
> difference.

Yep, exactly.

For @target multiversioned functions, I thought one would want to
create one static ctor per module that calls cpuid once and sets
all function pointers of that module.

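As a rough sketch of that idea (hand-written, not what a compiler
would actually emit): a module-level function pointer starts out at a
generic version, and a single `shared static this()` queries
`core.cpuid` once and repoints it. The names `sum`, `sumGeneric`,
`sumSse2` and `sumAvx2` are made up for illustration, and the "SIMD"
variants are only stand-ins that forward to the generic code.

```d
import core.cpuid : avx2, sse2; // druntime CPU feature queries

// Hypothetical versions of one "multiversioned" function.
int sumGeneric(const(int)[] a)
{
    int s;
    foreach (x; a)
        s += x;
    return s;
}

int sumSse2(const(int)[] a) { return sumGeneric(a); } // stand-in for an SSE2 version
int sumAvx2(const(int)[] a) { return sumGeneric(a); } // stand-in for an AVX2 version

// One process-wide pointer per multiversioned function in this module.
// It defaults to the generic version, so it is always callable.
__gshared int function(const(int)[]) sum = &sumGeneric;

// Runs once per process, before main(): check the CPU features once
// and pick the best available version for every pointer in the module.
shared static this()
{
    if (avx2)
        sum = &sumAvx2;
    else if (sse2)
        sum = &sumSse2;
}

void main()
{
    import std.stdio : writeln;
    writeln(sum([1, 2, 3, 4])); // prints 10 whichever version was selected
}
```

After startup, every call through `sum` is a plain indirect call with
no per-call feature check.
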
>> (does `&foo` return `impl`?)
>
> No, &foo will return the address of the wrapper function. I'm
> not sure if we can solve this. IIRC we can't overload &.
OK. Well, @target multiversioning would need compiler support
anyway, and it is easy to do something slightly different for
`&foo` when foo is a multiversioned function.
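
To illustrate the `&foo` issue with a hand-written wrapper (a sketch
with hypothetical names `foo`, `impl` and `implGeneric`; it is not the
code from the dpaste link): taking the address of the wrapper yields
the wrapper itself, not the selected version, even though calling
either gives the same result.

```d
// Hypothetical selected implementation.
int implGeneric(int x) { return x + 1; }

// Pointer to the selected version (would be set at startup).
__gshared int function(int) impl = &implGeneric;

// Wrapper that user code calls; it forwards through the pointer.
int foo(int x) { return impl(x); }

void main()
{
    auto p = &foo;            // address of the wrapper function
    assert(p !is impl);       // not the address of the selected version
    assert(p(1) == impl(1));  // but both compute the same result
}
```

With compiler support, `&foo` could presumably be made to yield the
selected version instead.
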
This should be fairly easy to implement in LDC, with some smarts
needed in ordering and selecting the best function version.