Sargon component library now on Dub
via Digitalmars-d-announce
digitalmars-d-announce at puremagic.com
Wed Dec 17 03:08:14 PST 2014
On Wednesday, 17 December 2014 at 09:11:22 UTC, Don wrote:
> So am I, the halffloat is much faster than any other
> implementation I've seen. The fast path for the conversion
> functions involves only a few machine instructions.
>
> I had an extra speedup for it that made it optimal, but it
> requires a language primitive to dump excess hidden precision.
> We still need this, it is a fundamental operation (C tries to
> do it implicitly using "sequence points", but they don't
> actually work properly).
The intrinsics _mm_cvtph_ps and _mm_cvtps_ph convert 4
halffloats to floats and 4 floats to halffloats, respectively,
with a latency of 4 clock cycles and a throughput of 1 per cycle
on Haswell.
https://software.intel.com/sites/landingpage/IntrinsicsGuide/
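For illustration, here is a minimal C sketch of calling the two F16C intrinsics on a block of 4 values. It assumes GCC or Clang on x86 (for `__attribute__((target("f16c")))` and `<cpuid.h>`); the helper names `cpu_has_f16c`, `f16c_convert4` and `f16c_roundtrip4` are hypothetical, and it does not attempt to confirm the latency/throughput figures above.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>      /* __get_cpuid (GCC/Clang) */
#include <immintrin.h>  /* _mm_cvtps_ph, _mm_cvtph_ps (F16C) */

/* CPUID leaf 1, ECX bit 29 reports F16C support. Strictly, software should
   also confirm OS support for AVX state (OSXSAVE/XGETBV); omitted for brevity. */
static int cpu_has_f16c(void) {
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (int)((ecx >> 29) & 1u);
}

/* Compiled for F16C via the target attribute, so no global -mf16c is needed.
   Only call this after cpu_has_f16c() returns nonzero. */
__attribute__((target("f16c")))
static void f16c_convert4(const float in[4], uint16_t half_out[4], float back_out[4]) {
    __m128 v = _mm_loadu_ps(in);
    /* Narrow 4 floats to 4 binary16 values, rounding to nearest even. */
    __m128i h = _mm_cvtps_ph(v, _MM_FROUND_TO_NEAREST_INT);
    /* The 4 halves occupy the low 64 bits of h. */
    _mm_storel_epi64((__m128i *)half_out, h);
    /* Widen the 4 halves back to floats (exact: every half fits in a float). */
    _mm_storeu_ps(back_out, _mm_cvtph_ps(h));
}
#endif

/* Returns 1 and fills the outputs if F16C was used, 0 if it is unavailable. */
int f16c_roundtrip4(const float in[4], uint16_t half_out[4], float back_out[4]) {
#if defined(__x86_64__) || defined(__i386__)
    if (cpu_has_f16c()) {
        f16c_convert4(in, half_out, back_out);
        return 1;
    }
#endif
    (void)in;
    (void)half_out;
    (void)back_out;
    return 0;
}
```

With input {1.0f, -2.5f, 65504.0f, 1.0f/3.0f} this should yield the halves 0x3C00, 0xC100, 0x7BFF and 0x3555; the last one widens back to 1365/4096 (about 0.33325), showing the precision lost in the narrowing step. The runtime check is there so the same binary can fall back to another path on CPUs without F16C.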