Worst Phobos documentation evar!

Manu via Digitalmars-d digitalmars-d at puremagic.com
Wed Dec 31 20:46:32 PST 2014


On 31 December 2014 at 21:25, Walter Bright via Digitalmars-d
<digitalmars-d at puremagic.com> wrote:
> What you can contribute that would be very valuable is what we've discussed
> before - your simd expertise. Your influence is what has shaped the current
> simd support. I don't know anyone who knows even half of what you do about
> simd. What you know could make D really fly with vector math.
>
> You and I both know that auto vectorization, the approach used by everyone
> else, is not the key to high performance simd. We have an opportunity here.

Okay, well it's not really useful without a forceinline attribute.
std.simd functions need to be pseudo-intrinsics, i.e., each one does so
little work that the cost of a function call will definitely negate it.
Yes, they will (probably) be inlined in release, but debug performance
is also important, and I can't have maths code that runs much slower
in debug builds because it makes a function call passing structs by
value for every single maths opcode in the hottest loops.
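
To illustrate the kind of guarantee being asked for, here is a minimal
sketch in C rather than D, since GCC's C front end already has an
always_inline attribute that is documented to inline even in unoptimised
builds. The names float4, madd and sum_madd are hypothetical and are not
part of std.simd.

```c
/* Hypothetical 4-lane float vector, via the GCC/Clang vector extension. */
typedef float float4 __attribute__((vector_size(16)));

/* always_inline asks the compiler to expand this at every call site,
 * including at -O0, so a debug build does not pay for a real call. */
static inline __attribute__((always_inline))
float4 madd(float4 a, float4 b, float4 c)
{
    return a * b + c;   /* element-wise multiply, then add */
}

/* A hot loop: one madd() per iteration, accumulated across n vectors,
 * then the four lanes are summed into a scalar. */
float sum_madd(const float4 *a, const float4 *b, int n)
{
    float4 acc = {0.0f, 0.0f, 0.0f, 0.0f};
    for (int i = 0; i < n; ++i)
        acc = madd(a[i], b[i], acc);
    return acc[0] + acc[1] + acc[2] + acc[3];
}
```

Without the attribute, a debug build would normally emit a real call to
madd() on every iteration, passing three 16-byte values each time.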

There were also troubles with GDC; I haven't been able to make it emit
the opcode that I want. It emits something else instead, depending on
the SSE level argument passed to the compiler. There are attributes to
set the 'target' per-function, but that didn't work for some reason, so
I need to work out if that can be resolved, otherwise my whole approach
(the goal of being able to generate code paths for multiple SIMD
versions and select between them at runtime) won't work (in GCC)...
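
For reference, this is roughly what the per-function target approach looks
like in plain GCC C. It is a sketch of the technique only, not of what GDC
does. The function names are hypothetical, and the x86-64 guard is an
assumption so that the file still builds elsewhere.

```c
#if defined(__x86_64__)
#include <immintrin.h>

/* Compiled as if -msse4.1 were given, for this one function only. */
__attribute__((target("sse4.1")))
static float dot4_sse41(const float *a, const float *b)
{
    /* 0xF1: multiply all four lanes, write the sum to lane 0. */
    __m128 r = _mm_dp_ps(_mm_loadu_ps(a), _mm_loadu_ps(b), 0xF1);
    return _mm_cvtss_f32(r);
}
#endif

/* Portable fallback used when SSE4.1 is not available. */
static float dot4_scalar(const float *a, const float *b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

/* Runtime selection: pick the code path the running CPU supports. */
float dot4(const float *a, const float *b)
{
#if defined(__x86_64__)
    if (__builtin_cpu_supports("sse4.1"))
        return dot4_sse41(a, b);
#endif
    return dot4_scalar(a, b);
}
```

The file itself is built at the baseline SSE level; only dot4_sse41() is
allowed to use the newer instruction, and it is only reached after the
CPU check.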

We need to get a quality low-level API out there, that is portable,
and fills gaps in the various architectures, then we can focus on
high-level wrappers and niceties.
I really want to see your half-float module merged. Where did that go?
I recall some people were saying it should be conflated with the
custom-float stuff, so half-float was just a specialisation of custom
float...
I'm not so sure about that... but maybe? I have been needing a 3.7
(10-bit) float too, maybe that fits in there?
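
A rough sketch of how half-float could fall out as one instance of a
generic small-float description, written in C for illustration. The
float_format struct and decode_float() are hypothetical names. The half
layout (1 sign, 5 exponent, 10 mantissa bits, bias 15) is IEEE 754
binary16. The 3.7 layout here is purely an assumption: unsigned, 3
exponent bits, 7 mantissa bits, bias 3, with IEEE-style subnormals and
the top exponent reserved for infinity/NaN.

```c
#include <math.h>     /* only for the NAN and INFINITY macros */
#include <stdint.h>

/* Hypothetical descriptor for a small IEEE-style float format. */
typedef struct {
    int sign_bits;   /* 0 (unsigned) or 1 */
    int exp_bits;
    int man_bits;
    int bias;
} float_format;

static const float_format HALF      = {1, 5, 10, 15}; /* IEEE binary16 */
static const float_format FLOAT_3_7 = {0, 3, 7, 3};   /* assumed layout */

/* 2^e for small integer e, to avoid depending on libm. */
static double pow2(int e)
{
    double r = 1.0;
    for (; e > 0; --e) r *= 2.0;
    for (; e < 0; ++e) r *= 0.5;
    return r;
}

/* Decode the low bits of 'bits' according to format f. */
double decode_float(uint32_t bits, float_format f)
{
    uint32_t man   = bits & ((1u << f.man_bits) - 1u);
    uint32_t e     = (bits >> f.man_bits) & ((1u << f.exp_bits) - 1u);
    uint32_t sign  = f.sign_bits ? (bits >> (f.man_bits + f.exp_bits)) & 1u : 0u;
    uint32_t e_max = (1u << f.exp_bits) - 1u;
    double   frac  = (double)man / (double)(1u << f.man_bits);
    double   v;

    if (e == 0)
        v = frac * pow2(1 - f.bias);             /* zero and subnormals */
    else if (e == e_max)
        v = man ? NAN : INFINITY;                /* reserved exponent */
    else
        v = (1.0 + frac) * pow2((int)e - f.bias); /* normal numbers */
    return sign ? -v : v;
}
```

With this shape, decode_float(0x3C00, HALF) and decode_float(0x180,
FLOAT_3_7) both decode to 1.0; only the descriptor differs.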

That stuff all needs forceinline too to be particularly useful.

