<div class="gmail_quote">On 15 March 2012 22:27, James Miller <span dir="ltr"><<a href="mailto:james@aatch.net">james@aatch.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">On 16 March 2012 08:02, Manu <<a href="mailto:turkeyman@gmail.com">turkeyman@gmail.com</a>> wrote:<br>
> On 15 March 2012 20:35, Robert Jacques <<a href="mailto:sandford@jhu.edu">sandford@jhu.edu</a>> wrote:<br>
</div><div><div class="h5">>> This sounds reasonable. However, please realize that if you wish to use<br>
>> the short vector names (i.e. float4, float3, float2, etc) you should support<br>
>> the full set with a decent range of operations and methods. Several people<br>
>> (myself included) have written similar short vector libraries; I think<br>
>> having short vectors in Phobos is important, but having one library<br>
>> provide float4 and another float2 is less than ideal, even if not all of the<br>
>> types could leverage the SIMD backend. For myself, the killer feature for<br>
>> such a library would be to have CUDA-compatible alignments for the types.<br>
>> (or an equivalent enum to the effect)<br>
><br>
><br>
> I can see how you come to that conclusion, but I generally feel that that's<br>
> a problem for a higher layer of library.<br>
> I really feel it's important to keep std.simd STRICTLY about the hardware<br>
> simd operations, only implementing what the hardware can express<br>
> efficiently, and not trying to emulate anything else. In some areas I feel<br>
> I've already violated that premise, by adding some functions to make good<br>
> use of something that NEON/VMX can express in a single opcode, but takes SSE<br>
> 2-3. I don't want to push that bar, otherwise the user will lose confidence<br>
> that the functions in std.simd will actually work efficiently on any given<br>
> hardware.<br>
> It's not a do-everything library, it's a hardware SIMD abstraction, and most<br>
> functions map to exactly one hardware opcode. I expect most people will want<br>
> to implement their own higher level lib on top tbh; almost nobody will ever<br>
> agree on what the perfect maths library should look like, and it's also<br>
> context specific.<br>
<br>
</div></div>I think that having the low-level vectors makes sense. Since<br>
technically only float4, int4, short8, and byte16 actually make sense in<br>
the context of direct SIMD, providing other vectors would be straying<br>
into vector-library territory, as people would then expect<br>
interoperability between them, standard vector/matrix operations, and<br>
that could get too high-level. Third-party libraries have to be useful<br>
for something!<br>
<br>
Slightly off topic questions:<br>
Are you planning on providing a way to fall back if certain operations<br>
aren't supported?</blockquote><div><br></div><div>I think it depends on HOW unsupported they are. If it can be emulated efficiently (and in the context, the emulation would be as efficient as possible on the architecture anyway), then probably, but if it's a problem that should simply be solved another way, I'd rather encourage that with a compile error.</div>
<div><br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> Even if it can only be picked at compile time? Is<br>
your work on Github or something?</blockquote><div><br></div><div>Yup: <a href="https://github.com/TurkeyMan/phobos/commits/master/std/simd.d">https://github.com/TurkeyMan/phobos/commits/master/std/simd.d</a></div><div> </div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> I wouldn't mind having a peek, since<br>
this stuff interests me. How well does this stuff inline?</blockquote><div><br></div><div>It inlines perfectly; I pay very close attention to the codegen of every single function, and have loads of static branches to select more efficient versions for more recent revisions of the SSE instruction set.</div>
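<div><br></div><div>A rough sketch of the idea of static branches, for illustration only. It is written in C++ rather than D, and every name in it (SimdLevel, Dot4, dot4, kTarget) is made up for the example; none of it is taken from std.simd. The implementation is picked at compile time from an assumed target level, and a level with no efficient implementation is a compile error rather than a slow fallback. The bodies are plain scalar code standing in for the real instructions so the sketch stays portable.</div>

```cpp
#include <cassert>

// Hypothetical instruction-set levels; these names are illustrative only.
enum class SimdLevel { None, SSE2, SSE41 };

// Plain struct standing in for a 4-float hardware vector.
struct float4 { float x, y, z, w; };

// Primary template is declared but never defined, so requesting a level
// with no efficient implementation fails to compile.
template <SimdLevel L> struct Dot4;

// SSE2-level variant: multiply, then reduce in pairs, mirroring the
// multiply + shuffle/add sequence such hardware would need.
template <> struct Dot4<SimdLevel::SSE2> {
    static float apply(float4 a, float4 b) {
        float4 p = { a.x * b.x, a.y * b.y, a.z * b.z, a.w * b.w };
        return (p.x + p.z) + (p.y + p.w);
    }
};

// SSE4.1-level variant: the place where a single dot-product instruction
// (DPPS) would go; written as scalar code here to stay portable.
template <> struct Dot4<SimdLevel::SSE41> {
    static float apply(float4 a, float4 b) {
        return a.x * b.x + a.y * b.y + a.z * b.z + a.w * b.w;
    }
};

// Assumed build target, fixed at compile time.
constexpr SimdLevel kTarget = SimdLevel::SSE2;

// Entry point: the variant is selected by the compiler, not at run time.
inline float dot4(float4 a, float4 b) {
    return Dot4<kTarget>::apply(a, b);
}

// Small usage check: both variants agree on a simple input.
inline void dot4_example() {
    float4 a = { 1.0f, 2.0f, 3.0f, 4.0f };
    float4 b = { 5.0f, 6.0f, 7.0f, 8.0f };
    assert(dot4(a, b) == Dot4<SimdLevel::SSE41>::apply(a, b));
}
```

<div>Calling Dot4&lt;SimdLevel::None&gt;::apply(...) in this sketch does not compile, because only the specialised levels are defined. That is the "encourage it with a compile error" behaviour, as opposed to silently emulating the operation.</div>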
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"> I can<br>
imagine that a lot of the benefit of using SIMD would be lost if every<br>
SIMD instruction ends up wrapped in 3-4 more instructions, especially<br>
if you need to do consecutive operations on the same data.<br></blockquote><div><br></div><div>It will lose 100% of its benefit if it is wrapped up in even ONE function call, and equally so if the vectors don't pass/return in hardware registers as they should.</div>
<div>I'm crafting it to have the same performance characteristics as 'int'.</div></div>