OOP, faster data layouts, compilers
Don
nospam at nospam.com
Tue Apr 26 01:01:06 PDT 2011
Sean Cavanaugh wrote:
> On 4/22/2011 2:20 PM, bearophile wrote:
>> Kai Meyer:
>>
>>> The purpose of the original post was to indicate that some low level
>>> research shows that underlying data structures (as applied to video game
>>> development) can have an impact on the performance of the application,
>>> which D (I think) cares very much about.
>>
>> The idea of the original post was a bit more complex: how can we
>> invent new/better ways to express semantics in D code that will not
>> prevent future D compilers from making some changes to the layout of
>> data structures to increase code performance? Complex transforms of
>> the data layout seem too complex for even a good compiler, but
>> maybe simpler ones will be possible. And I think to do this the D code
>> needs some more semantics. I was suggesting an annotation that forbids
>> inbound pointers, which allows the compiler to move data around a
>> little, but this is just a start.
>>
>> Bye,
>> bearophile
>
>
> In many ways the biggest thing I use regularly in game development that
> I would lose by moving to D would be good built-in SIMD support. The PC
> compilers from MS and Intel both have intrinsic data types and
> instructions that cover all the operations from SSE1 up to AVX. The
> intrinsics are nice in that the job of register allocation and
> scheduling is given to the compiler and generally the code it outputs is
> good enough (though it needs to be watched at times).
>
> Unlike ASM, intrinsics can be inlined, so your math library can provide a
> platform abstraction at that layer before building up to larger
> operations (like vectorized forms of sin, cos, etc.) and algorithms (like
> frustum cull checks, k-DOP polygon collision, etc.), which makes porting
> and reusing the algorithms on other platforms much, much easier, as only
> the low-level layer needs to be ported, and only outliers at the
> algorithm level need to be tweaked after you get it up and running.
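
For illustration, a minimal D sketch of that layering; mul4, add4 and
dot4 are hypothetical names, and the bodies are portable fallbacks
written with array operations rather than real intrinsics:

// Low-level layer: the only part that would need a per-platform port.
float[4] mul4(float[4] a, float[4] b)
{
    float[4] r;
    r[] = a[] * b[];
    return r;      // static arrays are returned by value
}

float[4] add4(float[4] a, float[4] b)
{
    float[4] r;
    r[] = a[] + b[];
    return r;
}

// Higher-level routine written only against the layer above,
// so it ports unchanged.
float dot4(float[4] a, float[4] b)
{
    float[4] p = mul4(a, b);
    return p[0] + p[1] + p[2] + p[3];
}
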
>
> On the consoles there is AltiVec (VMX), which is very similar to SSE in
> many ways. The common ground is basically SSE1-tier operations: 128-bit
> values with 4x32-bit integer and 4x32-bit float support. 64-bit
> AMD/Intel makes SSE2 the minimum standard, and a systems language on
> those platforms should reflect that.
Yes. It is primarily for this reason that we made static arrays
return-by-value. The intention is that on x86, a float[4] will be held
in an SSE register.
So it should be possible to write SIMD code with standard array
operations. (Note that this is *much* easier for the compiler than
trying to vectorize scalar code.)
This gives syntax like:
float[4] a, b, c;
a[] += b[] * c[];
(currently works, but doesn't use SSE, so has dismal performance).
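
For reference, a complete, compilable sketch of that syntax (the values
are arbitrary, chosen only for illustration):

import std.stdio;

void main()
{
    float[4] a = [1, 2, 3, 4];
    float[4] b = [10, 20, 30, 40];
    float[4] c = [2, 2, 2, 2];

    // Element-wise multiply-accumulate over the whole static array.
    a[] += b[] * c[];

    writeln(a);   // [21, 42, 63, 84]
}
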
>
> Loading and storing is comparable across platforms with similar
> alignment restrictions or penalties for working with unaligned data.
> Packing/swizzle/shuffle/permuting are different but this is not a huge
> problem for most algorithms. The lack of fused multiply and add on the
> Intel side can be worked around or abstracted (i.e. always write code as
> if it existed, have the Intel version expand to multiple ops).
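
For illustration, a minimal D sketch of that workaround; madd is a
hypothetical name, and this portable fallback simply expands to a
multiply followed by an add, while a VMX port could map the same call
to a single fused instruction:

float[4] madd(float[4] a, float[4] b, float[4] c)
{
    float[4] r;
    r[] = a[] * b[] + c[];   // a*b + c, element-wise
    return r;
}
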
>
> And now my wish list:
>
> If you have worked with shader programming through HLSL or Cg, the
> expressiveness of doing the work in SIMD is very high. If I could write
> something that looked exactly like HLSL but was integrated perfectly
> into a language like D or C++, it would be pretty huge to me. The amount
> of math you can fit in a line or two of HLSL is mind-boggling at times,
> yet extremely intuitive and rather easy to debug.
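
As a rough sketch of what that expressiveness could look like in D,
here is a hypothetical float4 wrapper built on operator overloading
(not a real library, just an illustration):

struct float4
{
    float[4] v;

    // Element-wise +, - and * via D's array operations.
    float4 opBinary(string op)(float4 rhs)
        if (op == "+" || op == "-" || op == "*")
    {
        float4 r;
        mixin("r.v[] = v[] " ~ op ~ " rhs.v[];");
        return r;
    }
}

// One-liners then read much like HLSL:
float4 lerp(float4 a, float4 b, float4 t)
{
    return a + (b - a) * t;   // the HLSL built-in lerp(a, b, t)
}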