seeding the pot for 2.0 features [small vectors]

Sun Jan 28 13:31:42 PST 2007

Bill Baxter wrote:
> Mikola Lysenko wrote:
> 
>> I'll bite.  Here are the two features I consider most important:
>>
>> 2. Low dimensional vectors as primitive types
>>
>> Specifically I would like to see the types int, real, float etc. 
>> extended into 2-4 dimensional vectors.  ie. int2, real4, float3.
>>
>> This would be exceptionally useful in many applications which require 
>> coordinate geometry.  Here is a very brief list:
>>
>> Scientific Programs
>> Physics Simulation
>> Computer Graphics
>> Video Games
>> User Interfaces
>> Computational Geometry
>> Robotics
>> Fluid Simulation
> 
> 
> It's still a tiny fraction of the number of applications that use, say, 
> strings.  So the ubiquity argument for inclusion is pretty weak, I think.
> 

What applications don't use vector instructions?

Also, I think it's more important to consider what /D applications/ will 
be using SIMD instructions, rather than what applications in general do 
or do not use coordinate geometry.  That's because a lot of those 
applications may not even be written in D or have anything to do with D, 
like the mass of stuff written in dynamic languages like Perl, Python, 
Ruby, etc.

I have to wonder, has any language besides assembly really given good 
support for SIMD primitives?  I think D stands to gain a lot here.  That 
said, I don't mind if it's done in a library, as long as it looks 
polished and is not cumbersome.

>>
>> etc.
>>
>> I would prefer not to recount the number of times I have written my 
>> own vector library, and how tedious they are to create.  For most 
>> every language I learn it is the first thing I need to write, since so 
>> few are willing to provide a default implementation.  In my opinion, 
>> this is unacceptable for a problem which occurs so frequently.
> 
> 
> Again, it occurs for *you* frequently (and I'll admit for me too), but 
> still the vast majority of programmers out there have never had a 
> burning need for a float3 with all the bells and whistles.
> 
> If the need for a vector library were truly ubiquitous, it seems like it 
> would be easier to find a decent implementation on the web, or that one 
> would at least be available in the standard library of the given 
> programming language.
> 
> As far as D is concerned, Helix has a pretty decent implementation.  See 
> http://www.dsource.org/projects/helix.  It lacks Vector2's but I've 
> added them to my own copy and I'd be happy to send it to you if you like.
> 
>> One option is to extend the standard library with a vector-types 
>> class, but this is not nearly as nice as a compiler-level 
>> implementation.
> 
> 
> I'm not convinced that a compiler-level implementation of these things 
> is necessary.
> 
>> 1. The 90 degrees rotation trick
>> This is based on the following article:
>> http://www.flipcode.com/articles/article_fastervectormath.shtml
>> ...
>> The performance improvement becomes more substantial the longer the 
>> expression.  Since overloaded operators do not instantiate templates, 
>> there is no obvious way to obtain this result in the current language 
>> spec.
> 
> 
> I thought the new opAssign was supposed to be enough to make expression 
> templates work in D.  Don Clugston even posted a proof-of-concept that 
> would use templates to rearrange expressions a while back.
> 
> Anyway, for this one, I think the preferred approach is to make the core 
> language expressive enough so that tricks like expression templates can 
> work, rather than implementing such optimizations for particular cases 
> in the compiler.
> 
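To make the expression-template idea concrete, here is a minimal sketch. It is written in C++ rather than D, purely to illustrate the technique; the names Vec3 and AddExpr are made up for this example and are not taken from Helix, from Don Clugston's proof-of-concept, or from the flipcode article. The + operator builds a lightweight expression node instead of a temporary vector, and the whole expression is evaluated one component at a time only when it is converted to a Vec3.

```cpp
#include <cstddef>

// Node representing "lhs + rhs" without evaluating it yet.
// It only stores references, so it must not outlive the full expression.
template <class L, class R>
struct AddExpr {
    const L& lhs;
    const R& rhs;
    // Evaluate a single component on demand.
    float operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
};

// Hypothetical 3-component vector used only for this illustration.
struct Vec3 {
    float data[3];

    Vec3(float x, float y, float z) : data{x, y, z} {}

    // Converting from an expression tree evaluates it component by
    // component, so no intermediate Vec3 temporaries are created.
    template <class L, class R>
    Vec3(const AddExpr<L, R>& e) : data{e[0], e[1], e[2]} {}

    float operator[](std::size_t i) const { return data[i]; }
};

// Vec3 + Vec3 yields an expression node, not a Vec3.
inline AddExpr<Vec3, Vec3> operator+(const Vec3& a, const Vec3& b) {
    return {a, b};
}

// (expression) + Vec3 nests the existing node inside a new one.
template <class L, class R>
AddExpr<AddExpr<L, R>, Vec3> operator+(const AddExpr<L, R>& a, const Vec3& b) {
    return {a, b};
}
```

With these definitions, a statement such as `Vec3 r = a + b + c;` computes each component of r in a single pass. The longer the expression, the more temporaries are avoided, which matches the observation quoted above.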
>> 2. Architecture specific optimizations (SIMD)
>>
>> For low dimensional arithmetic, many architectures provide specially 
>> optimized instructions for low dimensional vectors.  The problem is 
>> most languages do not exploit them.  Creating efficient SIMD code is 
>> impossible for a library, since each opAdd/opMul must be written using 
>> inline assembler and therefore incurs the overhead of a function call 
>> regardless.  This is worsened by the fact that moving to/from a vector 
>> register is typically very expensive.
>>
>> A compiler level implementation can easily avoid these issues by 
>> assigning vector expressions to a register when passing them.  
>> Moreover it is more portable than compiler intrinsics like MSVC's SSE 
>> extensions.  The implementation can easily emit fallback code if the 
>> architecture does not support SIMD instructions.
> 
> 
> Again, this sounds like it would be better to solve the generic issue of 
> libraries not being able to take maximum advantage of existing hardware 
> optimizations, like the issue with ASM methods not being inline-able.
> 
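For comparison, here is roughly what the compiler-intrinsics route looks like. This is a minimal C++ sketch, not D, and Float4, add, and the HYPOTHETICAL_HAVE_SSE macro are names invented for the example. It uses the SSE intrinsics from <xmmintrin.h>, which the compiler can inline like any other small function, and it falls back to a plain scalar loop when SSE is not enabled at compile time.

```cpp
// Detect SSE at compile time (GCC/Clang define __SSE__; MSVC uses _M_* macros).
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HYPOTHETICAL_HAVE_SSE 1
#else
#define HYPOTHETICAL_HAVE_SSE 0
#endif

// Hypothetical four-component float vector for this illustration.
struct Float4 {
    float v[4];
};

// Component-wise addition: one packed add on SSE targets,
// a scalar loop everywhere else.
inline Float4 add(const Float4& a, const Float4& b) {
    Float4 r;
#if HYPOTHETICAL_HAVE_SSE
    // Unaligned loads/stores, so Float4 needs no special alignment.
    __m128 sum = _mm_add_ps(_mm_loadu_ps(a.v), _mm_loadu_ps(b.v));
    _mm_storeu_ps(r.v, sum);
#else
    for (int i = 0; i < 4; ++i) {
        r.v[i] = a.v[i] + b.v[i];
    }
#endif
    return r;
}
```

The sketch still moves data between memory and the vector register on every call unless the optimizer removes the round trip, which is the cost the quoted proposal wants a built-in type to avoid.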
>> 3. Swizzles
>>
>> A swizzle is a reordering of the elements in a vector.  Shader 
>> languages like Cg or GLSL typically support them, given their utility 
>> in certain types of computations.  Here are some examples:
>>
>> v.xxxx   // Returns a vector with v.x broadcast to all components
>> v.xyz    // Returns only the xyz components of v
>> v.zyx    // Returns a vector with v's xyz components reversed
>>
>> Enumerating all possible swizzles within a template is impossible, so 
>> each swizzle requires its own function.  The result is massive code 
>> bloat, and many lines of automatically generated gibberish.  To get 
>> an idea of how many functions this requires, the total number of 
>> swizzles for 2-4 component vectors is 4^4 + 4^3 + 4^2 + 4 or 340.  
>> Multiply that by the number of primitive types and the result becomes 
>> quite large.
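As a point of comparison, the sketch below reproduces the swizzle count quoted above and shows how a language with compile-time index parameters can cover every swizzle with a single template, at the cost of giving up the v.xxxx member syntax. It is C++, not D, and Vec, swizzle, and swizzleCount are hypothetical names used only for this example.

```cpp
#include <array>
#include <cstddef>

// Hypothetical vector type with N float components.
template <std::size_t N>
using Vec = std::array<float, N>;

// One template covers every swizzle: the source indices are compile-time
// arguments. swizzle<0, 0, 0, 0>(v) plays the role of v.xxxx and
// swizzle<2, 1, 0>(v) the role of v.zyx. std::get rejects out-of-range
// indices at compile time.
template <std::size_t... Idx, std::size_t N>
Vec<sizeof...(Idx)> swizzle(const Vec<N>& v) {
    return {{std::get<Idx>(v)...}};
}

// Number of distinct swizzles of length 1..maxLength drawn from
// sourceComponents components: sum of sourceComponents^k.
// For a 4-component source this is 4 + 4^2 + 4^3 + 4^4 = 340.
inline int swizzleCount(int sourceComponents, int maxLength) {
    int total = 0;
    int term = 1;
    for (int length = 1; length <= maxLength; ++length) {
        term *= sourceComponents;
        total += term;
    }
    return total;
}
```

Only the swizzles a program actually uses get instantiated, so the 340-functions-per-type figure applies to hand-written or generated member functions rather than to this style.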
> 
> 
> Are swizzles all that useful outside of shader languages?  Part of the 
> reason they are useful in shaders is that GPUs can do swizzles for 
> free.  Can CPUs (I dunno)?  Another part of the reason is that all 
> operations happen on 4-components no matter what, so if you want to 
> multiply a scalar inside a vector times another vector, you might as 
> well write it as v.xxxx * v2.  A third reason swizzles are useful on 
> GPUs is because you often end up stuffing completely unrelated junk into 
> them in the name of efficiency.  I'm not sure that's necessary or useful 
> on a CPU architecture that isn't quite as tied to float4 as GPUs are.
> -- 
> 
> I'm sure I'm among those who would use built-in small vector classes, 
> but I don't think it's clear that they should be built into the compiler 
> of a general purpose programming language.
> 
> On the other hand, if you can convince me that it really is impossible 
> to maximize performance (while maintaining convenience) any other way, 
> then I could be swayed.  Also if CPUs themselves are moving in this 
> direction, then that also is something to think about.  By that I mean 
> if float4 becomes (or already is) what could be considered a "native 
> type" on the major desktop CPUs, then I can see that it would make sense 
> for a programming language to reflect that by making it a built-in type.
> 
> --bb

I'd say float4 has been a native type for a couple of years now.  A 
desktop computer that doesn't have SSE or Altivec or some other SIMD 
extension is probably quite antiquated and not running D programs: SSE 
has been around since 1999, when it shipped on 450 MHz CPUs.
http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions

The only computers I know of that lack float4 are smartphone- and 
PDA-type devices running modern ARM processors.  Even some of the recent 
ARM-XSCALE processors have MMX instructions, which don't give float4 
but do give short4 and int2.  I'm also not sure about modern 
supercomputers and the like, since I haven't worked with those.