seeding the pot for 2.0 features [small vectors]

Sun Jan 28 10:23:55 PST 2007

Mikola Lysenko wrote:
> I'll bite.  Here are the two features I consider most important:
> 
> 2. Low dimensional vectors as primitive types
> 
> Specifically I would like to see the types int, real, float etc. 
> extended into 2-4 dimensional vectors.  ie. int2, real4, float3.
> 
> This would be exceptionally useful in many applications which require 
> coordinate geometry.  Here is a very brief list:
> 
> Scientific Programs
> Physics Simulation
> Computer Graphics
> Video Games
> User Interfaces
> Computational Geometry
> Robotics
> Fluid Simulation

It's still a tiny fraction of the number of applications that use, say, 
strings.  So the ubiquity argument for inclusion is pretty weak, I think.

> 
> etc.
> 
> I would prefer not to recount the number of times I have written my own 
> vector library, and how tedious they are to create.  For most every 
> language I learn it is the first thing I need to write, since so few are 
> willing to provide a default implementation.  In my opinion, this is 
> unacceptable for a problem which occurs so frequently.

Again, it occurs for *you* frequently (and I'll admit for me too), but 
still the vast majority of programmers out there have never had a 
burning need for a float3 with all the bells and whistles.

If the need for a vector library were truly ubiquitous, it seems like it 
would be easier to find a decent implementation on the web, or that one 
would at least be available in the standard library of the given 
programming language.

As far as D is concerned, Helix has a pretty decent implementation.  See 
http://www.dsource.org/projects/helix.  It lacks Vector2's but I've 
added them to my own copy and I'd be happy to send it to you if you like.

> One option is to extend the standard library with a vector-types class, 
> but this is not nearly as nice as a compiler-level implementation.

I'm not convinced that a compiler-level implementation of these things 
is necessary.

> 1. The 90 degrees rotation trick
> This is based on the following article:
> http://www.flipcode.com/articles/article_fastervectormath.shtml
> ...
> The performance improvement becomes more substantial the longer the 
> expression.  Since overloaded operators do not instantiate templates, 
> there is no obvious way to obtain this result in the current language spec.

I thought the new opAssign was supposed to be enough to make expression 
templates work in D.  Don Clugston even posted a proof-of-concept that 
would use templates to rearrange expressions a while back.

Anyway, for this one, I think the preferred approach is to make the core 
language expressive enough so that tricks like expression templates can 
work, rather than implementing such optimizations for particular cases 
in the compiler.
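
For anyone who hasn't seen the trick, here is a rough sketch of what an expression template looks like, written in C++ because that is what the flipcode article uses.  It is not Don's proof-of-concept and it is not D; every name in it (Expr, Vec3, Sum, add_three) is made up for illustration.  The point is only that operator+ returns a lightweight node instead of a finished vector, so a + b + c is evaluated in a single loop when the result is finally stored.

```cpp
#include <cassert>
#include <cstddef>

// All names here (Expr, Vec3, Sum, add_three) are hypothetical.

// CRTP base: marks a type as a vector expression, so the operator+
// below only matches vectors and sums of vectors.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

struct Vec3 : Expr<Vec3> {
    float v[3];
    Vec3(float x, float y, float z) : v{x, y, z} {}

    // Building a Vec3 from an expression is the only place a loop runs.
    template <class E>
    Vec3(const Expr<E>& e) {
        for (std::size_t i = 0; i < 3; ++i) v[i] = e.self()[i];
    }
    float operator[](std::size_t i) const { return v[i]; }
};

// Node for "l + r".  It stores references only and computes a
// component on demand, so no intermediate Vec3 is created.
template <class L, class R>
struct Sum : Expr<Sum<L, R>> {
    const L& l;
    const R& r;
    Sum(const L& l_, const R& r_) : l(l_), r(r_) {}
    float operator[](std::size_t i) const { return l[i] + r[i]; }
};

template <class L, class R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}

// a + b + c has type Sum<Sum<Vec3, Vec3>, Vec3>; the conversion to
// Vec3 in the return statement evaluates it in one pass.
inline Vec3 add_three(const Vec3& a, const Vec3& b, const Vec3& c) {
    return a + b + c;
}

inline void expr_template_example() {
    Vec3 d = add_three(Vec3(1, 2, 3), Vec3(10, 20, 30), Vec3(100, 200, 300));
    assert(d[0] == 111.0f && d[1] == 222.0f && d[2] == 333.0f);
}
```

The nodes hold references, so they must not outlive the statement that built them.  That is the usual caveat with this technique.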

> 2. Architecture specific optimizations (SIMD)
> 
> For low dimensional arithmetic, many architectures provide specially 
> optimized instructions for low dimensional vectors.  The problem is most 
> languages do not exploit them.  Creating efficient SIMD code is 
> impossible for a library, since each opAdd/opMul must be written using 
> inline assembler and therefore incurs the overhead of a function call 
> regardless.  This is worsened by the fact that moving to/from a vector 
> register is typically very expensive.
> 
> A compiler level implementation can easily avoid these issues by 
> assigning vector expressions to a register when passing them.  Moreover 
> it is more portable than compiler intrinsics like MSVC's SSE extensions. 
>  The implementation can easily emit fallback code if the architecture 
> does not support SIMD instructions.

Again, this sounds like it would be better to solve the generic issue of 
libraries not being able to take maximum advantage of existing hardware 
optimizations, like the issue with ASM methods not being inline-able.
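
As a point of comparison, this is roughly what the library route looks like in C++ with the SSE intrinsics from <xmmintrin.h> (the same family as the MSVC extensions mentioned above).  It is a sketch under assumptions: Float4 and HAVE_SSE are names I made up, and the scalar fallback is only there so the code still builds on a non-x86 target.  Because the intrinsics are ordinary inlinable calls rather than asm blocks, the operators need not cost a function call.  Whether D can offer the same is exactly the generic issue I mean.

```cpp
#include <cassert>

// HAVE_SSE and Float4 are hypothetical names for this sketch.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
  #include <xmmintrin.h>
  #define HAVE_SSE 1
#else
  #define HAVE_SSE 0
#endif

struct Float4 {
#if HAVE_SSE
    __m128 r;  // the value stays in SSE register form between operations
    // _mm_set_ps takes its arguments from the highest lane to the lowest.
    Float4(float x, float y, float z, float w) : r(_mm_set_ps(w, z, y, x)) {}
    explicit Float4(__m128 m) : r(m) {}
    void store(float out[4]) const { _mm_storeu_ps(out, r); }
#else
    float f[4];  // plain scalar fallback when SSE is unavailable
    Float4(float x, float y, float z, float w) : f{x, y, z, w} {}
    void store(float out[4]) const {
        for (int i = 0; i < 4; ++i) out[i] = f[i];
    }
#endif
};

inline Float4 operator+(const Float4& a, const Float4& b) {
#if HAVE_SSE
    return Float4(_mm_add_ps(a.r, b.r));
#else
    return Float4(a.f[0] + b.f[0], a.f[1] + b.f[1],
                  a.f[2] + b.f[2], a.f[3] + b.f[3]);
#endif
}

inline Float4 operator*(const Float4& a, const Float4& b) {
#if HAVE_SSE
    return Float4(_mm_mul_ps(a.r, b.r));
#else
    return Float4(a.f[0] * b.f[0], a.f[1] * b.f[1],
                  a.f[2] * b.f[2], a.f[3] * b.f[3]);
#endif
}

inline void simd_example() {
    Float4 a(1, 2, 3, 4), b(2, 2, 2, 2), c(10, 20, 30, 40);
    float out[4];
    (a * b + c).store(out);  // component-wise multiply, then add
    assert(out[0] == 12.0f && out[1] == 24.0f);
    assert(out[2] == 36.0f && out[3] == 48.0f);
}
```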

> 3. Swizzles
> 
> A swizzle is a reordering of the elements in a vector.  Shader languages 
> like Cg or GLSL typically support them, given their utility in certain 
> types of computations.  Here are some examples:
> 
> v.xxxx     // Returns a vector with v.x broadcast to all components
> v.xyz    // Returns only the xyz components of v
> v.zyx    // Returns a vector consisting of the reverse of v's xyz components
> 
> Enumerating all possible swizzles within a template is impossible, and 
> therefore requires one function per swizzle.  The result is massive code 
> bloat, and many lines of automatically generated gibberish.  To get an 
> idea of how many functions this requires, the total number of swizzles 
> for 2-4 component vectors is 4^4 + 4^3 + 4^2 + 4 or 340.  Multiply that 
> by the number of primitive types and the result becomes quite large.

Are swizzles all that useful outside of Shader languages?  Part of the 
reason they are useful in shaders is that GPUs can do swizzles for 
free.  Can CPUs?  (I dunno.)  Another part of the reason is that all 
operations happen on 4-components no matter what, so if you want to 
multiply a scalar inside a vector times another vector, you might as 
well write it as v.xxxx * v2.  A third reason swizzles are useful on 
GPUs is because you often end up stuffing completely unrelated junk into 
them in the name of efficiency.  I'm not sure that's necessary or useful 
on a CPU architecture that isn't quite as tied to float4 as GPUs are.
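
On the code-bloat point: if you give up the v.zyx property syntax and pass the component indices as template arguments, a single function template can cover every swizzle.  Below is a hedged C++ sketch of that idea.  Vec, swizzle, all_below and ipow are hypothetical names, and it says nothing about what D templates can or cannot do today.  It also double-checks the 340 figure from above.

```cpp
#include <cassert>
#include <cstddef>

// Vec, all_below, swizzle and ipow are hypothetical names.
template <std::size_t N>
struct Vec { float c[N]; };

// Compile-time check that every index is a valid component of Vec<N>.
template <std::size_t N>
constexpr bool all_below() { return true; }
template <std::size_t N, std::size_t First, std::size_t... Rest>
constexpr bool all_below() { return First < N && all_below<N, Rest...>(); }

// One template for all swizzles: the indices choose the source
// components, and the number of indices sets the result size.
template <std::size_t... I, std::size_t N>
Vec<sizeof...(I)> swizzle(const Vec<N>& v) {
    static_assert(all_below<N, I...>(), "swizzle index out of range");
    return Vec<sizeof...(I)>{{ v.c[I]... }};
}

// Sanity check of the count quoted above: 4^4 + 4^3 + 4^2 + 4 = 340.
constexpr std::size_t ipow(std::size_t b, std::size_t e) {
    return e == 0 ? 1 : b * ipow(b, e - 1);
}
static_assert(ipow(4, 4) + ipow(4, 3) + ipow(4, 2) + ipow(4, 1) == 340,
              "total number of 1-4 component swizzles of a 4-vector");

inline void swizzle_example() {
    Vec<4> v{{1.0f, 2.0f, 3.0f, 4.0f}};
    Vec<4> xxxx = swizzle<0, 0, 0, 0>(v);  // like v.xxxx
    Vec<3> zyx  = swizzle<2, 1, 0>(v);     // like v.zyx
    assert(xxxx.c[0] == 1.0f && xxxx.c[3] == 1.0f);
    assert(zyx.c[0] == 3.0f && zyx.c[1] == 2.0f && zyx.c[2] == 1.0f);
}
```

The syntax is clunkier than v.zyx, which is a fair complaint, but only the swizzles you actually use get instantiated.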
--

I'm sure I'm among those who would use built-in small vector classes, 
but I don't think it's clear that they should be built into the compiler 
of a general purpose programming language.

On the other hand, if you can convince me that it really is impossible 
to maximize performance (while maintaining convenience) any other way, 
then I could be swayed.  Also if CPUs themselves are moving in this 
direction, then that also is something to think about.  By that I mean 
if float4 becomes (or already is) what could be considered a "native 
type" on the major desktop CPUs, then I can see that it would make sense 
for a programming language to reflect that by making it a built-in type.

--bb