seeding the pot for 2.0 features

Sun Jan 28 01:52:34 PST 2007

Mikola Lysenko wrote:
> I'll bite.  Here are the two features I consider most important:
> 
> 1. Dynamic Closures:
> See: 
> http://lists.puremagic.com/pipermail/digitalmars-d/2006-August/007520.html
> 
> 2. Low dimensional vectors as primitive types
> 
> Specifically I would like to see the types int, real, float etc. 
> extended into 2-4 dimensional vectors.  ie. int2, real4, float3.
> 
> This would be exceptionally useful in many applications which require 
> coordinate geometry.  Here is a very brief list:
> 
> Scientific Programs
> Physics Simulation
> Computer Graphics
> Video Games
> User Interfaces
> Computational Geometry
> Robotics
> Fluid Simulation
> 
> etc.
> 
> I would prefer not to recount the number of times I have written my own 
> vector library, and how tedious they are to create.  For most every 
> language I learn it is the first thing I need to write, since so few are 
> willing to provide a default implementation.  In my opinion, this is 
> unacceptable for a problem which occurs so frequently.
> 
> One option is to extend the standard library with a vector-types class, 
> but this is not nearly as nice a compiler level implementation.
> 
> 1. The 90 degrees rotation trick
> This is based on the following article:
> http://www.flipcode.com/articles/article_fastervectormath.shtml
> 
> The basic idea is we use templates to expand vector expressions to 
> achieve better register use and memory locality.  Here is a simple example:
> 
> a = b + c * d;
> 
> Normally, we would translate this as something like:
> 
> a = b.opAdd(c.opMul(d));
> 
> Which gives the following assembler:
> 
> //Do opMul
> mov     reg, c.x
> mul    reg, d.x
> mov     tmp.x, reg
> mov     reg, c.y
> mul    reg, d.y
> mov     tmp.y, reg
> mov     reg, c.z
> mul    reg, d.z
> mov     tmp.z, reg
> 
> //Do opAdd
> mov     reg, tmp.x
> add    reg, b.x
> mov     a.x, reg
> mov     reg, tmp.y
> add    reg, b.y
> mov     a.y, reg
> mov     reg, tmp.z
> add    reg, b.z
> mov     a.z, reg
> 
> 
> This is not particularly efficient, since it requires the creation of a 
> temporary vector to store the result from c * d.  A better strategy 
> involves 'rotating the computation 90 degrees' and performing the 
> expression on a per-component level:
> 
> //x - component
> mov    reg, c.x
> mul    reg, d.x
> add    reg, b.x
> mov    a.x, reg
> 
> //y - component
> mov    reg, c.y
> mul    reg, d.y
> add    reg, b.y
> mov    a.y, reg
> 
> //z - component
> mov    reg, c.z
> mul    reg, d.z
> add    reg, b.z
> mov    a.z, reg
> 
> The performance improvement becomes more substantial the longer the 
> expression.  Since overloaded operators do not instantiate templates, 
> there is no obvious way to obtain this result in the current language spec.
> 
> 2. Architecture specific optimizations (SIMD)
> 
> For low dimensional arithmetic, many architectures provide specially 
> optimized instruction for low dimensional vectors.  The problem is most 
> languages do not exploit them.  Creating efficient SIMD code is 
> impossible for a library, since each opAdd/opMul must be written using 
> inline assembler and therefore incurs the overhead of a function call 
> regardless.  This is worsened by the fact that moving to/from a vector 
> register is typically very expensive.
> 
> A compiler level implementation can easily avoid these issues by 
> assigning vector expressions to a register when passing them.  Moreover 
> it is more portable than compiler intrinsics like MSVC's SSE extensions. 
>  The implementation can easily emit fallback code if the architecture 
> does not support SIMD instructions.
> 
> 3. Swizzles
> 
> A swizzle is a reordering of the elements in a vector.  Shader languages 
> like Cg or GLSL typically support them, given their utility in certain 
> types of computations.  Here are some examples:
> 
> v.xxxx     // Returns a vector with v.x broadcast to all components
> v.xyz    // Returns only the xyz components of v
> v.zyx    // Returns a vector consisting of the reverse of v's xyz 
> components
> 
> Enumerating all possible swizzles within a template is impossible, and 
> therefore requires one function per swizzle.  The result is massive code 
> bloat, and many lines of automatically generated gibberish.  To get an 
> idea at how many functions this requires, the total number of swizzles 
> for 2-4 component vectors is 4^4 + 4^3 + 4^2 + 4 or 340.  Multiply that 
> by the number of primitive types and the result becomes quite large.
> 
> 
> 
> -Mik

Low dimensional primitive vectors would rock.  I have to wonder though 
if array operations would do the trick instead though, especially with 
static arrays.

uint4 foo;
uint4 bar = someDefaultValue;
foo.x = x;
foo.y = y;
foo.z = z;
uint4 result = foo + bar;

becomes

uint[4] foo;
uint[4] bar = someDefaultValue;
foo[0] = x;
foo[1] = y;
foo[2] = z;
uint[4] result = foo + bar;

Maybe not as readable, depending on the context, but that should 
suffice, right?

The only trick then is to make sure the compiler recognizes these cases 
and optimizes for them.  For example large dynamic arrays which are 
worthy of some fixed overhead before jumping into simd instructions or 
whatever, whereas the possibility of fixed overhead may change the way 
these small static things get optimized.  Point is, maybe this is just a 
quality of implementation issue once array operations are added?

Swizzling not templatable, really? ...

import std.stdio;
import std.traits;

template doAssignment(dchar[] ordering, uint index = 0)
{
	static if ( index < ordering.length )
	{
		// dummy will get optimized away by dmd
		auto dummy = temp[index] = vector[cast(uint)ordering[index]];
		mixin doAssignment!(ordering, index + 1);
	}
}

void swizzle(T, dchar[] ordering)(T vector)
{
	T temp;
	static if ( !isStaticArray!(T) )
		temp.length = vector.length;

	mixin doAssignment!(ordering);
	vector[0..ordering.length] = temp[0..ordering.length];

	static if ( !isStaticArray!(T) )
		delete temp;
}

void main()
{
	int[4] point = [7,73,42,5];
	writefln( point ); // prints [7,73,42,5]
	swizzle!(int[],[3,1,2,0])( point );
	writefln( point ); // prints [5,73,42,7]

	real[] array = [81345.536,5.67,43.351,0.0,932.4,0.03,0.9852,57];
	writefln( array );
	swizzle!(real[],[0,1,2,4,3,5])(array);
	writefln( array );
}

Now this is suboptimal, but I have made an effort to get all of the 
ordering values at compile time.  Now we can hand those to a template 
and have it return information about optimal order of assignment and 
temporary usage.  Now when you need to use said information, it should 
be doable either by clever writing of the assignment operations or by 
pasting together stuff using mixins.  As for the fact that swizzle is a 
function, well it will probably get inlined, and if it doesn't, it 
should (quality of implementation issue).