seeding the pot for 2.0 features
Chad J
gamerChad at _spamIsBad_gmail.com
Sun Jan 28 01:52:34 PST 2007
Mikola Lysenko wrote:
> I'll bite. Here are the two features I consider most important:
>
> 1. Dynamic Closures:
> See:
> http://lists.puremagic.com/pipermail/digitalmars-d/2006-August/007520.html
>
> 2. Low dimensional vectors as primitive types
>
> Specifically I would like to see the types int, real, float etc.
> extended into 2-4 dimensional vectors. ie. int2, real4, float3.
>
> This would be exceptionally useful in many applications which require
> coordinate geometry. Here is a very brief list:
>
> Scientific Programs
> Physics Simulation
> Computer Graphics
> Video Games
> User Interfaces
> Computational Geometry
> Robotics
> Fluid Simulation
>
> etc.
>
> I would prefer not to recount the number of times I have written my own
> vector library, and how tedious they are to create. For most every
> language I learn it is the first thing I need to write, since so few are
> willing to provide a default implementation. In my opinion, this is
> unacceptable for a problem which occurs so frequently.
>
> One option is to extend the standard library with a vector-types class,
> but this is not nearly as nice a compiler level implementation.
>
> 1. The 90 degrees rotation trick
> This is based on the following article:
> http://www.flipcode.com/articles/article_fastervectormath.shtml
>
> The basic idea is we use templates to expand vector expressions to
> achieve better register use and memory locality. Here is a simple example:
>
> a = b + c * d;
>
> Normally, we would translate this as something like:
>
> a = b.opAdd(c.opMul(d));
>
> Which gives the following assembler:
>
> //Do opMul
> mov reg, c.x
> mul reg, d.x
> mov tmp.x, reg
> mov reg, c.y
> mul reg, d.y
> mov tmp.y, reg
> mov reg, c.z
> mul reg, d.z
> mov tmp.z, reg
>
> //Do opAdd
> mov reg, tmp.x
> add reg, b.x
> mov a.x, reg
> mov reg, tmp.y
> add reg, b.y
> mov a.y, reg
> mov reg, tmp.z
> add reg, b.z
> mov a.z, reg
>
>
> This is not particularly efficient, since it requires the creation of a
> temporary vector to store the result from c * d. A better strategy
> involves 'rotating the computation 90 degrees' and performing the
> expression on a per-component level:
>
> //x - component
> mov reg, c.x
> mul reg, d.x
> add reg, b.x
> mov a.x, reg
>
> //y - component
> mov reg, c.y
> mul reg, d.y
> add reg, b.y
> mov a.y, reg
>
> //z - component
> mov reg, c.z
> mul reg, d.z
> add reg, b.z
> mov a.z, reg
>
> The performance improvement becomes more substantial the longer the
> expression. Since overloaded operators do not instantiate templates,
> there is no obvious way to obtain this result in the current language spec.
>
> 2. Architecture specific optimizations (SIMD)
>
> For low dimensional arithmetic, many architectures provide specially
> optimized instruction for low dimensional vectors. The problem is most
> languages do not exploit them. Creating efficient SIMD code is
> impossible for a library, since each opAdd/opMul must be written using
> inline assembler and therefore incurs the overhead of a function call
> regardless. This is worsened by the fact that moving to/from a vector
> register is typically very expensive.
>
> A compiler level implementation can easily avoid these issues by
> assigning vector expressions to a register when passing them. Moreover
> it is more portable than compiler intrinsics like MSVC's SSE extensions.
> The implementation can easily emit fallback code if the architecture
> does not support SIMD instructions.
>
> 3. Swizzles
>
> A swizzle is a reordering of the elements in a vector. Shader languages
> like Cg or GLSL typically support them, given their utility in certain
> types of computations. Here are some examples:
>
> v.xxxx // Returns a vector with v.x broadcast to all components
> v.xyz // Returns only the xyz components of v
> v.zyx // Returns a vector consisting of the reverse of v's xyz
> components
>
> Enumerating all possible swizzles within a template is impossible, and
> therefore requires one function per swizzle. The result is massive code
> bloat, and many lines of automatically generated gibberish. To get an
> idea at how many functions this requires, the total number of swizzles
> for 2-4 component vectors is 4^4 + 4^3 + 4^2 + 4 or 340. Multiply that
> by the number of primitive types and the result becomes quite large.
>
>
>
> -Mik
Low dimensional primitive vectors would rock. I have to wonder though
if array operations would do the trick instead though, especially with
static arrays.
uint4 foo;
uint4 bar = someDefaultValue;
foo.x = x;
foo.y = y;
foo.z = z;
uint4 result = foo + bar;
becomes
uint[4] foo;
uint[4] bar = someDefaultValue;
foo[0] = x;
foo[1] = y;
foo[2] = z;
uint[4] result = foo + bar;
Maybe not as readable, depending on the context, but that should
suffice, right?
The only trick then is to make sure the compiler recognizes these cases
and optimizes for them. For example large dynamic arrays which are
worthy of some fixed overhead before jumping into simd instructions or
whatever, whereas the possibility of fixed overhead may change the way
these small static things get optimized. Point is, maybe this is just a
quality of implementation issue once array operations are added?
Swizzling not templatable, really? ...
import std.stdio;
import std.traits;
template doAssignment(dchar[] ordering, uint index = 0)
{
static if ( index < ordering.length )
{
// dummy will get optimized away by dmd
auto dummy = temp[index] = vector[cast(uint)ordering[index]];
mixin doAssignment!(ordering, index + 1);
}
}
void swizzle(T, dchar[] ordering)(T vector)
{
T temp;
static if ( !isStaticArray!(T) )
temp.length = vector.length;
mixin doAssignment!(ordering);
vector[0..ordering.length] = temp[0..ordering.length];
static if ( !isStaticArray!(T) )
delete temp;
}
void main()
{
int[4] point = [7,73,42,5];
writefln( point ); // prints [7,73,42,5]
swizzle!(int[],[3,1,2,0])( point );
writefln( point ); // prints [5,73,42,7]
real[] array = [81345.536,5.67,43.351,0.0,932.4,0.03,0.9852,57];
writefln( array );
swizzle!(real[],[0,1,2,4,3,5])(array);
writefln( array );
}
Now this is suboptimal, but I have made an effort to get all of the
ordering values at compile time. Now we can hand those to a template
and have it return information about optimal order of assignment and
temporary usage. Now when you need to use said information, it should
be doable either by clever writing of the assignment operations or by
pasting together stuff using mixins. As for the fact that swizzle is a
function, well it will probably get inlined, and if it doesn't, it
should (quality of implementation issue).
More information about the Digitalmars-d
mailing list