seeding the pot for 2.0 features
Mikola Lysenko
mclysenk at mtu.edu
Sat Jan 27 15:14:39 PST 2007
I'll bite. Here are the two features I consider most important:
1. Dynamic Closures:
See:
http://lists.puremagic.com/pipermail/digitalmars-d/2006-August/007520.html
2. Low dimensional vectors as primitive types
Specifically I would like to see the types int, real, float etc.
extended into 2-4 dimensional vectors. ie. int2, real4, float3.
This would be exceptionally useful in many applications which require
coordinate geometry. Here is a very brief list:
Scientific Programs
Physics Simulation
Computer Graphics
Video Games
User Interfaces
Computational Geometry
Robotics
Fluid Simulation
etc.
I would prefer not to recount the number of times I have written my own
vector library, and how tedious they are to create. For most every
language I learn it is the first thing I need to write, since so few are
willing to provide a default implementation. In my opinion, this is
unacceptable for a problem which occurs so frequently.
One option is to extend the standard library with a vector-types class,
but this is not nearly as nice a compiler level implementation.
1. The 90 degrees rotation trick
This is based on the following article:
http://www.flipcode.com/articles/article_fastervectormath.shtml
The basic idea is we use templates to expand vector expressions to
achieve better register use and memory locality. Here is a simple example:
a = b + c * d;
Normally, we would translate this as something like:
a = b.opAdd(c.opMul(d));
Which gives the following assembler:
//Do opMul
mov reg, c.x
mul reg, d.x
mov tmp.x, reg
mov reg, c.y
mul reg, d.y
mov tmp.y, reg
mov reg, c.z
mul reg, d.z
mov tmp.z, reg
//Do opAdd
mov reg, tmp.x
add reg, b.x
mov a.x, reg
mov reg, tmp.y
add reg, b.y
mov a.y, reg
mov reg, tmp.z
add reg, b.z
mov a.z, reg
This is not particularly efficient, since it requires the creation of a
temporary vector to store the result from c * d. A better strategy
involves 'rotating the computation 90 degrees' and performing the
expression on a per-component level:
//x - component
mov reg, c.x
mul reg, d.x
add reg, b.x
mov a.x, reg
//y - component
mov reg, c.y
mul reg, d.y
add reg, b.y
mov a.y, reg
//z - component
mov reg, c.z
mul reg, d.z
add reg, b.z
mov a.z, reg
The performance improvement becomes more substantial the longer the
expression. Since overloaded operators do not instantiate templates,
there is no obvious way to obtain this result in the current language spec.
2. Architecture specific optimizations (SIMD)
For low dimensional arithmetic, many architectures provide specially
optimized instruction for low dimensional vectors. The problem is most
languages do not exploit them. Creating efficient SIMD code is
impossible for a library, since each opAdd/opMul must be written using
inline assembler and therefore incurs the overhead of a function call
regardless. This is worsened by the fact that moving to/from a vector
register is typically very expensive.
A compiler level implementation can easily avoid these issues by
assigning vector expressions to a register when passing them. Moreover
it is more portable than compiler intrinsics like MSVC's SSE extensions.
The implementation can easily emit fallback code if the architecture
does not support SIMD instructions.
3. Swizzles
A swizzle is a reordering of the elements in a vector. Shader languages
like Cg or GLSL typically support them, given their utility in certain
types of computations. Here are some examples:
v.xxxx // Returns a vector with v.x broadcast to all components
v.xyz // Returns only the xyz components of v
v.zyx // Returns a vector consisting of the reverse of v's xyz components
Enumerating all possible swizzles within a template is impossible, and
therefore requires one function per swizzle. The result is massive code
bloat, and many lines of automatically generated gibberish. To get an
idea at how many functions this requires, the total number of swizzles
for 2-4 component vectors is 4^4 + 4^3 + 4^2 + 4 or 340. Multiply that
by the number of primitive types and the result becomes quite large.
-Mik
More information about the Digitalmars-d
mailing list