Vectors and matrices in D

Sean Kelly sean at f4.ca
Thu Mar 15 12:02:05 PDT 2007


Bill Baxter wrote:
> 
> Anyway, if you're coming from Java then doing a heap allocation for 
> every vector may not seem like a big deal.  But Java's allocations of 
> small objects are pretty heavily optimized from what I've heard.  D 
> isn't really there.

I think some Java compilers will actually allocate class instances on the 
stack if they can prove the object does not escape its scope.  This is 
similar to what we do manually in D with the 'scope' keyword.  That said, 
there are still many cases where automated stack allocation is not 
possible in Java (returning data from a function, for example) but can 
still be done in D using structs.
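To illustrate the distinction, here is a minimal Java sketch (the Point class and method names are hypothetical, invented for this example).  A JIT with escape analysis may stack-allocate or scalar-replace the first allocation but not the second, whereas D could return a struct by value in the second case and avoid the heap entirely:

```java
// Hypothetical value type used to illustrate escape analysis.
final class Point {
    final int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
}

public class EscapeDemo {
    // This Point never leaves the method, so a JIT with escape
    // analysis can allocate it on the stack (or eliminate it).
    static int distanceSquared(int x, int y) {
        Point p = new Point(x, y);
        return p.x * p.x + p.y * p.y;
    }

    // This Point is returned to the caller, so it escapes and must
    // be heap-allocated in Java; a D struct returned by value would
    // still avoid the heap here.
    static Point makePoint(int x, int y) {
        return new Point(x, y);
    }

    public static void main(String[] args) {
        System.out.println(distanceSquared(3, 4)); // prints 25
        System.out.println(makePoint(3, 4).x);     // prints 3
    }
}
```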

As an example of how much of an impact dynamic allocation has on 
application run times, I did a short performance analysis using Java a 
while back.  I'll include the summary plus some timings below.

----------------------------------------------------------------------

From a performance perspective, the array-based and link-based queue 
implementations are effectively identical except for the enqueue method. 
A link-based enqueue operation allocates a new node object and then 
performs some reference manipulation to append this node to an internal 
linked list.  By comparison, the array-based implementation will, if its 
internal array is full, resize it by doubling its length and copying the 
existing elements into the new area, and will then copy the new value 
into the array directly.  Therefore, assuming all operations have an 
equal cost, the performance difference between the two implementations 
rests in how often the array-queue must grow to accommodate new 
elements.

Assume a worst-case scenario where elements are repeatedly inserted into 
the queue and never removed, and let N represent the maximum number of 
elements inserted.  The array-based implementation must then perform 
roughly log2(N) resize operations and roughly N element copies (one less 
than the final array capacity).  This is a cost not incurred by the 
link-queue, where no copy operation is performed regardless of queue 
growth.  Therefore, in terms of N, the array-queue will perform roughly 
(N + log2(N)) more operations than the link-queue in this worst case. 
So if the queue grows to 1000 elements, ten resize operations will be 
performed to produce a maximum array size of 1024 elements, along with 
1023 element copy operations, for a total of 1033 additional operations, 
or approximately twice as many operations as the linked-queue 
implementation performs.  It is therefore reasonable to expect the 
array-queue to take approximately twice as long as the link-queue in 
this worst-case scenario.
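The counting argument above can be checked with a short simulation.  This sketch assumes the internal array starts at capacity 1 and doubles whenever it fills; with those assumptions, 1000 insertions cause exactly 10 resizes and 1023 element copies, matching the figures in the analysis:

```java
public class ResizeCount {
    // Returns {resizes, copies, finalCapacity} for n insertions into
    // a doubling array that starts (by assumption) at capacity 1.
    static int[] count(int n) {
        int capacity = 1, resizes = 0, copies = 0;
        for (int size = 0; size < n; size++) {
            if (size == capacity) { // array is full: double it
                copies += size;     // every existing element is copied over
                capacity *= 2;
                resizes++;
            }
        }
        return new int[] { resizes, copies, capacity };
    }

    public static void main(String[] args) {
        int[] r = count(1000);
        // Matches the analysis: 10 resizes, 1023 copies, capacity 1024.
        System.out.println(r[0] + " resizes, " + r[1]
                + " copies, capacity " + r[2]);
    }
}
```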

Performance results are rarely as simple as such a code analysis 
suggests, however.  Memory allocations are typically fairly expensive 
operations, as are any garbage collection runs required to clean up 
discarded memory.  Also, arrays typically have far better locality of 
reference than linked-lists (linked-list locality depends on allocator 
design and on how the container is used within the application), so 
fewer cache misses typically occur with array-based containers than with 
linked-list containers.  Finally, the array-queue described above will 
actually resize very rarely compared to the number of insertion 
operations performed, giving what is sometimes referred to as "amortized 
constant time" performance for the enqueue operation.  Given all these 
considerations, I think real-world use of these two queue 
implementations will show approximately equal performance, and the 
array-queue has the potential to be even faster than the link-queue in 
some situations, or on some systems.

For three runs inserting one million elements in succession and then 
removing them (similar to the worst case for array-queue as described 
above), my average times were:

array-queue: 153.3 milliseconds
link-queue: 641 milliseconds
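The original harness isn't shown, but a minimal sketch of this kind of measurement might look like the following, using the standard ArrayDeque and LinkedList as stand-ins for the array-based and link-based implementations.  Absolute numbers will differ by JVM and machine; only the relative gap is of interest:

```java
import java.util.ArrayDeque;
import java.util.LinkedList;
import java.util.Queue;

// Sketch of the benchmark described above: enqueue N elements in
// succession, then dequeue them all, timing each queue type.
public class QueueBench {
    static long timeMillis(Queue<Integer> q, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) q.add(i);  // enqueue phase
        while (!q.isEmpty()) q.remove();       // dequeue phase
        return (System.nanoTime() - start) / 1000000;
    }

    public static void main(String[] args) {
        final int N = 1000000;
        System.out.println("array-queue: "
                + timeMillis(new ArrayDeque<Integer>(), N) + " ms");
        System.out.println("link-queue: "
                + timeMillis(new LinkedList<Integer>(), N) + " ms");
    }
}
```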


More information about the Digitalmars-d-learn mailing list