Change representation of dynamic arrays?

Walter Bright newshound1 at digitalmars.com
Fri Oct 19 20:03:02 PDT 2007


Currently, arrays are represented under the hood as:

	size_t lengthOfArray;
	void* ptrToStartOfArray;

Which works out reasonably well. The problem arises when you want to use 
array types as the basis of iterators and step through the array. 
There's no escaping two operations per step:

	decrement the length
	increment the pointer

This puts a brick in any fast implementation of iterators. To fix that, 
we can change the representation to:

	void* ptrToStartOfArray;
	void* ptrPastEndOfArray;

Then there's just one increment. Some tests show this can improve loop 
performance by up to 70%.
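
To make the difference concrete, here is a minimal sketch in C (not D, and not the actual compiler or runtime code) of stepping through both layouts. The struct and function names are made up for illustration, and int elements are assumed. It shows only the shape of the two loops and makes no claim about measured speed.

```c
#include <assert.h>
#include <stddef.h>

/* Current layout: length plus pointer (illustrative names). */
typedef struct {
    size_t length;
    const int *ptr;
} LenPtrArray;

/* Proposed layout: pointer to the first element plus pointer one past the last. */
typedef struct {
    const int *start;
    const int *end;
} PtrPairArray;

/* Stepping with the current layout updates two fields per element. */
long sum_len_ptr(LenPtrArray a) {
    long total = 0;
    while (a.length != 0) {
        total += *a.ptr;
        a.ptr++;      /* increment the pointer */
        a.length--;   /* decrement the length */
    }
    return total;
}

/* Stepping with the proposed layout updates one field per element. */
long sum_ptr_pair(PtrPairArray a) {
    assert(a.start <= a.end);  /* both pointers must refer to the same array */
    long total = 0;
    while (a.start != a.end) {
        total += *a.start;
        a.start++;    /* the only update */
    }
    return total;
}
```

Both functions take the array by value, so the caller's copy is untouched, the same way a slice used as an iterator would be consumed.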

So, what does this not break?

1) Doesn't break array.ptr, this will still work.
2) Doesn't break array.length as rvalue, as this is rewritten by the 
compiler as (ptrPastEndOfArray - ptrToStartOfArray), counted in elements.
3) Doesn't break array.length as an lvalue, as that is handled by the 
runtime library anyway.
4) Won't break anything on D 1.0, as it wouldn't get this change.
5) Won't break array slices, or any of that stuff we love about D arrays.
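
As a rough illustration of points 1, 2 and 5, the following C sketch models how .ptr, .length and slicing could be derived from a pointer pair. The names IntArray, array_ptr, array_length and array_slice are hypothetical, int elements are assumed, and this is not how the D compiler actually emits it.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    int *start;  /* plays the role of ptrToStartOfArray */
    int *end;    /* plays the role of ptrPastEndOfArray */
} IntArray;

/* array.ptr: simply the start pointer. */
int *array_ptr(IntArray a) { return a.start; }

/* array.length as an rvalue: typed pointer subtraction yields an element count. */
size_t array_length(IntArray a) { return (size_t)(a.end - a.start); }

/* array[lo .. hi]: a slice is another pointer pair into the same memory. */
IntArray array_slice(IntArray a, size_t lo, size_t hi) {
    assert(lo <= hi && hi <= array_length(a));  /* bounds check */
    IntArray s = { a.start + lo, a.start + hi };
    return s;
}
```

Slicing needs no division or multiplication beyond ordinary pointer arithmetic, and the length is recovered with one subtraction.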

What does this break?

1) Passing dynamic arrays to printf as in:

	printf("my string is %.*s\n", str);

which relied on the under-the-hood representation. This doesn't work on 
some architectures anyway, and is thoroughly obsolete. One could quickly 
fix such code by writing it as:

	printf("my string is %.*s\n", cast(int) str.length, str.ptr);

2) It breaks the internal library support code, but that's my problem.

3) It breaks binary compatibility with libraries already compiled. But 
we expect to break binary compatibility with D 2.0.

4) It breaks things like cast(ulong)str, if one was crazy enough to do 
that anyway.

5) It breaks anything that tries to look at the underlying 
representation of dynamic arrays - but such code should be rewritten to 
use .ptr and .length anyway, or slice notation.
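
For the printf case in point 1, the portable pattern is to pass the length and the pointer as two explicit arguments, whatever the array looks like underneath. Here is a small C sketch of that pattern over a pointer-pair layout. CharArray and format_string are hypothetical names, and snprintf is used in place of printf only so the result can be inspected.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdio.h>

typedef struct {
    const char *start;  /* first character */
    const char *end;    /* one past the last character */
} CharArray;

/* Formats str into out. The precision of %.*s must be an int, so the
   length is computed from the two pointers and narrowed explicitly. */
int format_string(char *out, size_t out_size, CharArray str) {
    ptrdiff_t len = str.end - str.start;
    assert(len >= 0 && len <= INT_MAX);
    return snprintf(out, out_size, "my string is %.*s\n", (int)len, str.start);
}
```

Because the precision bounds how many characters are read, the string does not need to be zero-terminated, which is what makes this work for slices.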

So, what do you think?


