Change representation of dynamic arrays?
Sean Kelly
sean at f4.ca
Sat Oct 20 10:19:21 PDT 2007
Walter Bright wrote:
> Currently, arrays are represented under the hood as:
>
> size_t lengthOfArray;
> void* ptrToStartOfArray;
>
> Which works out reasonably well. The problem is if you want to use array
> types as the basis of iterators, and you want to step through the array.
> There's no escaping it being two operations:
>
> decrement the length
> increment the pointer
>
> This puts a brick in any fast implementation of iterators. To fix that,
> we can change the representation to:
>
> void* ptrToStartOfArray;
> void* ptrPastEndOfArray;
>
> Then there's just one increment. Some tests show this can improve loop
> performance by up to 70%.
>
> So, what does this not break?
>
> 1) Doesn't break array.ptr, this will still work.
> 2) Doesn't break array.length as rvalue, as this is rewritten by the
> compiler as (array.end - array.start).
> 3) Doesn't break array.length as an lvalue, as that is handled by the
> runtime library anyway.
> 4) Won't break anything on D 1.0, as it wouldn't get this change.
> 5) Won't break array slices, or any of that stuff we love about D arrays.
>
> What does this break?
>
> 1) Passing dynamic arrays to printf as in:
>
> printf("my string is %*.s\n", str);
>
> which relied on the under-the-hood representation. This doesn't work on
> some architectures anyway, and is thoroughly obsolete. One could quickly
> fix such code by writing it as:
>
> printf("my string is %*.s\n", str.length, str.ptr);
This isn't a compelling reason to keep the current implementation. As
Anders pointed out, it doesn't work on 64-bit anyway.
> 2) It breaks the internal library support code, but that's my problem.
Yeah, no big deal here either.
> 3) It breaks binary compatibility with libraries already compiled. But
> we expect to break binary compatibility with D 2.0.
Right.
> 4) It breaks things like cast(ulong)str, if one was crazy enough to do
> that anyway.
Though I know the runtime does this, I've never seen any other code that
does :-) And the runtime changes are easily fixed anyway. Heck, GDC
already has this implemented.
> 5) It breaks anything that tries to look at the underlying
> representation of dynamic arrays - but such code should be rewritten to
> use .ptr and .length anyway, or slice notation.
Not a compelling reason to keep the old representation, as this is
really an implementation detail.
My only potential concern would be the impact on slice operations and
such, but addition and subtraction are so fast that this change
shouldn't affect anything. Also, such operations are rarely, if ever,
performed in tight loops, unlike iterator operations.
vote++
Sean
More information about the Digitalmars-d
mailing list