RFC on range design for D2

Tue Sep 9 21:24:34 PDT 2008

Benji Smith wrote:
> bearophile wrote:
>> 5) The source code of the current algorithm module of D2 is already 
>> very complex to follow, it smells of over-generalization here and 
>> there. Sometimes it's better to reduce the generality of things, even 
>> if that reduces their power a little, to reduce complexity, etc. Tango 
>> code too isn't perfect, but it often looks more human. While you have 
>> created the algorithm module I too have created something similar, but 
>> based on different grounds.
> 
> Along these same lines, while D is still young, the documentation is 
> often thin, and code examples are scarce.
> 
> I've been doing a lot of programming lately with Tango, and despite the 
> growing documentation, I've had to refer directly to the Tango source 
> code on more occasions than I can count. Luckily, the Tango sources are 
> well-written and pretty easy to read and understand, and I've had very 
> few problems figuring out how to use the library.
> 
> I hope the range implementation makes readability a high priority.

Speaking of examples and readability, and to tie this with the 
discussion on array reallocation, I was curious on an array appender 
built on the output range interface. I've seen a quite complicated array 
builder in digitalmars.d. I wanted a simpler appender that should not do 
worse than the built-in ~= and that works with algorithm2 whenever data 
is written out.

It turned out quite simple and it improved performance of a large-scale 
data preprocessing task of mine (which involved reading and parsing 
about 1M lines of integers) by 15%. I'd be curious how it fares with 
other tests that you guys may have.

The idea is very simple: just use D's native append operation, but cache 
the capacity to avoid too many lookups (I understand that that's the 
bottleneck).

I paste the code below, I'd be indebted if you guys grabbed it and 
tested it.

Andrei

struct ArrayAppender(T)
{
     private T* _buffer;
     private size_t _length;
     private size_t _capacity;

     this(T[] array)
     {
         _buffer = array.ptr;
         _length = array.length;
         if (_buffer) _capacity = .capacity(array.ptr) / T.sizeof;
     }

     size_t capacity() const { return _capacity; }

     void putNext(T item)
     {
         invariant desiredLength = _length + 1;
         if (desiredLength <= _capacity)
         {
             // Should do in-place construction here
             _buffer[_length] = item;
         }
         else
         {
             // Time to reallocate, do it and cache capacity
             auto tmp = _buffer[0 .. _length];
             tmp ~= item;
             _buffer = tmp.ptr;
             _capacity = .capacity(_buffer) / T.sizeof;
         }
         _length = desiredLength;
     }

     T[] release()
     {
         // Shrink-to-fit
         auto result = cast(T[]) realloc(_buffer, _length * T.sizeof);
         // Relinquish state
         _buffer = null;
         _length = _capacity = 0;
         return result;
     }
}

unittest
{
     auto app = arrayAppender(new char[0]);
     string b = "abcdefg";
     foreach (char c; b) app.putNext(c);
     assert(app.release == "abcdefg");
}