potential speed improvement for repeated string concatenation

Fri Jul 27 13:26:39 PDT 2007

How do these perform in a multi-threaded scenario? At a glance, I would guess these suffer from much the same problems as COW.

Witold Baryluk Wrote:
> downs wrote:
> 
> > Here's something interesting I noticed while writing serialization code.
> > If you're in a situation where you have to repeatedly append small to
> > medium sized chunks of string to a buffer, and the computation of those
> > chunks is relatively cheap, it might be faster (and use less memory) to do
> > it in two passes: first, determining the size of the result string, then
> > allocating it completely and filling it in. I noticed this while trying to
> > figure out why my serialization code for YAWR was so atrociously slow. I'm
> > passing the ser method a void delegate(char[]) that it's supposed to call
> > with the strings it serializes, in order. So normally, serialization would
> > look like this:
> > 
> > char[] buffer; ser(data, (char[] f) { buffer~=f; });
> > 
> > When I figured out that it was the primarily bottleneck that caused the
> > delays while saving, I replaced it with the following code
> > 
> > size_t len=0;
> > ser(data, (char[] f) { len+=f.length; });
> > auto buffer=new char[len]; len=0;
> > ser(data, (char[] f) { buffer[len..len+f.length]=f; len+=f.length; }
> > 
> > To my surprise, this more than doubled the speed of that code. So if you
> > have some block of code that does lots of repeated string concats, you
> > might want to give this a try.
> 
> Yes, i using this pattern commonly, if this is possible.
> 
> I'm going to write data structures similar to cords [1] or ropes [2] in D.
> Concatenation in them are very fast, because only pointer is copied, not
> content.
> 
> Other important features:
>  * easy undo (and fast, and cheap) - you can have multiple cords and will
>     share most of data, even if you change something in one of them
>  * easy (and fast O(1)) append and prepend
>  * easy (and fast O(1)) substring slicing
>  * string are computed only once. 
>  * reduced number of memory allocations
>  * easy sharing of similar strings
> 
> It will be simple to use:
> 
>  auto r = new cord!(char)();      // or new cord!(char)("aaa");
>  r ~= "xxx";
>  r ~= "yyy";
>  r ~= "zzz";
> 
>  auto r2 = r.toString(); // will first precompute len, and then copy content
>                          // of buffers
> 
>  foreach (c; r) {
>    write(c);
>  }
> 
>  writefln("r[3]=%s", r[3]);
> 
>  foreach (block; r.fastiterator) {
>    write(block);
>  }
> 
>  auto r3 = r[1..3]; // O(1)
> 
>  r3.rebalance();    // for faster access
> 
>  r3[2] = 'c';       // will not change r ! O(log n) in 2-3 trees,
>                     // O(c*n) in lists, but with very small c (~ 0.01)
> 
>  auto r4 = r ~ r3;  // O(1)
> 
>  auto r5 = r.remove(4, 9); // O(1)
> 
> Common usages:
>   * text editors
>   * servers: appending on receiving, and appending when creating respone
>   * creating big arrays step by step without knowing its size
> 
> I will first implement simple list based buffers (with small overheads),
> and then implement second version based on 2-3 trees probably (for very
> large arrays, > 10MB).
> 
> I really need this data structures, so it will probably take not so long to
> do all this things. :)
> 
> [1] http://www.sgi.com/tech/stl/Rope.html
> [2] http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/cordh.txt
> 
> -- 
> Witold Baryluk
>