potential speed improvement for repeated string concatenation
Tristam MacDonald
swiftcoder at gmail.com
Fri Jul 27 13:26:39 PDT 2007
How do these perform in a multi-threaded scenario? At a glance, I would guess these suffer from much the same problems as COW.
Witold Baryluk Wrote:
> downs wrote:
>
> > Here's something interesting I noticed while writing serialization code.
> > If you're in a situation where you have to repeatedly append small to
> > medium sized chunks of string to a buffer, and the computation of those
> > chunks is relatively cheap, it might be faster (and use less memory) to do
> > it in two passes: first, determining the size of the result string, then
> > allocating it completely and filling it in. I noticed this while trying to
> > figure out why my serialization code for YAWR was so atrociously slow. I'm
> > passing the ser method a void delegate(char[]) that it's supposed to call
> > with the strings it serializes, in order. So normally, serialization would
> > look like this:
> >
> > char[] buffer; ser(data, (char[] f) { buffer~=f; });
> >
> > When I figured out that it was the primarily bottleneck that caused the
> > delays while saving, I replaced it with the following code
> >
> > size_t len=0;
> > ser(data, (char[] f) { len+=f.length; });
> > auto buffer=new char[len]; len=0;
> > ser(data, (char[] f) { buffer[len..len+f.length]=f; len+=f.length; }
> >
> > To my surprise, this more than doubled the speed of that code. So if you
> > have some block of code that does lots of repeated string concats, you
> > might want to give this a try.
>
> Yes, i using this pattern commonly, if this is possible.
>
> I'm going to write data structures similar to cords [1] or ropes [2] in D.
> Concatenation in them are very fast, because only pointer is copied, not
> content.
>
> Other important features:
> * easy undo (and fast, and cheap) - you can have multiple cords and will
> share most of data, even if you change something in one of them
> * easy (and fast O(1)) append and prepend
> * easy (and fast O(1)) substring slicing
> * string are computed only once.
> * reduced number of memory allocations
> * easy sharing of similar strings
>
> It will be simple to use:
>
> auto r = new cord!(char)(); // or new cord!(char)("aaa");
> r ~= "xxx";
> r ~= "yyy";
> r ~= "zzz";
>
> auto r2 = r.toString(); // will first precompute len, and then copy content
> // of buffers
>
> foreach (c; r) {
> write(c);
> }
>
> writefln("r[3]=%s", r[3]);
>
> foreach (block; r.fastiterator) {
> write(block);
> }
>
> auto r3 = r[1..3]; // O(1)
>
> r3.rebalance(); // for faster access
>
> r3[2] = 'c'; // will not change r ! O(log n) in 2-3 trees,
> // O(c*n) in lists, but with very small c (~ 0.01)
>
> auto r4 = r ~ r3; // O(1)
>
> auto r5 = r.remove(4, 9); // O(1)
>
> Common usages:
> * text editors
> * servers: appending on receiving, and appending when creating respone
> * creating big arrays step by step without knowing its size
>
> I will first implement simple list based buffers (with small overheads),
> and then implement second version based on 2-3 trees probably (for very
> large arrays, > 10MB).
>
> I really need this data structures, so it will probably take not so long to
> do all this things. :)
>
> [1] http://www.sgi.com/tech/stl/Rope.html
> [2] http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/cordh.txt
>
> --
> Witold Baryluk
>
More information about the Digitalmars-d
mailing list