potential speed improvement for repeated string concatenation
Witold Baryluk
baryluk at smp.if.uj.edu.pl
Fri Jul 27 10:05:14 PDT 2007
downs wrote:
> Here's something interesting I noticed while writing serialization code.
> If you're in a situation where you have to repeatedly append small to
> medium sized chunks of string to a buffer, and the computation of those
> chunks is relatively cheap, it might be faster (and use less memory) to do
> it in two passes: first, determining the size of the result string, then
> allocating it completely and filling it in. I noticed this while trying to
> figure out why my serialization code for YAWR was so atrociously slow. I'm
> passing the ser method a void delegate(char[]) that it's supposed to call
> with the strings it serializes, in order. So normally, serialization would
> look like this:
>
> char[] buffer; ser(data, (char[] f) { buffer~=f; });
>
> When I figured out that it was the primarily bottleneck that caused the
> delays while saving, I replaced it with the following code
>
> size_t len=0;
> ser(data, (char[] f) { len+=f.length; });
> auto buffer=new char[len]; len=0;
> ser(data, (char[] f) { buffer[len..len+f.length]=f; len+=f.length; }
>
> To my surprise, this more than doubled the speed of that code. So if you
> have some block of code that does lots of repeated string concats, you
> might want to give this a try.
Yes, i using this pattern commonly, if this is possible.
I'm going to write data structures similar to cords [1] or ropes [2] in D.
Concatenation in them are very fast, because only pointer is copied, not
content.
Other important features:
* easy undo (and fast, and cheap) - you can have multiple cords and will
share most of data, even if you change something in one of them
* easy (and fast O(1)) append and prepend
* easy (and fast O(1)) substring slicing
* string are computed only once.
* reduced number of memory allocations
* easy sharing of similar strings
It will be simple to use:
auto r = new cord!(char)(); // or new cord!(char)("aaa");
r ~= "xxx";
r ~= "yyy";
r ~= "zzz";
auto r2 = r.toString(); // will first precompute len, and then copy content
// of buffers
foreach (c; r) {
write(c);
}
writefln("r[3]=%s", r[3]);
foreach (block; r.fastiterator) {
write(block);
}
auto r3 = r[1..3]; // O(1)
r3.rebalance(); // for faster access
r3[2] = 'c'; // will not change r ! O(log n) in 2-3 trees,
// O(c*n) in lists, but with very small c (~ 0.01)
auto r4 = r ~ r3; // O(1)
auto r5 = r.remove(4, 9); // O(1)
Common usages:
* text editors
* servers: appending on receiving, and appending when creating respone
* creating big arrays step by step without knowing its size
I will first implement simple list based buffers (with small overheads),
and then implement second version based on 2-3 trees probably (for very
large arrays, > 10MB).
I really need this data structures, so it will probably take not so long to
do all this things. :)
[1] http://www.sgi.com/tech/stl/Rope.html
[2] http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/cordh.txt
--
Witold Baryluk
More information about the Digitalmars-d
mailing list