potential speed improvement for repeated string concatenation
kenny
funisher at gmail.com
Fri Jul 27 11:04:44 PDT 2007
Witold Baryluk wrote:
> downs wrote:
>
>> Here's something interesting I noticed while writing serialization code.
>> If you're in a situation where you have to repeatedly append small to
>> medium sized chunks of string to a buffer, and the computation of those
>> chunks is relatively cheap, it might be faster (and use less memory) to do
>> it in two passes: first, determining the size of the result string, then
>> allocating it completely and filling it in. I noticed this while trying to
>> figure out why my serialization code for YAWR was so atrociously slow. I'm
>> passing the ser method a void delegate(char[]) that it's supposed to call
>> with the strings it serializes, in order. So normally, serialization would
>> look like this:
>>
>> char[] buffer; ser(data, (char[] f) { buffer~=f; });
>>
>> When I figured out that it was the primary bottleneck causing the
>> delays while saving, I replaced it with the following code:
>>
>> size_t len=0;
>> ser(data, (char[] f) { len+=f.length; });
>> auto buffer=new char[len]; len=0;
>> ser(data, (char[] f) { buffer[len..len+f.length]=f; len+=f.length; });
>>
>> To my surprise, this more than doubled the speed of that code. So if you
>> have some block of code that does lots of repeated string concats, you
>> might want to give this a try.
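A minimal self-contained sketch of that two-pass pattern as a reusable helper (`serializeTwoPass` is a made-up name, not part of YAWR; it assumes any serializer that emits its output through a sink delegate):

```d
// Hypothetical helper: run the serializer twice through a sink
// delegate -- once to measure the total output, once to fill the
// preallocated buffer.
char[] serializeTwoPass(void delegate(void delegate(char[])) ser)
{
    size_t len = 0;
    ser((char[] f) { len += f.length; });      // pass 1: total size

    auto buffer = new char[len];
    len = 0;
    ser((char[] f) {                           // pass 2: copy in place
        buffer[len .. len + f.length] = f;
        len += f.length;
    });
    return buffer;
}
```

This only pays off when producing the chunks is cheap relative to the allocations and copies that repeated `~=` would cause, since the serializer runs twice.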
>
> Yes, I use this pattern commonly, whenever possible.
>
> I'm going to write data structures similar to cords [1] or ropes [2] in D.
> Concatenation in them is very fast, because only a pointer is copied, not
> the content.
>
> Other important features:
> * easy undo (fast and cheap) - multiple cords will share most of their
> data, even if you change something in one of them
> * easy (and fast, O(1)) append and prepend
> * easy (and fast, O(1)) substring slicing
> * strings are computed only once
> * reduced number of memory allocations
> * easy sharing of similar strings
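Not Witold's code, but a hypothetical minimal list-based cord that illustrates how sharing makes these operations cheap (`Cord` and `flatten` are made-up names):

```d
// Hypothetical minimal cord: a singly linked list of chunks.
// Prepend just links a new node, so the old cord remains fully
// shared and still valid -- that is what makes cheap undo possible.
class Cord
{
    char[] chunk;
    Cord next;

    this(char[] s, Cord n = null) { chunk = s; next = n; }

    // O(1): no characters are copied, only a pointer.
    Cord prepend(char[] s) { return new Cord(s, this); }

    // One-time measure-and-copy, the same two-pass trick as above.
    char[] flatten()
    {
        size_t len = 0;
        for (Cord c = this; c !is null; c = c.next)
            len += c.chunk.length;
        auto buf = new char[len];
        size_t pos = 0;
        for (Cord c = this; c !is null; c = c.next)
        {
            buf[pos .. pos + c.chunk.length] = c.chunk;
            pos += c.chunk.length;
        }
        return buf;
    }
}
```

A real implementation would also need O(1) append (e.g. a tail pointer or a tree), which is where the 2-3 tree version comes in.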
>
> It will be simple to use:
>
> auto r = new cord!(char)(); // or new cord!(char)("aaa");
> r ~= "xxx";
> r ~= "yyy";
> r ~= "zzz";
>
> auto r2 = r.toString(); // will first precompute len, and then copy content
> // of buffers
>
> foreach (c; r) {
>     write(c);
> }
>
> writefln("r[3]=%s", r[3]);
>
> foreach (block; r.fastiterator) {
>     write(block);
> }
>
> auto r3 = r[1..3]; // O(1)
>
> r3.rebalance(); // for faster access
>
> r3[2] = 'c'; // will not change r ! O(log n) in 2-3 trees,
> // O(c*n) in lists, but with very small c (~ 0.01)
>
> auto r4 = r ~ r3; // O(1)
>
> auto r5 = r.remove(4, 9); // O(1)
>
> Common usages:
> * text editors
> * servers: appending on receiving, and appending when creating a response
> * building big arrays step by step without knowing their size
>
> I will first implement simple list-based buffers (with small overhead),
> and then a second version, probably based on 2-3 trees (for very large
> arrays, > 10 MB).
>
> I really need these data structures, so it probably won't take long to do
> all of this. :)
>
> [1] http://www.sgi.com/tech/stl/Rope.html
> [2] http://www.hpl.hp.com/personal/Hans_Boehm/gc/gc_source/cordh.txt
>
wow, that looks amazing. I personally could benefit a lot from a library such
as this! I do tons and tons of string concatenation at the moment, and
hand-optimizing it every single time is annoying.
What can I do to help?
kenny
More information about the Digitalmars-d
mailing list