potential speed improvement for repeated string concatenation

BCS ao at pathlink.com
Fri Jul 27 10:17:19 PDT 2007


Reply to Downs,

> Here's something interesting I noticed while writing serialization code.
> If you're in a situation where you have to repeatedly append small to
> medium sized chunks of string to a buffer, and the computation of those
> chunks is relatively cheap, it might be faster (and use less memory) to
> do it in two passes: first, determining the size of the result string,
> then allocating it completely and filling it in.
> I noticed this while trying to figure out why my serialization code for
> YAWR was so atrociously slow. I'm passing the ser method a
> void delegate(char[]) that it's supposed to call with the strings it
> serializes, in order. So normally, serialization would look like this:
> 
> char[] buffer; ser(data, (char[] f) { buffer~=f; });
> 
> When I figured out that it was the primary bottleneck causing the delays
> while saving, I replaced it with the following code:
> 
> size_t len=0;
> ser(data, (char[] f) { len+=f.length; });
> auto buffer=new char[len]; len=0;
> ser(data, (char[] f) { buffer[len..len+f.length]=f; len+=f.length; });
> To my surprise, this more than doubled the speed of that code. So if you
> have some block of code that does lots of repeated string concats, you
> might want to give this a try.
> --downs
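
For what it's worth, that two-pass pattern could be pulled out into a little 
helper so the call sites stay short. Rough, untested sketch (collect and the 
wrapper delegate are names I just made up; ser and data are from your code 
above):

|// rough sketch: wrap the counting pass and the filling pass in one helper
|char[] collect(void delegate(void delegate(char[])) emit)
|{
|  // pass 1: just add up the chunk lengths
|  size_t len = 0;
|  emit((char[] f) { len += f.length; });
|
|  // pass 2: allocate once, then copy each chunk into place
|  auto buffer = new char[len];
|  len = 0;
|  emit((char[] f) { buffer[len .. len+f.length] = f[]; len += f.length; });
|  return buffer;
|}
|
|// usage, with ser and data as in your code:
|auto buffer = collect((void delegate(char[]) sink) { ser(data, sink); });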

Since you're building the strings in the first pass anyway, it might be even 
faster to hold on to them in a char[][] and stitch it all together at the end:

|char[] buffer;
|
|char[][] dmp;
|int at = 0;
|int i = 0;
|ser(data, (char[] f)
|  {
|    // grow the pointer array in chunks rather than on every append
|    if(at >= dmp.length) dmp.length = at+10;
|    dmp[at] = f;
|    at++;
|    i += f.length;  // running total of the final buffer size
|  });
|
|// stitch it all together with a single allocation
|buffer.length = i;
|char[] tmp = buffer;
|foreach(char[] str; dmp[0..at])
|{
|  tmp[0..str.length] = str[];
|  tmp = tmp[str.length..$];
|}

This only works if ser allocates new memory for each chunk it hands out; if it 
reuses an internal buffer, the slices stored in dmp would all end up pointing 
at the same, overwritten data by the time they get stitched together.
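
In that case one workaround is to .dup each chunk before storing it, at the 
cost of one extra copy per chunk. Rough, untested sketch (and, if I remember 
the Phobos signature right, std.string.join can then do the stitching in one 
go):

|import std.string;
|
|char[][] dmp;
|ser(data, (char[] f) { dmp ~= f.dup; });  // copy each chunk so ser can
|                                          // safely reuse its buffer
|char[] buffer = join(dmp, "");            // sizes and fills the result once

Appending to dmp with ~= does regrow the pointer array as it goes, but that's 
cheap next to copying the string data itself.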
