Sometimes 100 lines of code can fix your ctfe perf

Tue Aug 24 14:07:45 UTC 2021

Good Morning,

I just saw the need to concatenate a long-ish string at ctfe.

TLDR; writing your own specialized chunking appender is way 
faster than naive CTFE
using the phobos provided appender seems to be a bad idea ;) at 
least at CTFE.

As you may know when you try to so this the naive way:
```d
enum x = () { string result;
foreach(_; 0 ..n_iterations)
   result ~= "whatever"
return result; } ()
```

it scales quite poorly as `n_iterations` grows.
and eventually you'll run out of memory and time.

it turns out ctfe really doesn't like appending because it 
doesn't preallocate buffers or keep capacity around.

So what if we made our own string-builder that uses whole chunks 
to do the appending in?
in about 70 lines of code we can build our own string build that 
appends strings.

```d
module stringBuilder;
struct StringBuilder
{
     enum CHUNK_SIZE = 4096;
     char[CHUNK_SIZE] fixedBuffer;
     size_t fixedBufferUsed;

     char[CHUNK_SIZE][] chunks;
     size_t[] chunk_lengths;

     ushort last_chunk_used;

     void appendString(const(char)[] s)
     {
         // can we used our preallocated fixed buffer?
         if (!last_chunk_used)
         {
             auto fixedBufferEnd = fixedBufferUsed + s.length;
             if (fixedBufferEnd < CHUNK_SIZE)
             {
                 fixedBuffer[fixedBufferUsed .. fixedBufferEnd] = 
s;
                 fixedBufferUsed = fixedBufferEnd;
             }
         }
         else
         {
         LappendToLastChunk:
             auto chunkBegin = chunk_lengths[last_chunk_used - 1];
             auto chunkEnd = chunkBegin + s.length;
             if (chunkEnd < CHUNK_SIZE)
             {
                 chunks[last_chunk_used - 1][chunkBegin .. 
chunkEnd] = s;
                 chunk_lengths[last_chunk_used - 1] = chunkEnd;
             }
             else if (s.length <= CHUNK_SIZE)
             {
                 chunks.length = last_chunk_used + 1;
                 last_chunk_used += 1;
                 goto LappendToLastChunk;
             }
             else
             {
                 assert(0,
                         "string to be appended is longer than "
                         ~ CHUNK_SIZE.stringof ~ " this is not 
supported currently");
             }
         }
     }

     string releaseBuffer()
     {
         size_t result_length = fixedBufferUsed;
         foreach (i; 0 .. last_chunk_used - 1)
         {
             result_length += chunk_lengths[i];
         }

         char[] result;
         result.length = result_length;
         result[0 .. fixedBufferUsed] = fixedBuffer[0 .. 
fixedBufferUsed];
         size_t result_begin_pos = fixedBufferUsed;

         foreach (i; 0 .. last_chunk_used - 1)
         {
             const chunk_used = chunk_lengths[i];
             result[result_begin_pos .. result_begin_pos + 
chunk_used] = chunks[i][0 .. chunk_used];
             result_begin_pos += chunk_used;
         }
         return cast(string) result;
     }
}
```

that wasn't too bad was it?
But will such a simple piece of code really be able to speed us 
up?

lets do a quick perf test.

```d
import stringBuilder;
static if (is(typeof(import ("N_ITERATIONS"))))
{
     enum n_iterations = mixin(import("N_ITERATIONS"));
}
else
{
     enum n_iterations = 4096;
}

pragma(msg, "N: ", n_iterations);

enum big_string =
() {
     version (StringBuilder)
     {
         StringBuilder b;
     }
     else version (Appender)
     {
         import std.array : Appender;
         Appender!(string) result;
     }
     else
     {
         string result;
     }

     foreach(_; 0 ..n_iterations)
     {
         version(StringBuilder)
         {
             b.appendString("Let us see ... how meta can you 
go?\n");
         }
         else version (Appender)
         {
             result ~=     ("Let us see ... how meta can you 
go?\n");
         }
         else
         {
             result ~=     ("Let us see ... how meta can you 
go?\n");
         }
     }
     version (StringBuilder)
     {
         return b.releaseBuffer();
     }
     else version (Appender)
     {
         return result.opSlice();
     }
     else
         return result;
} ();

pragma(msg, big_string);
```

And here are the results for various values of N_ITERATIONS:
(times are in milliseconds and are the time) `dmd -c takes`
```
|   N   |  stringBuilder  | naive   | Appender                    
|
|-------|-----------------|---------|-----------------------------|
| 32    | 28.1            | 28      | 53                          
|
| 128   | 29.0            | 27.3    | 58                          
|
| 512   | 29.4            | 29.7    | 120.4                       
|
| 1024  | 30.6            | 33.7    | 304.4                       
|
| 2048  | 31.8            | 51.2    | 1055                        
|
| 4096  | 35.1            | 120.8   | 4338                        
|
| 8192  | 41.0            | 368.1   | 18146                       
|
| 16384 | 52.5            | 1353.1  | inf (it ran out of memeory) 
|
```