Sometimes 100 lines of code can fix your ctfe perf
Stefan Koch
uplink.coder at googlemail.com
Tue Aug 24 14:07:45 UTC 2021
Good Morning,
I just saw the need to concatenate a long-ish string at ctfe.
TLDR; writing your own specialized chunking appender is way
faster than naive CTFE
using the phobos provided appender seems to be a bad idea ;) at
least at CTFE.
As you may know when you try to so this the naive way:
```d
enum x = () { string result;
foreach(_; 0 ..n_iterations)
result ~= "whatever"
return result; } ()
```
it scales quite poorly as `n_iterations` grows.
and eventually you'll run out of memory and time.
it turns out ctfe really doesn't like appending because it
doesn't preallocate buffers or keep capacity around.
So what if we made our own string-builder that uses whole chunks
to do the appending in?
in about 70 lines of code we can build our own string build that
appends strings.
```d
module stringBuilder;
struct StringBuilder
{
enum CHUNK_SIZE = 4096;
char[CHUNK_SIZE] fixedBuffer;
size_t fixedBufferUsed;
char[CHUNK_SIZE][] chunks;
size_t[] chunk_lengths;
ushort last_chunk_used;
void appendString(const(char)[] s)
{
// can we used our preallocated fixed buffer?
if (!last_chunk_used)
{
auto fixedBufferEnd = fixedBufferUsed + s.length;
if (fixedBufferEnd < CHUNK_SIZE)
{
fixedBuffer[fixedBufferUsed .. fixedBufferEnd] =
s;
fixedBufferUsed = fixedBufferEnd;
}
}
else
{
LappendToLastChunk:
auto chunkBegin = chunk_lengths[last_chunk_used - 1];
auto chunkEnd = chunkBegin + s.length;
if (chunkEnd < CHUNK_SIZE)
{
chunks[last_chunk_used - 1][chunkBegin ..
chunkEnd] = s;
chunk_lengths[last_chunk_used - 1] = chunkEnd;
}
else if (s.length <= CHUNK_SIZE)
{
chunks.length = last_chunk_used + 1;
last_chunk_used += 1;
goto LappendToLastChunk;
}
else
{
assert(0,
"string to be appended is longer than "
~ CHUNK_SIZE.stringof ~ " this is not
supported currently");
}
}
}
string releaseBuffer()
{
size_t result_length = fixedBufferUsed;
foreach (i; 0 .. last_chunk_used - 1)
{
result_length += chunk_lengths[i];
}
char[] result;
result.length = result_length;
result[0 .. fixedBufferUsed] = fixedBuffer[0 ..
fixedBufferUsed];
size_t result_begin_pos = fixedBufferUsed;
foreach (i; 0 .. last_chunk_used - 1)
{
const chunk_used = chunk_lengths[i];
result[result_begin_pos .. result_begin_pos +
chunk_used] = chunks[i][0 .. chunk_used];
result_begin_pos += chunk_used;
}
return cast(string) result;
}
}
```
that wasn't too bad was it?
But will such a simple piece of code really be able to speed us
up?
lets do a quick perf test.
```d
import stringBuilder;
static if (is(typeof(import ("N_ITERATIONS"))))
{
enum n_iterations = mixin(import("N_ITERATIONS"));
}
else
{
enum n_iterations = 4096;
}
pragma(msg, "N: ", n_iterations);
enum big_string =
() {
version (StringBuilder)
{
StringBuilder b;
}
else version (Appender)
{
import std.array : Appender;
Appender!(string) result;
}
else
{
string result;
}
foreach(_; 0 ..n_iterations)
{
version(StringBuilder)
{
b.appendString("Let us see ... how meta can you
go?\n");
}
else version (Appender)
{
result ~= ("Let us see ... how meta can you
go?\n");
}
else
{
result ~= ("Let us see ... how meta can you
go?\n");
}
}
version (StringBuilder)
{
return b.releaseBuffer();
}
else version (Appender)
{
return result.opSlice();
}
else
return result;
} ();
pragma(msg, big_string);
```
And here are the results for various values of N_ITERATIONS:
(times are in milliseconds and are the time) `dmd -c takes`
```
| N | stringBuilder | naive | Appender
|
|-------|-----------------|---------|-----------------------------|
| 32 | 28.1 | 28 | 53
|
| 128 | 29.0 | 27.3 | 58
|
| 512 | 29.4 | 29.7 | 120.4
|
| 1024 | 30.6 | 33.7 | 304.4
|
| 2048 | 31.8 | 51.2 | 1055
|
| 4096 | 35.1 | 120.8 | 4338
|
| 8192 | 41.0 | 368.1 | 18146
|
| 16384 | 52.5 | 1353.1 | inf (it ran out of memeory)
|
```
More information about the Digitalmars-d
mailing list