Sometimes 100 lines of code can fix your ctfe perf
Stefan Koch
uplink.coder at googlemail.com
Tue Aug 24 16:06:10 UTC 2021
On Tuesday, 24 August 2021 at 14:07:45 UTC, Stefan Koch wrote:
> Good Morning,
>
> I just saw the need to concatenate a long-ish string at ctfe.
>
> TLDR; writing your own specialized chunking appender is way
> faster than naive CTFE
> using the phobos provided appender seems to be a bad idea ;) at
> least at CTFE.
I should have tested the code more thoroughly!
The reason why the string builder had so a flat performance curve
was A BUG.
It just did not concatenate anything to the buffer after the
initial was full.
This issue is now fixed and had the pleasant side effect of
removing the `goto`s from the code
the fixed version looks like this:
```d
struct StringBuilder
{
enum CHUNK_SIZE = 4096;
char[CHUNK_SIZE] fixedBuffer;
size_t fixedBufferUsed;
char[CHUNK_SIZE][] chunks;
size_t[] chunk_lengths;
ushort last_chunk_used;
void addToNewChunk(const(char)[] s)
{
if (s.length <= CHUNK_SIZE)
{
chunks.length = last_chunk_used + 1;
chunk_lengths.length = last_chunk_used + 1;
last_chunk_used += 1;
appendToLastChunk(s);
}
else
{
assert(0,
"string to be appended is longer than "
~ CHUNK_SIZE.stringof ~ " this is not
supported currently");
}
}
void appendToLastChunk(const(char)[] s)
{
auto chunkBegin = chunk_lengths[last_chunk_used - 1];
auto chunkEnd = chunkBegin + s.length;
if (chunkEnd < CHUNK_SIZE)
{
chunks[last_chunk_used - 1][chunkBegin ..
chunkEnd] = s;
chunk_lengths[last_chunk_used - 1] = chunkEnd;
}
else
addToNewChunk(s);
}
void appendString(const(char)[] s)
{
// can we used our preallocated fixed buffer?
if (!last_chunk_used)
{
auto fixedBufferEnd = fixedBufferUsed + s.length;
if (fixedBufferEnd < CHUNK_SIZE)
{
fixedBuffer[fixedBufferUsed .. fixedBufferEnd] =
s;
fixedBufferUsed = fixedBufferEnd;
}
else
{
addToNewChunk(s);
}
}
else
appendToLastChunk(s);
}
string releaseBuffer()
{
size_t result_length = fixedBufferUsed;
foreach (i; 0 .. last_chunk_used)
{
result_length += chunk_lengths[i];
}
char[] result;
result.length = result_length;
result[0 .. fixedBufferUsed] = fixedBuffer[0 ..
fixedBufferUsed];
size_t result_begin_pos = fixedBufferUsed;
foreach (i; 0 .. last_chunk_used - 1)
{
const chunk_used = chunk_lengths[i];
result[result_begin_pos .. result_begin_pos +
chunk_used] = chunks[i][0 .. chunk_used];
result_begin_pos += chunk_used;
}
return cast(string) result;
}
}
```
And now the resulting table is somewhat different as well:
```
| N | stringBuilder | naive | Appender
|
|-------|-----------------|---------|-----------------------------|
| 32 | 27.8 | 27.8 | 53
|
| 128 | 29.8 | 27.2 | 54.1
|
| 512 | 33.7 | 29.8 | 121
|
| 1024 | 39.8 | 34.4 | 313.9
|
| 2048 | 52.0 | 52.9 | 1047
|
| 4096 | 77.0 | 123.9 | 4350
|
| 8192 | 127.3 | 378.2 | 18146
|
| 10000 | 151.1 | 548.6 | 26796
|
| 16384 | 233.7 | 1386 | inf (it ran out of memory)
|
```
I inserted another row which is the N the appender can barely do
before I run out of memory.
or memeory which is special memory to be used in memes.
We can see now the string build behaves linear rather than almost
constant which is much more what I would have expected.
Now that were results for CTFE;
Let's now look at how it looks at regular runtime.
We compile with `ldc -O3` we also only care for the table
starting at 10000
we also add N == 0 to account for compile time as we are
measuring compiletime and runtime together
I will write out only the difference from N0
```
| N | stringBuilder | naive | Appender |
|-------|-----------------|---------|----------|
| 0 |-147.0 |-100.0 |-228.7 |
| 16384 | 12.5 | 6.2 | 3.7 |
| 32768 | 23.5 | 7.9 | 5.0 |
| 65536 | 19.6 | 7.6 | 5.6 |
| 131072| 37.0 | 15.2 | 14.7 |
| 262144| 44.4 | 24.3 | 16.3 |
| 524288| 108.8 | 30.6 | 21.1 |
|1048576| 236.5 | 55.4 | 43.7 |
|2097152| 369.4 | 101.1 | 72.3 |
|4194304| 823.2 | 198.0 | 150.1 |
```
We can see that the runtime performance of the string builder is
quite bad as we scale.
and that the naive version is quite good even under extreme
stress.
The Appender is somewhat faster at runtime than the naive version
but not a whole lot.
And it takes twice the compile time.
So for code that is supposed to be at runtime a quick `~=` in the
code is not bad at all.
If it's very heavily used the Appender can make a difference, at
runtime as well.
But it should absolutely be avoided if you are going to use the
same code at CTFE.
The real lesson though is to be very skeptical of extermely good
results and verify behavior as well as performance! :)
Cheers,
Stefan
More information about the Digitalmars-d
mailing list