newCTFE gets a 10x faster string concat

Thu Sep 23 16:53:36 UTC 2021

On Thursday, 23 September 2021 at 13:01:33 UTC, Stefan Koch wrote:
> Hi there,
> [ ... 10x difference  bla bla ...]

Of course, by varying the test cases it is possible to get an
almost arbitrary speedup.

```
Benchmark #1: generated/linux/release/64/dmd -c testStringConcat.d -new-ctfe
   Time (mean ± σ):     160.3 ms ±   2.8 ms    [User: 121.6 ms, System: 38.4 ms]
   Range (min … max):   154.1 ms … 164.9 ms    18 runs

Benchmark #2: generated/linux/release/64/dmd -c testStringConcat.d
   Time (mean ± σ):      6.538 s ±  0.105 s    [User: 3.253 s, System: 3.276 s]
   Range (min … max):    6.450 s …  6.768 s    10 runs

Summary
   'generated/linux/release/64/dmd -c testStringConcat.d -new-ctfe' ran
    40.79 ± 0.96 times faster than 'generated/linux/release/64/dmd -c testStringConcat.d'
```
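The summary line is just the ratio of the two mean times, 6.538 s / 160.3 ms. A minimal sketch of that arithmetic follows; the `Timing` struct and `speedup` function are hypothetical helpers, and the ± term assumes standard first-order error propagation for a quotient of independent measurements, which is an assumption about how the benchmarking tool derives its figure, not a description of its internals.

```d
import std.math : sqrt;
import std.stdio : writefln;

// Hypothetical helper: mean and standard deviation of one benchmark, in ms.
struct Timing
{
    double mean;
    double sigma;
}

// Speedup of `fast` over `slow` as a ratio of means. The uncertainty is
// propagated from both relative standard deviations (assumed first-order
// error propagation for a quotient of independent measurements).
Timing speedup(Timing slow, Timing fast)
{
    immutable ratio = slow.mean / fast.mean;
    immutable relSlow = slow.sigma / slow.mean;
    immutable relFast = fast.sigma / fast.mean;
    return Timing(ratio, ratio * sqrt(relSlow * relSlow + relFast * relFast));
}

void main()
{
    // Values copied from the rounded benchmark output above.
    auto newCtfe = Timing(160.3, 2.8);     // dmd -c ... -new-ctfe
    auto oldCtfe = Timing(6538.0, 105.0);  // dmd -c ... (current interpreter)
    auto s = speedup(oldCtfe, newCtfe);
    writefln("%.2f ± %.2f times faster", s.mean, s.sigma);
}
```

With the rounded figures this reproduces the 40.79x ratio; the ± term comes out near 0.97 rather than 0.96, presumably because the tool works from the unrounded timings.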

The highest speedup I have been able to get is 50x; beyond that, the
old interpreter runs out of memory and freezes my computer.
The code for the benchmark above is:
```d
// Builds a string of N copies of `x` by repeated appending.
string makeBigString(int N)
{
    string x = "this is the string I want to append\n";
    string result = "";
    foreach (_; 0 .. N)
    {
        result ~= x;
    }
    return result;
}

// pragma(msg, makeBigString(cast(uint)(short.max * 1.91)).length);
// This is the maximum for newCTFE; beyond it we run out of 32-bit address space.
// Commented out because without newCTFE the compiler just crashes.

// Builds [0, 1, ..., N-1] by repeated appending.
int[] crappyIota(int N)
{
    int[] result = [];
    foreach (i; 0 .. N)
    {
        result ~= i;
    }
    return result;
}

// Each pragma(msg) forces its argument to be evaluated at compile time (CTFE).
pragma(msg, crappyIota(short.max).length + crappyIota(short.max)[$ - 1]);
pragma(msg, makeBigString(cast(uint)(short.max / 4)).length);
pragma(msg, makeBigString(cast(uint)(short.max / 2)).length);
```
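`pragma(msg, ...)` is not the only way to push a call through the compile-time interpreter: any context that requires a compile-time constant, such as an `enum` initializer or a `static assert`, does the same. A minimal sketch, using a hypothetical `repeatConcat` helper that is not part of the benchmark, contrasts CTFE with ordinary run-time execution of the same function:

```d
import std.stdio : writeln;

// Hypothetical helper: n copies of s built by repeated ~=,
// the same pattern as makeBigString.
string repeatConcat(string s, int n)
{
    string result;
    foreach (_; 0 .. n)
        result ~= s;
    return result;
}

// Both of these need a compile-time value, so the calls run in the CTFE interpreter.
enum ctfeLength = repeatConcat("ab", 1000).length;
static assert(repeatConcat("ab", 3) == "ababab");

void main()
{
    // Here the same function runs as ordinary compiled code at run time.
    immutable runtimeLength = repeatConcat("ab", 1000).length;
    writeln(ctfeLength == runtimeLength); // true
}
```

A `static assert` like this is also a cheap way to check that two interpreters agree on a result without reading `pragma(msg)` output by eye.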

As you can see, `makeBigString(cast(uint)(short.max * 1.91)).length`
is the most I can test at all, since the newCTFE VM uses a 31-bit
heap address space: half of the 32-bit space is reserved for the stack.
I mean to change the 2 GB/2 GB split to a 3.498 GB/0.512 GB split,
but I haven't done that yet.
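For scale, the arithmetic behind those limits can be checked at compile time. This is a sketch only: `piece`, `expectedLength`, `addressSpace` and `heapBytes` are illustrative names, not identifiers from the newCTFE sources. It computes the final string length for each `makeBigString` call (each iteration appends the same 36-byte string) and the size of a 31-bit heap.

```d
import std.stdio : writeln;

// The string appended on every iteration of makeBigString.
enum piece = "this is the string I want to append\n";
static assert(piece.length == 36);

// Illustrative helper: length of makeBigString(n)'s result, n copies of `piece`.
size_t expectedLength(uint n)
{
    return n * piece.length;
}

// A 31-bit heap covers 2^31 bytes (2 GiB); with a 2 GB/2 GB split the
// other half of the 32-bit (4 GiB) address space is the stack.
enum ulong addressSpace = 1UL << 32;
enum ulong heapBytes = 1UL << 31;
static assert(addressSpace - heapBytes == heapBytes);

void main()
{
    // Iteration count of the commented-out limit case.
    immutable uint nMax = cast(uint)(short.max * 1.91);
    writeln("iterations:         ", nMax);
    writeln("final string bytes: ", expectedLength(nMax));
    writeln("31-bit heap bytes:  ", heapBytes);
    writeln("short.max / 4 case: ", expectedLength(short.max / 4));
    writeln("short.max / 2 case: ", expectedLength(short.max / 2));
}
```

At the limit case the finished string is 2,253,024 bytes (62584 iterations of 36 bytes), a small fraction of the 2^31-byte heap.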

For the example above, newCTFE uses one-sixtieth of the memory that
the current interpreter needs.