Most performant way of converting int to string

Tue Dec 22 10:27:12 PST 2015

On Tuesday, 22 December 2015 at 17:52:52 UTC, Andrew Chapman 
wrote:
> On Tuesday, 22 December 2015 at 17:43:00 UTC, H. S. Teoh wrote:
>
>> I wonder if the slowdown is caused by GC collection cycles 
>> (because calling to!string will allocate, and here you're 
>> making a very large number of small allocations, which is 
>> known to cause GC performance issues).
>>
>> Try inserting this before the loop:
>>
>> 	import core.memory;
>> 	GC.disable();
>>
>> Does this make a difference in the running time?
>>
>>
>> T
>
> Thanks!  Unfortunately that actually makes it run slightly 
> slower :-)

I dislike guessing so here are my thought on it and the method I 
used to get there:

First I used the following code:

     void f() {
         import std.conv;

         string s;
         foreach (i ; 0..1_000_000) {
             s = i.to!string;
         }
     }

     void g() {
         import core.memory;
         import std.conv;

         GC.disable;

         string s;
         foreach (i ; 0..1_000_000) {
             s = i.to!string;
         }
     }

     extern (C) int snprintf(char*, size_t, const char*, ...);
     void h() {
         char[10] s;
         foreach (i ; 0..1_000_000) {
             snprintf(cast(char*)s, 10, cast(char*)"%d", i);
         }
     }

     void main(string[] args) {
            f;
         // g;
         // h;
     }

Note that for h I didn't really use a string, I used a char[10]. 
We should
keep in mind that you may want to convert it to a proper string 
later.

I compiled it three times (one per function) with `dmd -profile 
-profile=gc`
and then ran it to get profiling data. I then compiled it with 
`dmd` while
timing each execution 6 times and discarding the first time (as 
the first one
has to heat the cache). The reason for the recompilation is that 
as the C
code can't be profiled and profiling puts some overhead I wanted 
to get that
off my shoulders while timing.

Here are the average times:

f: 0.50 user 0.00 system 95% CPU 0.530 total

g: 0.49 user 0.01 system 96% CPU 0.522 total

h: 0.17 user 0.00 system 92% CPU 0.188 total

The GC profiler didn't find any allocation be it in f, g or h.

The profiler was only available for f and g with very similar 
results (as
they are the same function really, here is the interesting part):

======== Timer Is 3579545 Ticks/Sec, Times are in Microsecs 
========

   Num          Tree        Func        Per
   Calls        Time        Time        Call

1000000   390439368   265416744         265     pure nothrow 
@safe char[] std.array.array!(std.conv.toChars!(10, char, 1, 
int).toChars(int).Result).array(std.conv.toChars!(10, char, 1, 
int).toChars(int).Result)
1000000    83224994    83224994          83     pure nothrow ref 
@nogc @safe std.conv.toChars!(10, char, 1, 
int).toChars(int).Result std.conv.toChars!(10, char, 1, 
int).toChars(int).Result.__ctor(int)
23666670 1182768507    73190160           3     _Dmain
1000000   525732328    35191373          35     pure nothrow 
@trusted immutable(char)[] std.conv.toImpl!(immutable(char)[], 
int).toImpl(int, uint, std.ascii.LetterCase)
5888890    43695659    33204064           5     pure nothrow ref 
@nogc @safe char std.conv.emplaceRef!(char, char).emplaceRef(ref 
char, ref char)
1000000    32745909    32745909          32     pure nothrow 
char[] std.array.arrayAllocImpl!(false, char[], 
ulong).arrayAllocImpl(ulong)
1000000   100101586    16876591          16     pure nothrow 
@nogc @safe std.conv.toChars!(10, char, 1, 
int).toChars(int).Result std.conv.toChars!(10, char, 1, 
int).toChars(int)
5888890    13070014    13070014           2     pure nothrow 
@property @nogc @safe char std.conv.toChars!(10, char, 1, 
int).toChars(int).Result.front()
1000000    44899444    12153535          12     pure nothrow 
@trusted char[] std.array.uninitializedArray!(char[], 
ulong).uninitializedArray(ulong)
5888890    10491595    10491595           1     pure nothrow ref 
@nogc @safe char 
std.conv.emplaceImpl!(char).emplaceImpl!(char).emplaceImpl(ref 
char, ref char)
6888890     9952971     9952971           1     pure nothrow 
@property @nogc @safe bool std.conv.toChars!(10, char, 1, 
int).toChars(int).Result.empty()
1000000    53086148     8186704           8     pure nothrow 
@trusted char[] std.array.array!(std.conv.toChars!(10, char, 1, 
int).toChars(int).Result).array(std.conv.toChars!(10, char, 1, 
int).toChars(int).Result).__lambda2()
1000000   530599849     4867521           4     pure nothrow 
@safe immutable(char)[] std.conv.toImpl!(immutable(char)[], 
int).toImpl(int)
1000000   534801618     4201769           4     pure nothrow 
@safe immutable(char)[] 
std.conv.to!(immutable(char)[]).to!(int).to(int)
5888890     2520677     2520677           0     pure nothrow 
@nogc @safe void std.conv.toChars!(10, char, 1, 
int).toChars(int).Result.popFront()
1000000     2138919     2138919           2     pure nothrow 
@nogc @trusted char[] std.array.array!(std.conv.toChars!(10, 
char, 1, int).toChars(int).Result).array(std.conv.toChars!(10, 
char, 1, int).toChars(int).Result).__lambda3()
1000000      558232      558232           0     pure nothrow 
@property @nogc @safe ulong std.conv.toChars!(10, char, 1, 
int).toChars(int).Result.length()

You may not be used to read such a bare listing so here are some 
pointers:

- Num Calls is the number of calls made to that function
- Tree Time is the total cumulative time spent in the function 
and its subfunctions
- Func Time is the total time spent in the function only
- Per Call is the average time spent in the function only

The functions are sorted by their Func Time. As we can see what 
took the most
time was converting the range of chars to an array in order to 
store it in a
string. This is a cost that we obviously don't have in h.

It seems that using snprintf would be the fastest here although 
one could
replicate its behaviour in D. I don't think there is anything in 
the standard
library that would really help here as (if I read it correctly) 
it is mainly
because of the conversion from ranges to arrays that this code is 
slow.