What LDC flags should be used to get the fastest executable on Windows?

Preetpal preetpal.sohal at gmail.com
Sat Mar 6 13:45:42 UTC 2021


On Saturday, 6 March 2021 at 11:50:05 UTC, Dennis wrote:
> On Saturday, 6 March 2021 at 10:57:31 UTC, Preetpal wrote:
>> On Saturday, 6 March 2021 at 10:51:35 UTC, Preetpal wrote:
>>> On Saturday, 6 March 2021 at 09:07:05 UTC, Imperatorn wrote:
>>>> There's not much going on in the code there. Where are you 
>>>> experiencing problems? Have you profiled it?
>>>
>>> As it is a small program, I re-implemented it in C 
>>> (https://gist.github.com/preetpalS/81405cd78ade738034cfa6d49e2a4202) 
>>> to see if it could reduce the problem I was seeing. Based on 
>>> my observations, it did reduce the problem but did not 
>>> eliminate it. This led me to believe that the issue I was 
>>> seeing in the D version was performance-related.
>>
>> I just really want to be sure that the D version of the 
>> program can match the C version of the program.
>
> I very much doubt this is performance related.
> Your program doesn't do any heavy computations itself, it just 
> calls into the Windows API, which is dynamically linked, so 
> link-time optimization makes no difference. Optimization flags 
> like -O3 just shave off nanoseconds from the calling code.
>
> A notable difference with your C version is that you use global 
> variables, which in D are thread-local by default. To match C, 
> make your declarations __gshared and see if it helps:
>
> ```
> __gshared bool altPressed = false;
> __gshared bool controlPressed = false;
> __gshared bool shiftPressed = false;
> __gshared bool winkeyPressed = false;
> ```

I am also somewhat skeptical that this problem is 
performance-related. However, the problem occurred less often 
when I used the C version of the program than when I used the D 
version, which was not compiled with optimization flags as 
aggressive, and that suggests a performance problem. 
Additionally, after compiling the D version of the program with 
more aggressive optimization flags, I noticed the problem 
occurring less frequently. It could be a coincidence, but either 
way I would like to optimize this program.

Well, I created the following test program on https://godbolt.org 
to see what kind of difference __gshared makes:

```
__gshared bool test = false;

extern (C) int main(string[] args) {
    testString(args[0]);
    return 0;
}

void testString(string input) {
    if (input.length > 5) {
        test = true;
    }
}
```

Using __gshared reduces the number of instructions the compiler 
generates for this test, so it should make the program faster. 
The godbolt website currently does not support compiling D on 
Windows; if it did, I could compare the generated code between 
the two versions of the program right now (the C program can be 
compiled there). Thanks for the suggestion about using 
__gshared. Using -O3 also seems to reduce the number of 
instructions generated in the small test program.

Without __gshared: https://godbolt.org/z/dKTGc6

With __gshared: https://godbolt.org/z/8hxP1q
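
As a minimal sketch of the thread-local default discussed above, 
the following program sets one plain module-level variable and 
one __gshared variable from a second thread and then reads both 
back on the main thread. The names tlsFlag, sharedFlag, worker 
and flagsAfterWorker are made up for this illustration and are 
not taken from the original program. It only shows the semantic 
difference, not how much the access cost differs:

```d
import core.thread : Thread;
import std.stdio : writeln;

bool tlsFlag = false;              // thread-local by default in D
__gshared bool sharedFlag = false; // one process-wide instance, as in C

// Runs on the second thread (hypothetical helper for this sketch).
void worker()
{
    tlsFlag = true;    // changes only the worker thread's own copy
    sharedFlag = true; // changes the single process-wide variable
}

// Starts the worker, waits for it, and returns both flags as
// seen from the calling thread: [tlsFlag, sharedFlag].
bool[2] flagsAfterWorker()
{
    auto t = new Thread(&worker);
    t.start();
    t.join();
    return [tlsFlag, sharedFlag];
}

void main()
{
    auto flags = flagsAfterWorker();
    writeln("tlsFlag:    ", flags[0]); // false: main thread's copy is untouched
    writeln("sharedFlag: ", flags[1]); // true: the write is visible everywhere
}
```

Note that __gshared gives no synchronization by itself, so this 
sketch relies on join() before reading the shared variable.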

