What LDC flags should be used to get the fastest executable on Windows?
Preetpal
preetpal.sohal at gmail.com
Sat Mar 6 13:45:42 UTC 2021
On Saturday, 6 March 2021 at 11:50:05 UTC, Dennis wrote:
> On Saturday, 6 March 2021 at 10:57:31 UTC, Preetpal wrote:
>> On Saturday, 6 March 2021 at 10:51:35 UTC, Preetpal wrote:
>>> On Saturday, 6 March 2021 at 09:07:05 UTC, Imperatorn wrote:
>>>> There's not much going on in the code there. Where are you
>>>> experiencing problems? Have you profiled it?
>>>
>>> As it is small program, I re-implemented it in C
>>> (https://gist.github.com/preetpalS/81405cd78ade738034cfa6d49e2a4202) to see if it could reduce the problem I was seeing. Based on my observations it did reduce the problem but it did not eliminate it. This led me to believe that the issue I was seeing in the D version was performance-related.
>>
>> I just really want to be sure that the D version of the
>> program can match the C version of the program.
>
> I very much doubt this is performance related.
> Your program doesn't do any heavy computations itself, it just
> calls into the Windows API, which is dynamically linked, so
> link-time optimization makes no difference. Optimization flags
> like -O3 just shave off nanoseconds from the calling code.
>
> A notable difference with your C version is that you use global
> variables, which in D are thread-local by default. To match C,
> make your declarations __gshared and see if it helps:
>
> ```
> __gshared bool altPressed = false;
> __gshared bool controlPressed = false;
> __gshared bool shiftPressed = false;
> __gshared bool winkeyPressed = false;
> ```
I am also somewhat skeptical that this problem is
performance-related. However, the problem occurred less often
when I used the C version of the program than when I used the D
version compiled with less aggressive optimization flags, which
suggests that this is a performance problem. Additionally, after
compiling the D version of the program with more aggressive
optimization flags, I noticed the problem occurring less
frequently. It could be a coincidence, but either way I would
like to optimize this program.
Well I created the following test program on https://godbolt.org
to see what kind of difference __gshared makes:

    __gshared bool test = false;

    extern (C) int main(string[] args) {
        testString(args[0]);
        return 0;
    }

    void testString(string input) {
        if (input.length > 5) {
            test = true;
        }
    }

Using __gshared reduces the number of instructions the compiler
generates, so it should make the program faster. The godbolt
website currently does not support compiling D on Windows; if it
did, I could compare the generated code of the two versions of
the program right now (the C program can be compiled there).
Thanks for the suggestion about using __gshared. Using -O3 also
seems to reduce the number of instructions generated in the
small test program.
Without __gshared: https://godbolt.org/z/dKTGc6
With __gshared: https://godbolt.org/z/8hxP1q
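
Besides the instruction count, the two declarations also behave
differently at run time. The following is a minimal sketch of that
difference, not taken from the actual program: the names tlsFlag,
sharedFlag and runWorker are hypothetical. It assumes only that
module-level variables in D are thread-local by default and that
__gshared gives a single process-wide variable, as in C.

```d
import core.thread : Thread;
import std.stdio : writeln;

bool tlsFlag = false;              // module-level: thread-local by default in D
__gshared bool sharedFlag = false; // single process-wide variable, like a C global

// Hypothetical helper: sets both flags from a second thread and waits for it.
void runWorker()
{
    auto worker = new Thread({
        tlsFlag = true;    // changes only the worker thread's own copy
        sharedFlag = true; // changes the one variable every thread sees
    });
    worker.start();
    worker.join(); // wait so the write to sharedFlag is complete
}

void main()
{
    runWorker();
    writeln("tlsFlag    = ", tlsFlag);    // main thread's copy is untouched: false
    writeln("sharedFlag = ", sharedFlag); // true
}
```

If the key-state flags were ever written on one thread and read on
another, this difference would affect correctness and not only
speed. When everything runs on a single thread, only the cost of
the thread-local access differs.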