Inline assembly and Profiling

Marco Leise via Digitalmars-d-learn digitalmars-d-learn at puremagic.com
Sat Mar 5 15:51:01 PST 2016


On Tue, 01 Mar 2016 02:30:04 +0000,
Matthew Dudley <pontifechs at gmail.com> wrote:

> I'm working on a chess engine side-project, and I'm starting to 
> get into profiling and optimization.
> 
> One of the optimizations I've made involves some inline assembly, 
> and I ran across some apparently bizarre behavior today, and I 
> just wanted to double-check that I'm not doing something wrong.
> 
> Here's the behavior boiled down:
> 
> import std.stdio;
> ubyte LS1B(ulong board)
> {
>    asm
>    {
>      bsf RAX, board;
>    }
> }
> 
> void main()
> {
>    auto one = 0x939839FA;
>    assert(one.LS1B == 1, "Wrong LS1B!");
> }
> 
> If I run this through DMD without profiling on, it runs 
> successfully, but with profiling on, the assertion fails. And in 
> the actual code, it returns seemingly random numbers.
> 
> Is the profiling code stomping on my toes here? Am I not allowed 
> to just write the result into RAX with a single instruction like 
> this when profiling is on? Or is this just a compiler bug?

I didn't check the documentation, but I believe that with
inline assembly you have to copy RAX into a variable and
return that variable. In any case you should file a bug report
about this. If this code is correct, then DMD assumes you
implicitly set the return value inside the asm block, and the
profiling hooks should preserve RAX. If it is not intended,
then the function is missing a return statement and DMD should
have reported that.
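A minimal sketch of that workaround, assuming DMD-style x86-64 inline assembly (guarded by the predefined `D_InlineAsm_X86_64` version identifier): the `bsf` result is copied from RAX into a local variable and the function ends with an ordinary `return`, so the result no longer depends on RAX surviving whatever code the compiler inserts after the asm block. The name `ls1bViaLocal` is hypothetical, and the `core.bitops.bsf` fallback is only there so the snippet also builds where that kind of asm is unavailable.

```d
static import core.bitops;

/// Hypothetical helper: index of the least significant set bit.
/// BSF leaves its destination undefined for 0, so the board must be non-zero.
ubyte ls1bViaLocal(ulong board)
{
    assert(board != 0, "LS1B of an empty board is undefined");
    ulong result;
    version (D_InlineAsm_X86_64)
    {
        asm
        {
            bsf RAX, board;   // scan for the lowest set bit
            mov result, RAX;  // copy it out before leaving the asm block
        }
    }
    else
    {
        result = core.bitops.bsf(board);  // portable fallback
    }
    return cast(ubyte) result;  // explicit return, no reliance on RAX
}

unittest
{
    assert(ls1bViaLocal(0x939839FA) == 1);  // 0xA = 0b1010, so bit 1 is the lowest set bit
    assert(ls1bViaLocal(1UL << 63) == 63);
}
```

The unittest can be run with `dmd -unittest -main -run` on the file.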

Alternatively, you can turn this into a naked function by
starting your asm block with "naked" and adding an explicit
"ret" at the end. A naked function contains only the
instructions you have explicitly written down, which
circumvents the profiling instrumentation.
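A sketch of the naked variant, under stated assumptions: x86-64, a POSIX system, and `extern (C)` linkage so that the System V calling convention applies (first integer argument in RDI, result returned in RAX). Because a naked function has no compiler-generated prologue or epilogue, the asm block has to read the argument from its register itself and end with `ret`. The names `ls1bNaked` and `NakedSysV` are hypothetical; on Windows x64 the argument would arrive in RCX instead, so that case simply falls back to `core.bitops.bsf` here.

```d
static import core.bitops;

// Enable the naked version only where its register assumptions hold.
version (D_InlineAsm_X86_64)
{
    version (Posix)
        version = NakedSysV;
}

version (NakedSysV)
{
    /// Hypothetical naked LS1B: the body is exactly these instructions,
    /// with no prologue, epilogue or instrumentation around them.
    extern (C) ulong ls1bNaked(ulong board)
    {
        asm
        {
            naked;
            bsf RAX, RDI;  // System V: first integer argument is in RDI
            ret;           // the result is already in RAX
        }
    }
}
else
{
    /// Portable fallback with the same contract (undefined for 0).
    extern (C) ulong ls1bNaked(ulong board)
    {
        return core.bitops.bsf(board);
    }
}

unittest
{
    assert(ls1bNaked(0x939839FA) == 1);
    assert(ls1bNaked(0x8000) == 15);
}
```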

Either way, functions with DMD-style inline assembly cannot
be inlined at all, so you are better off looking into the
core.bitops compiler intrinsics; core.bitops.bsf computes the
same thing as your asm block.
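For comparison, a minimal sketch using `core.bitops` instead of inline assembly. `bsf` and `popcnt` are real druntime functions; the helpers `ls1b` and `squares`, and the loop that walks a bitboard by clearing its lowest set bit with `board & (board - 1)`, are illustrative assumptions about how a chess engine might use them, not code from the original post.

```d
import core.bitops : bsf, popcnt;

/// Hypothetical helper: index of the least significant set bit.
/// bsf's result is undefined for 0, hence the check.
ubyte ls1b(ulong board)
{
    assert(board != 0, "LS1B of an empty board is undefined");
    return cast(ubyte) bsf(board);
}

/// Hypothetical helper: square indices of every set bit, lowest first.
ubyte[] squares(ulong board)
{
    ubyte[] result;
    result.reserve(cast(size_t) popcnt(board));  // one entry per set bit
    while (board != 0)
    {
        result ~= ls1b(board);
        board &= board - 1;  // clear the lowest set bit
    }
    return result;
}

unittest
{
    assert(ls1b(0x939839FA) == 1);
    assert(squares(0b1010_0001) == cast(ubyte[]) [0, 5, 7]);
    assert(squares(0x939839FAUL).length == cast(size_t) popcnt(0x939839FAUL));
}
```

Since these are ordinary functions rather than asm blocks, the surrounding code remains eligible for inlining.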

Also, either code coverage or profiling (I forget which) used
not to work in multi-threaded code!

What I typically do is compile on Linux with GDC or LDC and
use an external sampling profiler such as OProfile. You will
need to adjust some compiler settings (disable inlining, emit
debug information, keep frame pointers) so that the function
call stack can actually be reasoned about. After a profile run
you can display the results in various ways; they are
confusing at first, but you'll get the hang of it after a
while. For example, you can display sample counts per line of
code, or a call graph that shows the time spent in a function
broken down by call site.
Because OProfile is a system-wide profiler, it is not limited
to your program: it can include time spent in kernel functions
or profile the whole system at once.

-- 
Marco


