How to prevent optimizer from reordering stuff?
Dan Olson via digitalmars-d-ldc
digitalmars-d-ldc at puremagic.com
Sun Mar 15 23:12:24 PDT 2015
Dan Olson <zans.is.for.cans at yahoo.com> writes:
> While tracking down std.math problems for ARM, I find that the
> optimizer will reorder instructions so that the FPSCR flags are read
> before the divide operation.
>
> Is there a way to force instruction ordering here? I tried
> llvm_memory_fence, but it doesn't do the job.
>
>     real zero = 0.0;
>
>     void foo()
>     {
>         import std.math, std.c.stdio, ldc.llvmasm;
>
>         real x = 1.0 / zero;
>
>         auto f = __asm!uint("vmrs $0, fpscr", "=r");
>         IeeeFlags flags = ieeeFlags();
>         printf("%f, %u %d\n", x, f, flags.divByZero);
>     }
>
> Compiled with -O -mtriple=thumbv7-apple-ios, you can see that vdiv is
> after both my inline asm and std.math ieeeFlags().
>
>     vldr d8, [r0]
>     @ InlineAsm Start
>     vmrs r4, fpscr
>     @ InlineAsm End
>     mov r0, r5
>     blx __D3std4math9ieeeFlagsFNdZS3std4math9IeeeFlags
>     vmov.f64 d16, #1.000000e+00
>     mov r0, r5
>     vdiv.f64 d8, d16, d8
>
> What to do?
I have a solution, or at least a start. Passing the result of the
floating-point operation as an argument to an empty inline asm gives
the correct ordering: the asm needs the value in a register, so the
divide has to be done by then. Unlike the C volatile trick (the
FORCE_EVAL macro), it doesn't do any unnecessary stores.

For my use, I wrapped the inline asm in a function "use()". It is
specific to ARM because of the 'w' constraint. I am thinking it could
be named FORCE_EVAL, to align with what is in the Linux libm, and then
be made general for other D CPU targets.
    void use(T)(T x) @nogc nothrow
    {
        import ldc.llvmasm;         // __asm
        import std.traits;          // isFloatingPoint
        static if (isFloatingPoint!(T))
            __asm("", "w", x);      // arm fp reg
        else
            __asm("", "r", x);      // general-purpose reg
    }
Compile as before (-O), but with use(x) called right after the divide.
    real zero = 0.0;

    void foo()
    {
        import std.math, core.stdc.stdio, ldc.llvmasm;

        real x = 1.0 / zero;
        use(x);     // forces the divide to be done by this point

        // get float flags in an ARM-specific way
        auto f = __asm!uint("vmrs $0, fpscr", "=r");
        // get float flags the D way
        IeeeFlags flags = ieeeFlags();
        printf("%f, %u %d\n", x, f, flags.divByZero);
    }
Now vdiv.f64 happens before all the flag fetching.
    vmov.f64 d16, #1.000000e+00
    add r5, sp, #4
    vldr d17, [r0]
    mov r0, r5
    vdiv.f64 d8, d16, d17      <------ yeah!
    @ InlineAsm Start
    @ InlineAsm End
    @ InlineAsm Start
    vmrs r4, fpscr
    @ InlineAsm End
    blx __D3std4math9ieeeFlagsFNdZS3std4math9IeeeFlags
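
A rough, untested sketch of how the more general version could be laid
out. The name forceEval and the version (ARM) guard are assumptions for
illustration only; nothing by that name exists in druntime or Phobos.
Only the ARM constraints ('w' and 'r') come from the code above, and any
other target would need its own constraint chosen and checked.

```d
import ldc.llvmasm;                      // LDC-specific __asm
import std.traits : isFloatingPoint;

// Hypothetical generalisation of use(); forceEval is an illustrative name.
void forceEval(T)(T x) @nogc nothrow
{
    version (ARM)
    {
        static if (isFloatingPoint!T)
            __asm("", "w", x);   // empty asm taking x in an ARM FP register
        else
            __asm("", "r", x);   // empty asm taking x in a core register
    }
    else
    {
        // This sketch picks no constraint for other CPUs; fail at compile
        // time rather than guess one.
        static assert(0, "forceEval: add a register constraint for this target");
    }
}
```

Called as forceEval(x) in place of use(x), this should behave the same
on ARM; on any other target it refuses to compile until a constraint
is filled in.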
--
Dan