How to prevent optimizer from reordering stuff?

Sun Mar 15 23:12:24 PDT 2015

Dan Olson <zans.is.for.cans at yahoo.com> writes:

> While tracking down std.math problems for ARM, I find that optimizer
> will reorder instructions to get FPSCR flags before the divide
> operation.
>
> Is there a way to force instruction ordering here?  I tried the
> llvm_memory_fence, but it doesn't do the job.
>
> real zero = 0.0;
>
> void foo()
> {
>     import std.math, std.c.stdio, ldc.llvmasm;
>
>     real x = 1.0 / zero;
>
>     auto f = __asm!uint("vmrs $0, fpscr", "=r");
>     IeeeFlags flags = ieeeFlags();
>     printf("%f, %u %d\n", x, f, flags.divByZero);
> }
>
> Compiled with -O -mtriple=thumbv7-apple-ios, you can see that vdiv is
> after both my inline asm and std.math ieeeFlags().
>
> 	vldr	d8, [r0]
> 	@ InlineAsm Start
> 	vmrs	r4, fpscr
> 	@ InlineAsm End
> 	mov	r0, r5
> 	blx	__D3std4math9ieeeFlagsFNdZS3std4math9IeeeFlags
> 	vmov.f64	d16, #1.000000e+00
> 	mov	r0, r5
> 	vdiv.f64	d8, d16, d8
>
> What to do?

I have a solution, or at least a start.  Passing the result of the
floating-point operation as an input operand to an empty inline asm
gives the correct ordering: the asm consumes the value, so the divide
has to be done before it.  It also avoids the unnecessary stores of
the C volatile trick (FORCE_EVAL macro).
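
For comparison, here is a minimal C sketch of the two approaches,
assuming GCC or Clang extended asm.  The names FORCE_EVAL_STORE,
use_double, divide_stored and divide_pinned are made up for
illustration.  The "w" constraint is the ARM one discussed here, while
the "x" (x86-64 SSE register) and "m" fallback choices are assumptions
and not something tested on those targets in this thread.

```c
/* Register class for a double operand on the current target.
   "w" is the ARM/AArch64 FP register class (assumes a hardware-FP
   build); "x" and the "m" fallback are illustrative assumptions. */
#if defined(__arm__) || defined(__aarch64__)
#define FP_REG_CONSTRAINT "w"
#elif defined(__x86_64__)
#define FP_REG_CONSTRAINT "x"
#else
#define FP_REG_CONSTRAINT "m"
#endif

/* Volatile-store variant, in the spirit of libm's FORCE_EVAL: the
   store cannot be dropped, but it costs a write to memory. */
#define FORCE_EVAL_STORE(x) \
    do { volatile double force_eval_sink; force_eval_sink = (x); } while (0)

/* Empty-asm variant: emits no instructions, but x is an input operand,
   so the compiler must have computed it before this point. */
static inline void use_double(double x)
{
    __asm__ volatile("" : : FP_REG_CONSTRAINT(x));
}

double zero = 0.0; /* non-const global, so the divide is not folded away */

/* Divide, then force the result with a volatile store. */
double divide_stored(double num)
{
    double x = num / zero;
    FORCE_EVAL_STORE(x);
    return x;
}

/* Divide, then pin the result with the empty asm (no store). */
double divide_pinned(double num)
{
    double x = num / zero;
    use_double(x);
    return x;
}
```

Both functions return the same value.  The difference should only show
up in the generated code, where divide_stored carries an extra store
to the stack and divide_pinned does not.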

For my use, I wrapped the inline asm in a function "use()" that is
specific to ARM because of the 'w' constraint.  I am thinking it could
be named FORCE_EVAL to align with what is in Linux libm, and then be
generalized to other CPU targets that D supports.

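void use(T)(T x) @nogc nothrow
{
    import ldc.llvmasm;                   // LDC's __asm
    import std.traits : isFloatingPoint;

    // Empty asm with x as an input operand: no instructions are
    // emitted, but x must be computed before this point.
    static if (isFloatingPoint!(T))
        __asm("", "w", x);   // ARM fp reg
    else
        __asm("", "r", x);   // general-purpose reg
}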

Compile as before (-O), but with use(x).

real zero = 0.0;

void foo()
{
    import std.math, std.c.stdio, ldc.llvmasm;

    real x = 1.0 / zero;
    use(x);

    // get float flags in an ARM-specific way
    auto f = __asm!uint("vmrs $0, fpscr", "=r");
    // get float flags the D way
    IeeeFlags flags = ieeeFlags();
    printf("%f, %u %d\n", x, f, flags.divByZero);
}

Now vdiv.f64 happens before all the flag fetching.

	vmov.f64	d16, #1.000000e+00
	add	r5, sp, #4
	vldr	d17, [r0]
	mov	r0, r5
	vdiv.f64	d8, d16, d17          <------ yeah!
	@ InlineAsm Start
	@ InlineAsm End
	@ InlineAsm Start
	vmrs	r4, fpscr
	@ InlineAsm End
	blx	__D3std4math9ieeeFlagsFNdZS3std4math9IeeeFlags

--
Dan