Found one (the?) ARM bug: (Pointer) aliasing issues

Sat Nov 2 13:07:30 PDT 2013

I think I finally found the root cause for one of the remaining ARM
bugs (the codegen bug which only appears on -O2 or higher).

What happened in the test case was that the gcc backend illegally moved
a read to a memory location before the write. It turns out that GCC
assumed that the read and write locations were in different alias sets
and therefore could not possibly reference the same memory location. One
alias set was for 'ubyte[]' and one for 'char[]'.

The code that triggers this issue is in std.algorithm.find:
--------
R1 find(...)(R1 haystack, R2 needle) if (/* are strings*/)
{
    return cast(char[]) .find!(ubyte[], ubyte[])
        (cast(ubyte[]) haystack, cast(ubyte[])needle);
}
--------
(Real code:
https://github.com/D-Programming-GDC/GDC/blob/master/libphobos/src/std/algorithm.d#L3555
)

The generic emitted by gdc:
--------
return <retval> = *(struct  *) &find (*(struct  *) &haystack, *(struct
*) &needle);
--------

Or in the raw form:
@44 = ubyte[]
@11 = string
--------
@11     record_type      name: @17      size: @18      algn: 32      
                         tag : struct   flds: @19   
@31     pointer_type     size: @15      algn: 32       ptd : @11    
@40     pointer_type     size: @15      algn: 32       ptd : @44   
@41     call_expr        type: @44      fn  : @45      0   : @46     
                         1   : @47
@44     record_type      name: @49      size: @18      algn:  
                         tag : struct   flds: @50    
@46     indirect_ref     type: @44      op 0: @53     
@47     indirect_ref     type: @44      op 0: @54    
@53     nop_expr         type: @40      op 0: @58     
@54     nop_expr         type: @40      op 0: @59    
@58     addr_expr        type: @31      op 0: @30     
@59     addr_expr        type: @31      op 0: @61 
--------

Here it's easy to see that we essentially generate this code:
*(cast(ubyte[]*)(&haystack))
and this is AFAIK a violation of the aliasing rules.

This problem is not observed at -O1 as the function call generates a
new stackframe and we therefore have two distinct memory locations. But
with -O2 inlining removes this copy and we now have variables
referencing the same memory location with type ubyte[] and char[].

I can not provide a reduced testcase as the smallest changes in
compiler codegen (gcc version, gdc commit) or the test case can hide
this issue. It's always reproducible with test15.d in the test suite.
(In older gdc versions the test case does not segfault but it produces
wrong output at -O2. This can be a very subtle difference)

But here's a working code snippet which illustrates the generic
generated by GDC:

-----------------
void main()
{
	char[] in1 = "Test".dup;
        char[] in2 = "Test2".dup;

        char[] result = cast(char[])find(cast(ubyte[])in1,
            cast(ubyte[])in2);
}

ubyte[] find(ubyte[] a, ubyte[] b)
{
    return a;
}
-----------------

So the important question here: Is this a bug in GDC codegen or is the
code in std.algorithm invalid? According to
http://dlang.org/expression.html
"The cast is done as a type paint" so this could indeed be interpreted
as a user mistake. But OTOH that page also talks about a runtime check
of the array .lengths which is clearly missing here.

I'm also wondering if that runtime check can actually fix this
aliasing issue or if it can come up again if the runtime check itself
is inlined?