Found one (the?) ARM bug: (Pointer) aliasing issues
Johannes Pfau
nospam at example.com
Sat Nov 2 13:07:30 PDT 2013
I think I finally found the root cause for one of the remaining ARM
bugs (the codegen bug which only appears at -O2 or higher).
What happened in the test case was that the GCC backend illegally moved
a read of a memory location before the write to it. It turns out that GCC
assumed that the read and write locations were in different alias sets
and therefore could not possibly reference the same memory location. One
alias set was for 'ubyte[]' and one for 'char[]'.
The code that triggers this issue is in std.algorithm.find:
--------
R1 find(...)(R1 haystack, R2 needle) if (/* are strings*/)
{
    return cast(char[]) .find!(ubyte[], ubyte[])
        (cast(ubyte[]) haystack, cast(ubyte[]) needle);
}
--------
(Real code:
https://github.com/D-Programming-GDC/GDC/blob/master/libphobos/src/std/algorithm.d#L3555
)
The GENERIC (GCC's tree representation) emitted by gdc:
--------
return <retval> = *(struct *) &find (*(struct *) &haystack,
                                     *(struct *) &needle);
--------
Or in the raw form:
@44 = ubyte[]
@11 = string
--------
@11 record_type name: @17 size: @18 algn: 32
tag : struct flds: @19
@31 pointer_type size: @15 algn: 32 ptd : @11
@40 pointer_type size: @15 algn: 32 ptd : @44
@41 call_expr type: @44 fn : @45 0 : @46
1 : @47
@44 record_type name: @49 size: @18 algn:
tag : struct flds: @50
@46 indirect_ref type: @44 op 0: @53
@47 indirect_ref type: @44 op 0: @54
@53 nop_expr type: @40 op 0: @58
@54 nop_expr type: @40 op 0: @59
@58 addr_expr type: @31 op 0: @30
@59 addr_expr type: @31 op 0: @61
--------
Here it's easy to see that we essentially generate this code:
*(cast(ubyte[]*)(&haystack))
and this is AFAIK a violation of the aliasing rules.
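To make the pattern concrete, here is a minimal sketch of that reinterpretation written out by hand. The helper name viewAsBytes is hypothetical (it is not part of Phobos or GDC); it only mirrors the "*(struct *) &haystack" shape shown above. The assertions describe the behaviour one would expect when the compiler treats both slice types as possibly aliasing; an optimizer that puts them into different alias sets is exactly what could break them.

```d
// Hypothetical helper, not part of Phobos or GDC: reads the char[] slice
// back through a ubyte[]* pointer, mirroring *(struct *) &haystack.
ubyte[] viewAsBytes(ref char[] s)
{
    return *(cast(ubyte[]*) &s);
}

void main()
{
    char[] haystack = "Test".dup;

    // The array cast as it is written in the source.
    ubyte[] viaCast = cast(ubyte[]) haystack;

    // The same slice, read through a pointer of a different type.
    ubyte[] viaPointer = viewAsBytes(haystack);

    // Both views describe the same memory; only the static type differs.
    assert(viaCast.ptr is viaPointer.ptr);
    assert(viaCast.length == viaPointer.length);

    // A write through the ubyte[] view is visible through the char[] view.
    viaPointer[0] = 0x42; // ASCII 'B'
    assert(haystack == "Best");
}
```

The store to the char[] variable and the load through the ubyte[]* pointer touch the same memory, which is the pair of accesses the backend must not reorder.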
This problem is not observed at -O1 as the function call generates a
new stack frame and we therefore have two distinct memory locations. But
with -O2, inlining removes this copy and we now have variables
referencing the same memory location with type ubyte[] and char[].
I cannot provide a reduced test case, as the smallest changes in
compiler codegen (gcc version, gdc commit) or in the test case can hide
this issue. It's always reproducible with test15.d in the test suite.
(In older gdc versions the test case does not segfault but it produces
wrong output at -O2. This can be a very subtle difference.)
But here's a working code snippet which illustrates the GENERIC
generated by GDC:
-----------------
void main()
{
    char[] in1 = "Test".dup;
    char[] in2 = "Test2".dup;
    char[] result = cast(char[]) find(cast(ubyte[]) in1,
                                      cast(ubyte[]) in2);
}
ubyte[] find(ubyte[] a, ubyte[] b)
{
    return a;
}
-----------------
So the important question here: Is this a bug in GDC codegen or is the
code in std.algorithm invalid? According to
http://dlang.org/expression.html
"The cast is done as a type paint" so this could indeed be interpreted
as a user mistake. But OTOH that page also talks about a runtime check
of the array .lengths which is clearly missing here.
I'm also wondering if that runtime check can actually fix this
aliasing issue or if it can come up again if the runtime check itself
is inlined?