Found one (the?) ARM bug: (Pointer) aliasing issues

Johannes Pfau nospam at example.com
Fri Nov 8 11:16:03 PST 2013


Am Fri, 8 Nov 2013 13:12:23 +0000
schrieb Iain Buclaw <ibuclaw at ubuntu.com>:

> On 7 November 2013 18:05, Johannes Pfau <nospam at example.com> wrote:
> 
> > Am Thu, 7 Nov 2013 16:14:59 +0000
> > schrieb Iain Buclaw <ibuclaw at ubuntu.com>:
> > type here. It would allow code like this to work: "char[] a; char*
> > b = (cast(char*)&a)" but I don't see why this should work, it's
> > illegal anyway?
> >
> >
> That would be seen as two distinct alias sets that would break strict
> aliasing in that example.
> 
> Though, will have to implement -Wstrict-aliasing in the front-end to
> get any feel for what could potentially be utterly wrong.  But the
> idea is that for dynamic arrays, telling gcc to not rely on
> structural equality to determine whether or not two dynamic arrays
> are part of the same alias set.
> 
> eg:
> byte[] a, long[] b = *(cast (long[]*)&a) should be seen as being
> different alias sets, and so *will* be about breaking strict aliasing.
> 
> In contrast, string[] a, char[] b = *cast(string[]*)&a) should be
> seen as being part of the same alias set, and so the compiler must
> know that the two types (which are distinct structures to the
> backend) could potentially be referencing the same slice of memory,
> as to not cause any problems.

Now I get what you mean. AFAIK the backend already strips type
qualifiers when building alias sets, so char[] and string[] should
already have the same alias set. But if you want to explicitly
enforce this in GDC this makes sense.

> 
> For people trying to work around the cast system for dynamic arrays,
> IMO they should be punished for it, and told to do it in the correct
> way that invokes _d_arraycopy, or do their unsafe work through unions.
> 

OK. Although then we have to be very careful when implementing
_d_arraycast (but maybe we can just check if we're in _d_arraycast and
disable aliasing? Is this possible?).

> >
> >
> > // Permit type-punning when accessing a union
> >
> > Isn't that already guaranteed by GCC? See:
> > http://code.metager.de/source/xref/gnu/gcc/gcc/alias.c#982
> > [...]
> >
> >
> There is no harm enforcing it in the front-end as well, [...]
> 
OK


> 
> Not that I'm aware of (other than Ada boasting it's use).  But I'd
> like to push the opinion of - although it isn't in the spec, D should
> be strict aliasing.  And people should be aware of the problems
> breaking strict aliasing (see:
> http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html)

I only found this: 
http://docs.lib.purdue.edu/ecetr/123/

They list performance improvements in the range of 0-6% depending on
architecture. It's testing GCC from 2004 so some things may have
changed, but 0-6% is not bad.

> But first... plan of attack:
> 
> - We first disable strict aliasing entirely (lang_hook.get_alias_set
> => 0)
> - Implement -Wstrict-aliasing
> - Start turning on strict aliasing for types guaranteed not to be
> referencing the same memory location as other types (eg: TypeBasic,
> TypeDelegate, TypeSArray, TypeVector).
> - Identify gdc problems with implicit code generation that could break
> strict aliasing (these are our bugs).
> - Identify frontend/library problems that could break strict aliasing
> (these are the dmd/phobos developer's bugs).
> - Turn on strict aliasing for the remaining types.  For those that
> cause problems, we can define a TYPE_LANG_FLAG macro to allow us to
> tell the backend if the type can alias any other types.

Sounds good. If we can convince Walter to do this step-by-step
(especially -Wstrict-aliasing needs to be in all compilers) this should
work. But if GDC started enforcing strict aliasing and DMD didn't people
probably wouldn't fix their code.

Maybe the ARM port will be finished when we start working on this list
so we have one more architecture to verify. ~5 phobos unit tests
remaining ;-) (test suite already passes)

> 
> Other possible considerations:
> 
> - Most D code pretty much assumes that any object may be accessed via
> a void[] or void*.

Probably ubyte*, ubyte[] as well.

> 
> - Infact, for the consideration of std.math.  It we could go one step
> further and simply build up an alias set list based on the type size
> over type distinction.  In this model
> double/long/byte[8]/short[4]/int[2] would all be considered as types
> that could be referencing each other.
> 

I don't think that really helps much. Think of real with 16bytes. This
may also negatively impact possible optimizations (I'd guess many types
are word-sized. Another place where statistics would be nice)


More information about the D.gnu mailing list