Found one (the?) ARM bug: (Pointer) aliasing issues

Iain Buclaw ibuclaw at ubuntu.com
Fri Nov 8 05:12:23 PST 2013


On 7 November 2013 18:05, Johannes Pfau <nospam at example.com> wrote:

> Am Thu, 7 Nov 2013 16:14:59 +0000
> schrieb Iain Buclaw <ibuclaw at ubuntu.com>:
>
> > On 3 November 2013 10:20, Johannes Pfau <nospam at example.com> wrote:
> >
> > > Am Sun, 3 Nov 2013 02:10:20 +0000
> > > schrieb Iain Buclaw <ibuclaw at ubuntu.com>:
> > >
> > > > last time I
> > > > checked, returning 0 disables aliasing rules from taking effect.
> > >
> > > That should work. Alias set 0 is a special alias set which conflicts
> > > with everything. I'll check if it works as expected.
> > >
> >
> >
> > This is taken from hunks in the 2.064 merge I'm testing:
> >
> > Pastebin link here:  http://pastebin.com/jxQQL68N
> >
>
> Some probably stupid questions about this patch:
>
> // If the type is a dynamic array, use the alias set of the basetype.
>
> What exactly does happen in that case? The Tarray type is the
> two-field type consisting of length and ptr, right? Currently
> TypeDArray->toCtype constructs a two_field_type with size_t and
> typeof(Element)*. So according to the C aliasing rules, the TypeDArray
> alias set does already conflict with size_t and Element*. It does not
> conflict with Element. But I don't know why it should conflict with
> Element if we're talking about the slice type here. It would
> allow code like this to work: "char[] a; char* b = (cast(char*)&a)" but
> I don't see why this should work, it's illegal anyway?
>
>
That would be seen as two distinct alias sets that would break strict
aliasing in that example.

Though, I will have to implement -Wstrict-aliasing in the front-end to get
any feel for what could potentially be utterly wrong.  But the idea is that,
for dynamic arrays, we tell gcc not to rely on structural equality to
determine whether or not two dynamic arrays are part of the same alias set.

eg:
byte[] a, long[] b = *(cast (long[]*)&a) should be seen as being different
alias sets, and so *will* break strict aliasing.

In contrast, string[] a, char[] b = *cast(string[]*)&a should be seen as
being part of the same alias set, and so the compiler must know that the
two types (which are distinct structures to the backend) could potentially
be referencing the same slice of memory, so as not to cause any problems.

For people trying to work around the cast system for dynamic arrays, IMO
they should be punished for it, and told to do it in the correct way that
invokes _d_arraycopy, or do their unsafe work through unions.



> Also, AFAICS it does not help with the problem in std.algorithm:
> char[] a;
> //cast(ubyte[])a generates:
> *cast(ubyte[]*)&a;
>
> Do you think this cast should be illegal in D?
> I think if we want to support strict aliasing for the code above we'll
> have to do what gcc does for pointers:
> http://code.metager.de/source/xref/gnu/gcc/gcc/alias.c#819
> Put all array slices - regardless of element type - into the same alias
> set and make size_t and void* subsets of this alias set.
>
>
> // Permit type-punning when accessing a union
>
> Isn't that already guaranteed by GCC? See:
> http://code.metager.de/source/xref/gnu/gcc/gcc/alias.c#982
> Unions have all their member types added as subsets. So as long as the
> reference is through the union GCC knows the union type and it'll
> conflict with all member types.
>
>
There is no harm in enforcing it in the front-end as well, even if it is
just there to speed up the process of returning what the backend will no
doubt return too.  There's also the (extremely) unlikely event that the
guarantee by GCC might be removed in a later version.


>
> But even if we make those changes to aliasing rules, we'll have to fix
> many places in phobos. For example:
>
> https://github.com/D-Programming-Language/phobos/blob/master/std/math.d#L1965
> real value;
> ushort* vu = cast(ushort*)&value;
> AFAICS this will always be invalid with strict aliasing.
>
>
Yep, as it should be.  std.math is a danger point for type punning between
pointers and reals, ensuring that type-punning/casting does not get DCE'd,
etc...  This needs to be fixed.
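
As a rough illustration of the union route for this kind of code, here is a
minimal sketch that reads the sign bit of a double without casting its
address to ushort*.  DoubleWords, wordCount, topWord and signBitSet are
hypothetical names made up for this example, not anything in std.math, and
double is used instead of real only because its layout does not vary as much
between targets.

```d
// Sketch only: every name here is hypothetical, not part of std.math.
enum size_t wordCount = double.sizeof / ushort.sizeof; // 4 x 16-bit words

union DoubleWords
{
    double value;
    ushort[wordCount] words;
}

// Index of the word that holds the sign bit and the top exponent bits.
version (LittleEndian)
    enum size_t topWord = wordCount - 1;
else
    enum size_t topWord = 0;

bool signBitSet(double x)
{
    DoubleWords u;
    u.value = x;                             // write through one member...
    return (u.words[topWord] & 0x8000) != 0; // ...read through the other
}

void main()
{
    assert(signBitSet(-1.0));
    assert(!signBitSet(1.0));
    assert(signBitSet(-0.0));
}
```

Because every access goes through the union, the backend sees the union
type and should treat the two members as conflicting.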


> https://github.com/D-Programming-Language/phobos/blob/master/std/uuid.d#L468
> casts ubyte[16]* to size_t* also illegal, AFAICS.
>
> Are there any statistics about the performance improvements with strict
> aliasing? I'm not really sold on the idea of strict aliasing, right now
> it looks to me as if it's mainly a way to introduce subtle, hard to
> debug and often latent bugs (As whether you really see a problem
> depends on optimization)
>
>
> http://stackoverflow.com/questions/1225741/performance-impact-of-fno-strict-aliasing
>

Not that I'm aware of (other than Ada boasting its use).  But I'd like to
push the opinion that, although it isn't in the spec, D should be strict
aliasing.  And people should be aware of the problems of breaking strict
aliasing (see:
http://cellperformance.beyond3d.com/articles/2006/06/understanding-strict-aliasing.html)

But first... plan of attack:

- We first disable strict aliasing entirely (lang_hook.get_alias_set => 0)
- Implement -Wstrict-aliasing
- Start turning on strict aliasing for types guaranteed not to be
referencing the same memory location as other types (eg: TypeBasic,
TypeDelegate, TypeSArray, TypeVector).
- Identify gdc problems with implicit code generation that could break
strict aliasing (these are our bugs).
- Identify frontend/library problems that could break strict aliasing
(these are the dmd/phobos developers' bugs).
- Turn on strict aliasing for the remaining types.  For those that cause
problems, we can define a TYPE_LANG_FLAG macro to allow us to tell the
backend if the type can alias any other types.


I still stand by what I say on aliasing rules of D:
- Permit type-punning when accessing through a union
- Dynamic arrays of the same basetype (regardless of qualifiers) may alias
each other/occupy the same slice of memory.
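
To make the second rule concrete, here is a compile-time sketch of how I
read it.  BaseType and slicesMayAlias are hypothetical helpers written for
this mail; they only model the rule and are not what the gdc glue code
does.

```d
import std.traits : Unqual;

// Hypothetical helper: the unqualified element type of a dynamic array.
template BaseType(T)
{
    static if (is(T == E[], E))
        alias BaseType = Unqual!E;
    else
        static assert(0, T.stringof ~ " is not a dynamic array");
}

// Two slices may alias when their element types match once qualifiers
// (const, immutable, shared) are stripped.
enum bool slicesMayAlias(A, B) = is(BaseType!A == BaseType!B);

static assert( slicesMayAlias!(string, char[]));       // immutable(char) vs char
static assert( slicesMayAlias!(const(int)[], int[]));
static assert(!slicesMayAlias!(byte[], long[]));       // different basetypes
static assert(!slicesMayAlias!(char[], ubyte[]));      // same size, different type

void main() {}
```

Note that under this model char[] and ubyte[] still land in different
alias sets, so the cast(ubyte[]) case from std.algorithm is not covered by
this rule alone.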

Other possible considerations:

- Most D code pretty much assumes that any object may be accessed via a
void[] or void*.

- The C standard allows aliasing between signed and unsigned variants.  It
is therefore probably reasonable to do the same for convenience.

- In fact, for the sake of std.math, we could go one step further and
simply build up an alias set list based on type size rather than type
distinction.  In this model double/long/byte[8]/short[4]/int[2] would
all be considered as types that could be referencing each other.
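
A toy model of that last, size-based idea, just to show which types would
end up grouped together.  sizeAliasSet and mayAliasBySize are hypothetical
names, and using the byte size directly as the alias set is an assumption
of this sketch, not a description of any GCC interface.

```d
// Hypothetical: the "alias set" of a type is simply its size in bytes.
enum size_t sizeAliasSet(T) = T.sizeof;

enum bool mayAliasBySize(A, B) = sizeAliasSet!A == sizeAliasSet!B;

// All of these are 8 bytes wide, so they would share one alias set.
static assert(mayAliasBySize!(double, long));
static assert(mayAliasBySize!(double, byte[8]));
static assert(mayAliasBySize!(double, short[4]));
static assert(mayAliasBySize!(double, int[2]));

// Types of a different size stay separate.
static assert(!mayAliasBySize!(double, float));
static assert(!mayAliasBySize!(long, int));

void main() {}
```

The trade-off is that this is much coarser than grouping by type, so the
optimizer would have to assume more conflicts than it does with type-based
alias sets.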

-- 
Iain Buclaw

*(p < e ? p++ : p) = (c & 0x0f) + '0';


More information about the D.gnu mailing list