Possible bug in associative array implementation (and/or @safe checking)

Fri Aug 17 13:14:58 UTC 2018

On 8/16/18 4:45 PM, Aaron D. Trout wrote:
> On Thursday, 16 August 2018 at 18:56:45 UTC, Steven Schveighoffer wrote:
>> On 8/16/18 2:32 PM, Aaron D. Trout wrote:
>>> [...]
>>> On Thursday, 16 August 2018 at 17:20:23 UTC, Steven Schveighoffer wrote:
>>>>
>>>> Yes, this is the effect I would expect.
>>>>
>>>> D has traditionally simply allowed slicing stack data without 
>>>> question (even in @safe code), but that will change when dip1000 is 
>>>> fully realized. It will be allowed, but only when assigning to scope 
>>>> variables.
>>>>
>>>
>>> Thanks for the quick and knowledgeable reply! I think I understand 
>>> what's going on, but I'm surprised it is allowed in @safe code since 
>>> the compiler doesn't allow the following, even in non- at safe code:
>>>
>>> int[] badSlice()
>>> {
>>>      int[2] buffer;
>>>      return buffer[];
>>> }
>>
>> It's because it's on the same line. This is a crude "safe" feature 
>> that is easily duped.
>>
>> This is allowed to compile:
>>
>> int[2] buffer;
>> auto buf = buffer[];
>> return buf;
>>
>> But add -dip1000 to the dmd options and that fails.
>>
>> I would warn you that I think dip1000 is too crude to start trying to 
>> apply it to your project, and may have linker errors with Phobos.
>>
>>> I guess the compiler just isn't (yet!) able to catch that the 
>>> associative array is storing a slice of expired stack. I'm surprised 
>>> that the built-in AA implementation *allows* using slices as keys in 
>>> @safe code without copying the underlying data to the heap first. 
>>> This is clearly dangerous, but perhaps heap-copying slices 
>>> defensively would result in an unacceptable performance hit.
>>
>> I wouldn't put too much stock in having safety in the AA. The AA is a 
>> very very old piece of the compiler, that pre-dates safety checks, and 
>> still is a bit of a kludge in terms of type and memory safety. If you 
>> do find any obvious bugs, it's good to report them.
>>
>>> This issue came up while trying to eliminate unnecessary allocation 
>>> in my code. In my case, I could set a maximum key length at compile 
>>> time and switch my key type to a struct wrapping a static array buffer.
>>>
>>> In hindsight, it was silly for me to think I could eliminate 
>>> separately allocating the keys when the key type was a variable 
>>> length array, since the AA must store the keys. That said, a suitable 
>>> admonition from the compiler here would have been very educational. I 
>>> look forward to seeing the full inclusion of DIP1000!
>>
>> In this case, actually, the AA does NOT store the key data, but just 
>> the reference to the keys. An array slice is a pointer and length, and 
>> the data is stored elsewhere. The static version, however, does store 
>> all the key data inside the AA.
>>
>> That being said, you can potentially avoid more allocation with the 
>> keys with various tricks, such as pre-allocating all the keys and then 
>> using the reference.
>>
>> In other words, eagerly stick the data into an array of arrays:
>>
>>     auto sets = setA.map!(j => setB.filter!(i => i % j == 
>> 0).array).array;
>>
>> and then not worry about duping them. But it all depends on your use 
>> case.
>>
> 
> Thanks again for the quick reply! I have a pretty firm grasp on what a 
> slice is (pointer + offset).

pointer + length, but maybe that's what you meant.

> What I had meant by the comment "the AA must 
> store the keys" was that I had somehow gotten the (of course totally 
> mistaken!) idea that the AA only ever needed to *examine* the key rather 
> than actually storing it.

Right, the hash only gets you to a bucket, you still need the actual 
value to compare for equality.

> If that were the case, a slice of 
> about-to-be-expired stack would be perfectly fair game as a key. Am I 
> correct that doing this *would* be an OK way to avoid unnecessary 
> allocation if we knew the key already existed (as a heap allocated 
> slice) in the AA and we simply wanted to modify the associated value? 

Yes, definitely! There have been a few new functions added to AAs 
recently to help with only allocating *values* when not present, but not 
a way to do the same with keys.

What you *can* do (but this involves 2 lookups) is:

int[2] buf = ...;

if (auto valptr = buf[] in aa)
{
    // use *valptr to get the value
}
else
{
    aa[buf.idup] = 0; // initial value
}

I don't think the storage of the key was considered when adding the new 
functions (`require` and `update`).

> Thanks also for the advice about -dip1000 and the state of the built-in 
> AA implementation. My code base has been changing to include more 
> AA-heavy data structures, so I think that in the near future I will need 
> to do some refactoring to make changing AA implementation easier.

I maybe said it more strongly than needed; AAs are generally safe, it's 
just that I'm not surprised if there are holes. It's a type that the 
compiler generally ignores a lot of rules for, and not everything is 
covered. However, in this case, it was the slicing that was unsafe, the 
AA had nothing to do with it.

> Also, one last question: should this issue be reported as a new bug? My 
> understanding was that @safe code should not allow obtaining references 
> to expired stack memory, but perhaps this is already a known problem? 
> I'm happy to file a new bug report if that would be helpful!

No, it's an old bug:

https://issues.dlang.org/show_bug.cgi?id=8838

Closed as fixed since dip1000 fixes it.

What you could do is try to get your code to work with dip1000 and then 
if you can't, file a bug against *that*. But this may be something that 
isn't going to be easy to do, or may take more time than it's worth.

-Steve