Possible bug in associative array implementation (and/or @safe checking)
Steven Schveighoffer
schveiguy at gmail.com
Fri Aug 17 13:14:58 UTC 2018
On 8/16/18 4:45 PM, Aaron D. Trout wrote:
> On Thursday, 16 August 2018 at 18:56:45 UTC, Steven Schveighoffer wrote:
>> On 8/16/18 2:32 PM, Aaron D. Trout wrote:
>>> [...]
>>> On Thursday, 16 August 2018 at 17:20:23 UTC, Steven Schveighoffer wrote:
>>>>
>>>> Yes, this is the effect I would expect.
>>>>
>>>> D has traditionally simply allowed slicing stack data without
>>>> question (even in @safe code), but that will change when dip1000 is
>>>> fully realized. It will be allowed, but only when assigning to scope
>>>> variables.
>>>>
>>>
>>> Thanks for the quick and knowledgeable reply! I think I understand
>>> what's going on, but I'm surprised it is allowed in @safe code since
>>> the compiler doesn't allow the following, even in non- at safe code:
>>>
>>> int[] badSlice()
>>> {
>>> int[2] buffer;
>>> return buffer[];
>>> }
>>
>> It's because it's on the same line. This is a crude "safe" feature
>> that is easily duped.
>>
>> This is allowed to compile:
>>
>> int[2] buffer;
>> auto buf = buffer[];
>> return buf;
>>
>> But add -dip1000 to the dmd options and that fails.
>>
>> I would warn you that I think dip1000 is too crude to start trying to
>> apply it to your project, and may have linker errors with Phobos.
>>
>>> I guess the compiler just isn't (yet!) able to catch that the
>>> associative array is storing a slice of expired stack. I'm surprised
>>> that the built-in AA implementation *allows* using slices as keys in
>>> @safe code without copying the underlying data to the heap first.
>>> This is clearly dangerous, but perhaps heap-copying slices
>>> defensively would result in an unacceptable performance hit.
>>
>> I wouldn't put too much stock in having safety in the AA. The AA is a
>> very very old piece of the compiler, that pre-dates safety checks, and
>> still is a bit of a kludge in terms of type and memory safety. If you
>> do find any obvious bugs, it's good to report them.
>>
>>> This issue came up while trying to eliminate unnecessary allocation
>>> in my code. In my case, I could set a maximum key length at compile
>>> time and switch my key type to a struct wrapping a static array buffer.
>>>
>>> In hindsight, it was silly for me to think I could eliminate
>>> separately allocating the keys when the key type was a variable
>>> length array, since the AA must store the keys. That said, a suitable
>>> admonition from the compiler here would have been very educational. I
>>> look forward to seeing the full inclusion of DIP1000!
>>
>> In this case, actually, the AA does NOT store the key data, but just
>> the reference to the keys. An array slice is a pointer and length, and
>> the data is stored elsewhere. The static version, however, does store
>> all the key data inside the AA.
>>
>> That being said, you can potentially avoid more allocation with the
>> keys with various tricks, such as pre-allocating all the keys and then
>> using the reference.
>>
>> In other words, eagerly stick the data into an array of arrays:
>>
>> auto sets = setA.map!(j => setB.filter!(i => i % j ==
>> 0).array).array;
>>
>> and then not worry about duping them. But it all depends on your use
>> case.
>>
>
> Thanks again for the quick reply! I have a pretty firm grasp on what a
> slice is (pointer + offset).
pointer + length, but maybe that's what you meant.
> What I had meant by the comment "the AA must
> store the keys" was that I had somehow gotten the (of course totally
> mistaken!) idea that the AA only ever needed to *examine* the key rather
> than actually storing it.
Right, the hash only gets you to a bucket, you still need the actual
value to compare for equality.
> If that were the case, a slice of
> about-to-be-expired stack would be perfectly fair game as a key. Am I
> correct that doing this *would* be an OK way to avoid unnecessary
> allocation if we knew the key already existed (as a heap allocated
> slice) in the AA and we simply wanted to modify the associated value?
Yes, definitely! There have been a few new functions added to AAs
recently to help with only allocating *values* when not present, but not
a way to do the same with keys.
What you *can* do (but this involves 2 lookups) is:
int[2] buf = ...;
if (auto valptr = buf[] in aa)
{
// use *valptr to get the value
}
else
{
aa[buf.idup] = 0; // initial value
}
I don't think the storage of the key was considered when adding the new
functions (`require` and `update`).
> Thanks also for the advice about -dip1000 and the state of the built-in
> AA implementation. My code base has been changing to include more
> AA-heavy data structures, so I think that in the near future I will need
> to do some refactoring to make changing AA implementation easier.
I maybe said it more strongly than needed; AAs are generally safe, it's
just that I'm not surprised if there are holes. It's a type that the
compiler generally ignores a lot of rules for, and not everything is
covered. However, in this case, it was the slicing that was unsafe, the
AA had nothing to do with it.
> Also, one last question: should this issue be reported as a new bug? My
> understanding was that @safe code should not allow obtaining references
> to expired stack memory, but perhaps this is already a known problem?
> I'm happy to file a new bug report if that would be helpful!
No, it's an old bug:
https://issues.dlang.org/show_bug.cgi?id=8838
Closed as fixed since dip1000 fixes it.
What you could do is try to get your code to work with dip1000 and then
if you can't, file a bug against *that*. But this may be something that
isn't going to be easy to do, or may take more time than it's worth.
-Steve
More information about the Digitalmars-d-learn
mailing list