Possible change to array runtime?

Thu Mar 13 12:45:16 PDT 2014

On Thu, 13 Mar 2014 15:06:47 -0400, monarch_dodra <monarchdodra at gmail.com>  
wrote:

> On Thursday, 13 March 2014 at 18:09:55 UTC, Steven Schveighoffer wrote:
>> On Thu, 13 Mar 2014 13:44:01 -0400, monarch_dodra  
>> <monarchdodra at gmail.com> wrote:
>>
>>> On Thursday, 13 March 2014 at 16:17:17 UTC, Steven Schveighoffer wrote:
>>>> On Thu, 13 Mar 2014 11:53:15 -0400, monarch_dodra  
>>>> <monarchdodra at gmail.com> wrote:
>>>>> Please keep in mind that if the objects stored are RAII, then  
>>>>> if/when we will have a finalizing GC, the stomped elements will have  
>>>>> been leaked.
>>>>>
>>>>> Clobbering elements is more than just "I won't use these elements  
>>>>> anymore", it's "I won't use them, and they are safe to be disposed  
>>>>> of right now".
>>>>>
>>>>> I know that's a big "if", but it could happen. If we go the way of  
>>>>> your proposal, we are definitively closing that door.
>>>>
>>>> I'm not understanding this. Can you explain further/give example?
>>>
>>> Well, imagine "File": It's a struct that owns a "file handle", and when  
>>> the struct is destroyed, it releases/closes the file. Basic RAII.
>>>
>>> Now, imagine we want to open several files. We'd use:
>>> File[] files;
>>>
>>> "As of today", this does not end well, as the GC does not finalize the  
>>> array elements, and the file handles are leaked. We could hope, that  
>>> one day, the GC *will* finalize the array elements.
>>>
>>> However, with your proposal, imagine this code:
>>>
>>> File[] files;
>>> files ~= File("foo"); //Opens file "foo"
>>> files.length = 0;
>>> files ~= File("bar"); //Clobbers "foo"
>>>
>>> With this setup, the File handling "foo" gets clobbered, ruining any  
>>> chance of releasing it ever.
>>
>> Line 3 should be files = null. There is no point to setting length to  
>> 0, and mostly this is a relic from D1 code. That was my basis of why it  
>> shouldn't cause problems.
>
> well... "should"/"could". The problem (IMO) is taking perfectly valid  
> code, and making a subtle and silent change to it, changing its behavior  
> and potentially breaking it. It's really the most pernicious kind of  
> change.

Yes, it's perfectly valid. We could go through a deprecation process, but  
I don't think it's worth it.

I have also proposed a mechanism to detect these calls, which could be  
turned on if you think your code might be affected by this (see my  
response to Andrei).

All in all, it's not a very good excuse for breaking anyone's code,  
especially in a way that potentially can corrupt data. It's the whole  
reason we got rid of stomping in the first place.

>
>>> The "only way" to make it work (AFAIK), would be for "length = 0" to  
>>> first finalize the elements in the array. However, if you do that, you  
>>> may accidentally destroy elements that are still "live" and referenced  
>>> by another array.
>>
>> In fact, assumeSafeAppend *should* finalize the elements in the array,  
>> if it had that capability. When you call assumeSafeAppend, you are  
>> telling the runtime that you are done with the extra elements.
>
> Good point. Very very good point. As a matter of fact, this could be  
> implemented right now, couldn't it?

Actually, I think it can. But, no other code in the array runtime  
finalizes array elements. I don't think that it would be good to make this  
inconsistent, even if it's the right thing to do.

>>> I'm not too hot about this proposal. My main gripe is that while  
>>> "length = 0" *may* mean "*I* want to discard this data", there is no  
>>> guarantee you don't have someone else that has a handle on said data,  
>>> and sure as hell doesn't want it clobbered.
>>
>> Again, the =null is a better solution. There is no point to keeping the  
>> same block in reference, but without any access to elements, unless you  
>> want to overwrite it.
>>
>> The problem is that we have this new mechanism that keeps those intact.  
>> If I could have imagined this outcome and was aware of this logic, I  
>> would have kept the length = 0 mechanics from D1 to begin with.
>
> I don't like the fact that this can only be implemented in the compiler.  
> Because we *are* talking about "literal" 0, right? Run-time 0 wouldn't  
> have this behavior?

Yes, a run-time 0 would have this behavior; the change would be made in  
the runtime, not the compiler.

> By that same token, I don't see why "0" would get such special  
> treatment, when I'm certain you'd find just as many instances of "length  
> = 1;", which means to say "keep the first element, and start clobbering  
> from there".

Because it's nonsense to keep a reference to data that you have no access  
to. A reference to a 1-element array still has elements you can access,  
you can envision a reason to keep that without wanting to append to it.

In other words, the only sane reason to shrink it to 0 but keep the  
reference not null is to append into the same block again. You should  
never see a length = 0 call without an assumeSafeAppend afterwards.
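
A minimal sketch of the idiom described above, assuming a current D2 compiler and druntime. The helper names are made up for illustration; only `length`, `~=` and `assumeSafeAppend` come from the language and runtime. It contrasts appending after `length = 0` with and without the `assumeSafeAppend` call:

```d
// Hypothetical helpers, for illustration only.

// Shrinks to length 0 and appends WITHOUT assumeSafeAppend.
// Returns true if the append reused the original memory block.
bool reusesBlockWithoutAssume()
{
    int[] arr = [1, 2, 3, 4];
    auto start = arr.ptr;
    arr.length = 0;   // the reference is kept, but no element is accessible
    arr ~= 10;        // the runtime will not stomp, so it allocates a new block
    return arr.ptr is start;
}

// Same sequence, but with assumeSafeAppend after the shrink.
bool reusesBlockWithAssume()
{
    int[] arr = [1, 2, 3, 4];
    auto start = arr.ptr;
    arr.length = 0;
    arr.assumeSafeAppend(); // declare the old elements dead
    arr ~= 10;              // appends in place, overwriting the old first element
    return arr.ptr is start;
}

unittest
{
    assert(!reusesBlockWithoutAssume());
    assert(reusesBlockWithAssume());
}
```

If saved as a file, this can be checked with `dmd -unittest -main -run file.d`.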

>> The issue I'm examining is that people are reluctant to move off of D1,  
>> because their D1 code behaves well when they do length = 0, and it  
>> behaves badly in D2, even though it compiles and works correctly. They  
>> do not have assumeSafeAppend in D1. I'm unsure why they have not used  
>> it, either out of ignorance of the function or they have decided it's  
>> not worth the effort, I have no idea.
>
> Well... "s/some people/sociomantic/". How hard would it be to add it to  
> D1, to help the migration process? I know it's "closed", but there are  
> always exceptions.

This was actually asked of me, by the Tango team long ago. But they just  
wanted the cache and not the stomping fix.

The stomping fix adds overhead, which makes things slower. But it's mitigated  
by the fact that with thread-local storage embedded into the type, we can  
avoid taking the GC lock. This results in a speedup.

I doubt it would be worthwhile.

> Honestly, I'm 100% fine with breaking "compile-time" changes. Behavioral  
> changes on the other hand...

Yes, it's not a good compromise. It rests solely on the expectation that  
any code which sets length to 0 should be calling assumeSafeAppend  
afterwards, or would expect the same functionality. However, it could  
possibly be the case that someone wrote that expecting the current  
behavior.

I think realistically, as correct as the proposal is, it's not likely to  
be accepted.

>>>> https://github.com/D-Programming-Language/druntime/pull/147
>>>>
>>>> reserve and capacity were made nothrow, not sure why assumeSafeAppend  
>>>> shouldn't also be.
>>>>
>>>
>>> The irony is that reserve can't actually be tagged pure nor nothrow:  
>>> Since it can cause relocation, it can call postblit, which in turn may  
>>> actually be impure or throw.
>>>
>>> assumeSafeAppend, on the other hand, is *certifiably* pure and  
>>> nothrow, since it never actually touches any data.
>>
>> It does touch data, but it should be pure and nothrow.
>
> Right. *BUT*, it is not safe. This means that any code that uses "length  
> = 0;" becomes unsafe.

First, I'll qualify that the proposal is for this to happen only in specific  
circumstances, as outlined before.

But it's not actually un-@safe, unless you are talking about immutable  
data (and this proposal specifically does not do that).

> At least I *think* it's unsafe, there is always a thin line between  
> "get my hands dirty under the hood" and "memory safety".
>
> I'm pretty sure that with creative use of assumeSafeAppend, you could do  
> illegal memory access.

Not anything that you couldn't already do with an array element change.
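
A small sketch of that comparison, assuming mutable `int` data and a current D2 runtime (the function names are hypothetical). Stomping through `assumeSafeAppend` changes an element that another slice can see, which is the same effect a plain element assignment through an alias already has. In both cases the memory stays valid and the other slice keeps its length:

```d
// Hypothetical helpers, for illustration only.

// Overwrites a[1] by appending to a shorter slice of the same block.
bool stompViaAppend()
{
    int[] a = [1, 2, 3];
    int[] b = a[0 .. 1];   // b aliases the first element of a
    b.assumeSafeAppend();  // claim everything past b[0] is unused
    b ~= 42;               // appends in place, overwriting a[1]
    return a[1] == 42 && a.length == 3;
}

// Overwrites a[1] by ordinary assignment through an aliasing slice.
bool stompViaElementWrite()
{
    int[] a = [1, 2, 3];
    int[] b = a[0 .. 2];
    b[1] = 42;
    return a[1] == 42 && a.length == 3;
}

unittest
{
    assert(stompViaAppend());
    assert(stompViaElementWrite());
}
```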

-Steve