Newbie initial comments on D language - scope

Thu Feb 7 17:10:28 PST 2008

Walter Bright wrote:
> Edward Diener wrote:
>>> It will be required as any user could declare an object instance as 
>>> 'scope', and so any separately compiled code must anticipate that.
>> I agree in the sense that every object may need to carry an extra 
>> reference count with it even though it will not be used for the vast 
>> majority of objects, which will be GC. I do not view this as an issue.
> 
> It's a very serious issue, as it essentially negates much of the 
> advantage of general gc. For one example, you'll have to give up 
> interior pointers.

I do not follow what having a reference count for an object has to do 
with giving up interior pointers.

> 
>>> It's just that if any object could be scoped based on a runtime test, 
>>> that then you've got to insert that test at every assignment, copy 
>>> construction, and scope exit. You've got all the overhead of RC.
>> Yes, agreed. There will be overhead to deal with 'scope' objects. 
> 
> It will be needed for *every* gc object, too. And not just the 
> allocation for the reference count, the test has to be executed every time.

The test for a reference count is executed only at the points where a 
'scope' object needs handling that a non-scoped object does not. 
Perhaps this is what you mean by "every time". I count these points as 
assigning or copying a reference and exiting a scope. When 
instantiating an object no test need be made, since the compiler always 
knows whether an object is 'scope' when it is created ( from the 'scope 
sometype someobject' notation, or because sometype is declared as a 
'scope class' ).

> 
>> However you already have some overhead dealing with stack variables, 
>> and so has C++ for its existence at the end of each scope and it sure 
>> does not make C++ slower than most GC systems.
> 
> If reference counting worked that well, there would be no push to add gc 
> to C++0x.

No one ever said that reference counting solves every memory problem 
that GC solves. The most obvious case I know of where GC is needed, over 
and above reference counting, is cross-referenced objects, i.e. objects 
that refer to each other in a cycle.

> 
> 
>> I can not say too strongly that if RAII, via 'scope', is to work in D 
>> or any other GC language, the end-user should be as oblivious as 
>> possible to it working automatically. This means that class designer, 
>> who surely must know whether objects of their class need RAII, tells 
>> the compiler that his type is 'scope' and the end-user proceeds to use 
>> objects of that type just as if he would use normal GC objects.
>>
>> Otherwise you are creating a bifurcated system which does the end-user 
>> no good. Not only must the end user know something in advance about 
>> the inner workings of a class ( that it needs RAII ) when the class 
>> designer already knows it, but he must also use a separate notation to 
>> deal with objects of that class.
> 
> For those cases, all the class designer needs to do is present to the 
> user the struct wrapper for the class, not the class itself.

Sure, but that introduces a different notation for dealing with 
specific classes, which nullifies the whole point of being able to 
specify an RAII type ( via 'scope class' in D ).

> 
> 
>>> Then you have the problem that all generated code that manipulates 
>>> any object must insert all the rc machinery for that object, just in 
>>> case some user somewhere instantiates it as 'scope'.
>>
>> It needs to have inserted for it the mechanism which determines 
>> whether that object is a 'scope' object or not. It probably needs the 
>> extra int for possible reference counting. Other than that I do not 
>> see what other machinery is needed for normal GC objects.
> 
> Consider:
> 
> void foo(C c) { C d = c; }
> 
> foo() has no idea if c is ref counted or gc. Therefore, it has to check 
> every time, at run time. All the machinery has to be there, just in case.

I agree.

> 
>> If we are really still in the age, with vtables and alignment padding 
>> and god knows what else a compiler writer needs per object to 
>> correctly do his work, where another 4 bytes of int is considered 
>> prohibitory, then I give up the whole idea <g>.
> 
> It's not just another 4 bytes.

I meant that memory-wise it is just 4 bytes. Of course it is extra 
programming from the language's point of view.

Let me try to make the case for RAII in D via 'scope' once again, by 
presenting the technical details as I see them, and then you will no 
doubt choose what you think best. If I am really far off please tell me; 
otherwise there is little reason for me to argue my idea further, as you 
will do what you think best, and I appreciate that you have heard me out.

First, the situations when RAII processing occurs:

1) A 'scope' object is instantiated. The internal reference count, 
however you choose to implement it, is set to 1.

2) A 'scope' object's reference is assigned/copied to another object. If 
the 'scope' object is not a null reference, the reference count is 
incremented.

3) A 'scope' object's reference is changed through assignment. If the 
old reference is not a null reference, the old object's reference 
count is decremented and, if it reaches 0, the old object is destructed 
( its destructor is called ) and its memory is released ( the latter may 
happen later through GC for all I know ).

4) A 'scope' object reaches the end of its scope. Processing then 
occurs exactly as it does in 3).
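
To make the four situations concrete, here is a minimal sketch that does 
the bookkeeping by hand in today's D, using a struct handle. It is only 
an illustration of the steps, not a proposal for how the compiler should 
implement them. The names Resource, Scoped, refs and destroyed are 
hypothetical, and the count is kept as an ordinary field rather than in 
any real object header.

```d
import std.stdio : writeln;

// Hypothetical class that needs RAII. 'refs' stands in for the
// internal reference count; 'destroyed' only exists for demonstration.
class Resource
{
    int refs;
    static int destroyed;
    ~this() { ++destroyed; }
}

// Hypothetical handle that performs steps 1) to 4) explicitly.
struct Scoped
{
    Resource obj;

    // 1) instantiation: the count is set to 1
    static Scoped create()
    {
        auto r = new Resource;
        r.refs = 1;
        return Scoped(r);
    }

    // 2) copying a non-null reference increments the count
    this(this)
    {
        if (obj !is null) ++obj.refs;
    }

    // 3) assignment: take the new reference and hand the old one to
    // rhs, whose destructor then decrements the old count
    void opAssign(Scoped rhs)
    {
        auto old = obj;
        obj = rhs.obj;
        rhs.obj = old;
    }

    // 4) end of scope: same processing as the old reference in 3)
    ~this() { release(); }

    private void release()
    {
        if (obj !is null && --obj.refs == 0)
            destroy(obj);          // runs Resource's destructor
        obj = null;
    }
}

void main()
{
    {
        auto a = Scoped.create();  // 1) count is 1
        auto b = a;                // 2) count is 2
        assert(a.obj.refs == 2);

        b = Scoped.create();       // 3) first object's count drops to 1
        assert(a.obj.refs == 1);
        assert(Resource.destroyed == 0);
    }                              // 4) both counts reach 0
    assert(Resource.destroyed == 2);
    writeln("objects destroyed: ", Resource.destroyed);
}
```

Note that the user of Scoped has to go through the handle rather than 
the class itself, which is exactly the separate notation I would like 
the end-user to be spared.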

There are two ways of dealing with the identification of a 'scope' object.

The first way is through its static type, where the compiler always 
knows the static type of an object and can generate the correct code in 
each of the 4 instances above for a 'scope' object, and ignore any 
changes to the way that normal non-scope objects are treated. This is 
the easiest way from the compiler's perspective and no doubt the 
fastest. There is no penalty for normal non-scope GC objects and only 
the 'scope' object undergoes special, slower processing. I still have 
hope that if you see fit to go this way that you will allow the user to 
identify a 'scope' object either by the 'scope' keyword applied to the 
instantiated object itself or by the 'scope' keyword applied to the 
class type of the object. I say that because I can not conceive of a 
compiler that could not figure out that an object was 'scope' because 
its class type was 'scope'.
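
As a small illustration of this first way, the sketch below uses only 
the per-object 'scope' notation. LogFile and its 'live' counter are 
hypothetical names made up for the example. Since the compiler sees the 
'scope' keyword at the declaration, it can emit the end of scope 
processing directly, and code that handles ordinary GC objects is left 
alone.

```d
import std.stdio : writeln;

// Hypothetical class that needs deterministic cleanup.
class LogFile
{
    static int live;           // number of live instances, for demonstration
    this()  { ++live; }
    ~this() { --live; }
}

void work()
{
    scope f = new LogFile;     // statically known to be 'scope'
    assert(LogFile.live == 1);
}                              // destructor runs here, with no run-time test

void main()
{
    work();
    assert(LogFile.live == 0);
    writeln("live after work(): ", LogFile.live);
}
```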

The second way is by examining its dynamic type at run-time and 
generating code to take the appropriate action. This second way is 
harder for the compiler to do and no doubt slower, although how much 
slower could only be established by measuring it in D. With this second 
way, every object must be tested in each of the 4 cases above to 
determine if it is a 'scope' object and to take the 
appropriate action if it is. Obviously 4) above is the potential killer 
as far as this goes because it would mean testing every reference at the 
end of each scope, just in case one or more of them is a 'scope' object 
and needs its end of scope processing. In the other three cases one is 
dealing with a single object in a well-defined, if general, situation so 
the overhead would be much less. This second way is obviously much 
better from the end-user's point of view, which does not mean it is 
practically a better solution by any means.

My only practical argument with those who are certain that this 
second way would be an unnecessary imposition on all users of normal 
GC objects, and who want to regale me with code "proving" their case 
a priori, is this: once an object is determined to be normal GC, 
nothing further needs to be done for it that would not have been done 
otherwise. Of course there is overhead for making that determination 
in the cases above, especially with 4).

For this second way I have presented the extra reference count field, 
attached internally to all objects, as a way of determining if the 
object is 'scope' when doing 2), 3), or 4), with the proviso that when 
doing 1) the value for all normal GC objects of this field would be set 
to 0. If this is an entirely impractical solution, I am sure that if you 
decide to pursue the second way, just to see whether it can be done and 
what the practical penalty is, you will find a better scheme.
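
To show what I have in mind, here is a rough sketch of the run-time test 
written out by hand. Everything in it is an assumption for illustration: 
the Obj class, the refs field, the retain and release helpers, and the 
'tests' counter are hypothetical, and real generated code would surely 
look different. A count of 0 marks a normal GC object, anything greater 
marks a 'scope' object. The copy made when passing the argument to foo 
is left out for brevity.

```d
import std.stdio : writeln;

// Hypothetical root class: 'refs' models the extra int carried by
// every object. 0 means normal GC, greater than 0 means 'scope'.
class Obj
{
    int refs;
    static int destroyed;      // for demonstration only
    ~this() { ++destroyed; }
}

int tests;                     // how many times the run-time test ran

// 2) copying a reference: test, then count only 'scope' objects
Obj retain(Obj o)
{
    ++tests;
    if (o !is null && o.refs != 0) ++o.refs;
    return o;
}

// 3) and 4) dropping a reference: test, then count down and destruct
void release(Obj o)
{
    ++tests;
    if (o is null || o.refs == 0) return;   // normal GC: nothing more to do
    if (--o.refs == 0) destroy(o);
}

// Roughly what 'void foo(C c) { C d = c; }' would have to become
void foo(Obj c)
{
    Obj d = retain(c);         // C d = c;
    release(d);                // d reaches the end of its scope
}

void main()
{
    auto gc = new Obj;         // 1) normal GC object: field stays 0
    auto sc = new Obj;         // 1) 'scope' object: field set to 1
    sc.refs = 1;

    foo(gc);
    foo(sc);
    assert(gc.refs == 0 && sc.refs == 1);
    assert(tests == 4);        // the test ran for the GC object too

    release(sc);               // 4) last reference leaves its scope
    assert(Obj.destroyed == 1);
    writeln("run-time tests executed: ", tests);
}
```

The GC object pays for the two tests inside foo and nothing else, which 
is the overhead that would have to be measured.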