Newbie initial comments on D language - scope

Mon Feb 4 19:43:21 PST 2008

Michel Fortin wrote:
> On 2008-02-03 10:42:03 -0500, Edward Diener 
> <eddielee_no_spam_here at tropicsoft.com> said:
> 
>> Michel Fortin wrote:
>>> On 2008-02-03 08:20:32 -0500, Edward Diener 
>>> <eddielee_no_spam_here at tropicsoft.com> said:
>>>
>>>> I am fully cognizant of a dynamically typed language since I program 
>>>> in Python also. I agree there is no fixed dividing line. But the 
>>>> difference between static typing and dynamic typing is well defined 
>>>> in a statically typed language like D. My argument was that for 
>>>> 'scope' to be really effective it needs to consider the dynamic type 
>>>> at run-time and not just the static type as it exists at compile time.
>>>
>>> Considering the dynamic type at runtime means you need to check if 
>>> you're dealing with a reference-counted object each time you copy a 
>>> reference to that object to see if the reference count needs 
>>> adjusting. This is significant overhead over the "just copy the 
>>> pointer" thing you can do in a GC. Basically, just checking this will 
>>> increase by two or three times the time it takes to copy an object 
>>> reference... I can see why Walter doesn't want that.
>>
>> I am not knowledgeable about the actual low-level difference between 
>> the compiler statically checking the type of an object or dynamically 
>> checking the type of an object, and the run-time costs involved.
>>
>> Yet clearly D already has to implement code when scopes come to an end 
>> in order to destroy stack-based objects, since structs ( user-defined 
>> value types ) are already supported and can have destructors.
> 
> Yes, and this is implemented in a simple and naive way: by adding an 
> explicit call to the destructor at the end of the scope. The scope 
> object cannot exist outside the scope, and thus no reference counting is 
> needed in the way it's implemented currently.

The reference counting would be implemented for a 'scope' object 
only. The main overhead at the end of each scope is going through all 
the objects to determine which is a 'scope' object. Perhaps this is too 
expensive, but it would at least be interesting to see if it is or not.

> 
>> So the added overhead goes from having to identify structs which must 
>> have their destructor called at the end of each scope to having to 
>> also identify 'scope' objects which must have their reference count 
>> decremented at the end of each scope and have their destructor called 
>> if the reference count reaches 0.
> 
> Well, identifying structs can be done at compile time since you know 
> exactly the type of the struct at that time. Classes are polymorphic, so 
> it'd be a costly runtime check to know that, and that check is almost as 
> costly as doing the reference counting itself. Given that, you should 
> probably not bother at runtime and decide at compile time to just treat 
> any class which has the potential to be a scope class as if it were one 
> and actually do the reference counting.

Your point is well taken, but I still would like to see if the check for 
a 'scope' object would be that expensive. It could be as easy as 
checking an extra 'int' for reference counting for each object and 
seeing whether it is 0 ( normal GC object ) or not 0 ( 'scope' object ).
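
To make the idea concrete, here is a minimal sketch of such a check. It is only an illustration under stated assumptions: `Tracked`, `copyReference` and `releaseAtScopeExit` are hypothetical names, not part of D or its runtime, and the `destroyed` flag merely stands in for a real destructor call. It shows the test that would have to run on every reference copy and at every scope exit.

```d
// Hypothetical object layout: refCount == 0 marks an ordinary GC object,
// refCount > 0 marks a reference-counted 'scope' object.
class Tracked
{
    int refCount;    // 0: plain GC object; > 0: 'scope' object
    bool destroyed;  // stand-in for "the destructor has run"

    this(bool isScope)
    {
        refCount = isScope ? 1 : 0;
    }
}

// What would have to be emitted for every copy of a reference.
Tracked copyReference(Tracked obj)
{
    if (obj !is null && obj.refCount != 0)
        ++obj.refCount;          // only 'scope' objects are counted
    return obj;
}

// What would have to be emitted when a reference leaves its scope.
void releaseAtScopeExit(Tracked obj)
{
    if (obj is null || obj.refCount == 0)
        return;                  // ordinary GC object: leave it to the collector
    if (--obj.refCount == 0)
        obj.destroyed = true;    // last reference gone: "destruct" now
}

void main()
{
    auto plain = new Tracked(false);
    auto scoped = new Tracked(true);
    auto second = copyReference(scoped);   // count goes from 1 to 2

    releaseAtScopeExit(plain);             // no effect on a GC object
    releaseAtScopeExit(second);            // count back to 1
    assert(!scoped.destroyed);
    releaseAtScopeExit(scoped);            // count reaches 0
    assert(scoped.destroyed);
    assert(!plain.destroyed);
}
```

Note that the branch on `refCount` sits on the path of every copy, which is the cost being debated here: whether it is cheap enough is exactly what would need measuring.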

>>
>>
>>>
>>> Besides, the overhead of actually checking the type of the class will 
>>> be approximately the same as doing the reference counting. Given 
>>> this, it's much better to always just do the reference counting than 
>>> checking dynamically if it's needed.
>>>
>>>
>>>> class C { ... }
>>>> scope class D : C { ... }
>>>>
>>>> [...]
>>>>
>>>> This may make things much easier for the compiler, but it requires 
>>>> the end user knowledge of 'scope', which has been specified at the 
>>>> class level, to be applied at the syntax level. Intuitively I feel 
>>>> the compiler can figure this out, and that 'scope' should largely be 
>>>> totally transparent to the end user above at the syntax level.
>>>
>>> Well, if the compiler is to be able to distinguish scope at compile 
>>> time, then it needs a scope flag (either explicit or implicit) on 
>>> each variable. This is exactly what Walter has proposed to do. He 
>>> prefers the explicit route because going implicit isn't going to work 
>>> in too many cases. For instance, let's have a function that returns a C:
>>>
>>>     C makeOne() {
>>>         if (/* random stuff here */)
>>>             return new C;
>>>         else
>>>             return new D;
>>>     }
>>>
>>> Now let's call the function:
>>>
>>>     C c = makeOne();
>>>
>>> How can you know at compile time if the returned object of that 
>>> function call is scoped or not? You can't, and therefore the compiler 
>>> would need to add code to check if the returned object is scope or 
>>> not, with a significant overhead, each time you assign a C.
>>>
>>> If however you make scope known at compile time:
>>>
>>>     scope C makeOne() {
>>>         if (/* random stuff here */)
>>>             return new C;
>>>         else
>>>             return new D;
>>>     }
>>>
>>>     scope C c = makeOne();
>>>
>>> Now the compiler knows it must generate reference counting code for 
>>> the following assignment, and any subsequent assignment of this type, 
>>> and it won't have to generate code to dynamically check the 
>>> "scopeness" everywhere you use a C.
>>
>> Would you agree that all you are doing here is specifically telling 
>> the compiler that an object is 'scope' when it is created rather than 
>> having the compiler figure it out for itself by querying the dynamic 
>> type of the object at creation time ?
> 
> The compiler isn't knowledgeable of what happens within every function 
> call. So it can only check at runtime if the function returned a C or a D.

Fully agreed.
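
As a rough sketch of what that runtime check amounts to: the caller sees only the static type C, so the dynamic type has to be queried with a downcast. The `wantDerived` parameter and the `isDynamicallyD` helper are hypothetical, introduced only for illustration, and D is declared here as an ordinary class standing in for the 'scope class D : C' of the example.

```d
class C { }
class D : C { }   // stands in for 'scope class D : C'

// The caller only ever sees the static type C.
C makeOne(bool wantDerived)
{
    if (wantDerived)
        return new D;
    return new C;
}

// Run-time query: a class downcast yields null unless the
// dynamic type of obj is D (or something derived from D).
bool isDynamicallyD(C obj)
{
    return (cast(D) obj) !is null;
}

void main()
{
    C first = makeOne(false);
    C second = makeOne(true);
    assert(!isDynamicallyD(first));
    assert(isDynamicallyD(second));
}
```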

> 
>> If you do, then a much simpler, and to the point, example would be 
>> based on my original post:
>>
>> scope class C { ... }
>>
>> scope C c = new C(...);
>>
>> I specified that the scope keyword for creating the object is 
>> redundant. The compiler can figure it out. The major difference in 
>> opinion is that I think the compiler should figure it out from the 
>> dynamic type of the object at run-time and not from the static type of 
>> the object.
> 
> You're perfectly right: it is redundant in *this* case, and you could 
> have the compiler implicitly understand that C is a scope class in 
> *this* case. But consider this example:
> 
>     Object o;
>     if (/* random value */)
>         o = new C; // C is a scope class
>     else
>         o = new Object; // Object is the base class of C but isn't scope
> 
> Now, should o be automatically reference-counted because you *could* 
> later create a C object and assign it to o, or should line 3 give an 
> error since the type Object isn't scope and C must only be assigned as 
> scope? I'd say it should be an error.

I say it should be a 'scope' object. The dynamic type of o is that of a 
'scope' class.

> 
> This however could be made legal without too much difficulty:
> 
>     scope Object o;
>     if (/* random value */)
>         o = new C; // C is a scope class
>     else
>         o = new Object; // Object is the base class of C but isn't scope
> 
> Basically, you're declaring a scope Object. While Object isn't 
> necessarily a scope class, you are telling the compiler to treat it as 
> scope, and thus an instance of C, which must be scope, *can* be put in 
> this variable. If o wasn't scope, it'd be an error to put an instance of 
> a scope class in it.

But then the end-user is required to know that the C is a scope class. I 
do not think that should be necessary.

The whole point of 'scope' ( RAII ) in a GC language is that, for the most part, an 
end-user should instantiate and use 'scope' classes just as he would 
normal GC classes, with the language taking care to automatically 
destruct an object of a 'scope' class just as soon as the last reference 
to that object goes out of scope.
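
The behavior described above can be approximated today with a hand-rolled counted handle. This is a sketch only, not how 'scope' is or would be implemented: `Resource`, `Counted` and `liveCount` are hypothetical names, and `release()` stands in for a deterministic destructor. The point is that the user copies the handle like any other value, and cleanup happens when the last copy leaves scope.

```d
// Hypothetical resource; liveCount tracks how many are not yet released.
final class Resource
{
    static int liveCount;
    this() { ++liveCount; }
    void release() { --liveCount; }   // stand-in for a destructor
}

// Hand-rolled reference-counted handle (illustrative only).
struct Counted
{
    private Resource payload;
    private int* count;

    this(Resource r)
    {
        payload = r;
        count = new int;
        *count = 1;
    }

    this(this)                        // postblit: runs on every copy
    {
        if (count !is null)
            ++(*count);
    }

    ~this()                           // runs when a copy leaves scope
    {
        if (count !is null && --(*count) == 0)
            payload.release();        // last reference gone
    }
}

void main()
{
    {
        auto a = Counted(new Resource);
        {
            auto b = a;               // second reference
            assert(Resource.liveCount == 1);
        }                             // b gone, a still holds the object
        assert(Resource.liveCount == 1);
    }                                 // a gone: released immediately
    assert(Resource.liveCount == 0);
}
```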

> 
> But there are still many holes in this scheme in which scope now means 
> reference-counted. Take this example:
> 
>     class A {
>         void doSomething() {
>             globalReferences ~= this;
>         }
>     }
>     scope class B : A { }
> 
>     A[] globalReferences;
> 
>     // Scope could be made implicit here, but it's irrelevant to my example
>     scope B b = new B;
>     b.doSomething();
> 
> This last statement would call A.doSomething which would put a 
> non-scoped reference to globalReferences, which would fail to retain the 
> object. There are two ways around that: ignore the problem and let the 
> programmer handle these cases (basically, that is what boost::shared_ptr 
> would do in such a situation), or introduce a new keyword to decorate 
> parameters for functions that do not keep any reference beyond their 
> own call so that you don't need to duplicate all your functions for a 
> scope and non-scope parameter (much like const is the middle ground 
> between mutable and invariant).

No, A.doSomething would put a 'scoped' reference in a non-scope array. 
However if we specify 'scope A[] globalReferences;' we can solve that 
problem.

Of course we may not control the declaration of 'A[] globalReferences;'. 
I acknowledge that.
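
For reference, the escape problem under discussion can be reproduced in a few lines. This is a sketch under assumptions: B is declared as an ordinary class deriving from A, the `released` flag and the `useScoped` function are hypothetical, and a `scope (exit)` statement stands in for the deterministic destruction a 'scope' object would get.

```d
class A
{
    bool released;   // stand-in for "the destructor has run"

    void doSomething()
    {
        globalReferences ~= this;   // a plain, uncounted reference escapes
    }
}

class B : A { }      // stands in for 'scope class B : A'

A[] globalReferences;   // declared without any 'scope' knowledge

void useScoped()
{
    auto b = new B;
    scope (exit) b.released = true;   // simulated destruction at scope end
    b.doSomething();
}

void main()
{
    useScoped();
    // The array still refers to the object after it was "destructed".
    assert(globalReferences.length == 1);
    assert(globalReferences[0].released);
}
```

Nothing in `A.doSomething` tells the compiler that `this` may be a 'scope' object, so the stored reference is never counted unless the array itself is declared as holding 'scope' references.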