Newbie initial comments on D language - scope

Mon Feb 4 21:46:13 PST 2008

On 2008-02-04 22:43:21 -0500, Edward Diener 
<eddielee_no_spam_here at tropicsoft.com> said:

> Michel Fortin wrote:
>> On 2008-02-03 10:42:03 -0500, Edward Diener 
>> <eddielee_no_spam_here at tropicsoft.com> said:
>> 
>>> Michel Fortin wrote:
>>>> On 2008-02-03 08:20:32 -0500, Edward Diener 
>>>> <eddielee_no_spam_here at tropicsoft.com> said:
>>>> 
>>>>> I am fully cognizant of a dynamically typed language since I program in 
>>>>> Python also. I agree there is no fixed dividing line. But the 
>>>>> difference between static typing and dynamic typing is well defined in 
>>>>> a statically typed language like D. My argument was that for 'scope' to 
>>>>> be really effective it needs to consider the dynamic type at run-time 
>>>>> and not just the static type as it exists at compile time.
>>>> 
>>>> Considering the dynamic type at runtime means you need to check if 
>>>> you're dealing with a reference-counted object each time you copy a 
>>>> reference to that object to see if the reference count needs 
>>>> adjusting. This is significant overhead over the "just copy the 
>>>> pointer" thing you can do in a GC. Basically, just checking this will 
>>>> increase by two or three times the time it takes to copy an object 
>>>> reference... I can see why Walter doesn't want that.
>>> 
>>> I am not knowledgeable about the actual low-level difference between the 
>>> compiler statically checking the type of an object or dynamically 
>>> checking the type of an object, and the run-time costs involved.
>>> 
>>> Yet clearly D already has to implement code when scopes come to an end 
>>> in order to destroy stack-based objects, since structs ( user-defined 
>>> value types ) are already supported and can have destructors.
>> 
>> Yes, and this is implemented in a simple and naive way: by adding an 
>> explicit call to the destructor at the end of the scope. The scope 
>> object cannot exist outside the scope, and thus no reference counting 
>> is needed in the way it's implemented currently.
> 
> The reference counting would be implemented for 'scope' objects only. 
> The main overhead at the end of each scope is going through all 
> the objects to determine which is a 'scope' object. Perhaps this is too 
> expensive, but it would at least be interesting to see if it is or not.
> 
>> 
>>> So the added overhead goes from having to identify structs which must 
>>> have their destructor called at the end of each scope to having to also 
>>> identify 'scope' objects which must have their reference count 
>>> decremented at the end of each scope and have their destructor called 
>>> if the reference count reaches 0.
>> 
>> Well, identifying structs can be done at compile time since you know 
>> exactly the type of the struct at that time. Classes are polymorphic, 
>> so it'd be a costly runtime check to know that, and that check is 
>> almost as costly as doing the reference counting itself. Given that, 
>> you should probably not bother at runtime and decide at compile time to 
>> just treat any class which has the potential to be a scope class as if 
>> it were one and actually do the reference counting.
> 
> Your point is well taken, but I still would like to see if the check 
> for a 'scope' object would be that expensive. It could be as easy as 
> checking an extra 'int' for reference counting for each object and 
> seeing whether it is 0 ( normal GC object ) or not 0 ( 'scope' object ).

Basically, you need to:

1. Load the object's pointer in a register
2. Load the "scope" flag from memory by offsetting the object's pointer
3. Branch depending on that flag:
   a. if not scope, go to 4.
   b. if scope, do whatever is needed to increment the reference count
      atomically, then go to 4.
4. Write the pointer to its new location.

That's a lot of extra work you'd have to do at every copy of an 
object's pointer to perform that check. That branch operation could 
become very expensive if the processor can't predict it right, and 
loading from an additional, possibly far away, memory block could mean 
missing the memory cache more often too.

Steps 1 and 4 are all you need if you don't care about scope.
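
As a rough sketch of the steps above, here is what the two kinds of reference copy could look like when written out in D. Everything in it is an assumption made for illustration: the Obj class, its isScope flag and refCount field, and the copyPlain and copyChecked helpers are hypothetical names, not anything the compiler or runtime actually provides. The matching decrement (when a reference is overwritten or goes out of scope) is left out.

```d
import core.atomic : atomicLoad, atomicOp;

// Hypothetical object layout: each object carries a "scope" flag
// and a reference count that only matters when the flag is set.
class Obj
{
    bool isScope;
    shared int refCount;
}

// Steps 1 and 4 only: what a GC-only scheme does on every copy.
void copyPlain(ref Obj dest, Obj src)
{
    dest = src;
}

// Steps 1 to 4: the extra load and branch a dynamic check would add.
void copyChecked(ref Obj dest, Obj src)
{
    // Steps 2 and 3: load the flag through the pointer, then branch.
    // (The null test is one more branch that the list above ignores.)
    if (src !is null && src.isScope)
        atomicOp!"+="(src.refCount, 1); // step 3b: atomic increment
    dest = src;                         // step 4: write the pointer
}

void main()
{
    auto plain = new Obj;
    auto scoped = new Obj;
    scoped.isScope = true;

    Obj a, b;
    copyPlain(a, plain);     // no flag load, no branch
    copyChecked(a, plain);   // flag load and branch, count untouched
    copyChecked(b, scoped);  // flag load, branch, atomic increment

    assert(b is scoped);
    assert(atomicLoad(plain.refCount) == 0);
    assert(atomicLoad(scoped.refCount) == 1);
}
```

Note that copyChecked pays for the flag load and the branch even for the plain object, which is the cost being discussed.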

>> The compiler isn't knowledgeable of what happens within every 
>> function call. So it can only check at runtime if the function returned 
>> a C or a D.
> 
> Fully agreed.
> 
>> 
>>> If you do, then a much simpler, and to the point, example would be 
>>> based on my initial OP:
>>> 
>>> scope class C { ... }
>>> 
>>> scope C c = new C(...);
>>> 
>>> I specified that the scope keyword for creating the object is 
>>> redundant. The compiler can figure it out. The major difference in 
>>> opinion is that I think the compiler should figure it out from the 
>>> dynamic type of the object at run-time and not from the static type of 
>>> the object.
>> 
>> You're perfectly right: it is redundant in *this* case, and you could 
>> have the compiler implicitly understand that C is a scope class in 
>> *this* case. But consider this example:
>> 
>>     Object o;
>>     if (/* random value */)
>>         o = new C; // C is a scope class
>>     else
>>         o = new Object; // Object is the base class of C but isn't scope
>> 
>> Now, should o be automatically reference-counted because you *could* 
>> later create a C object and assign it to o, or should line 3 give an 
>> error since the type Object isn't scope and C must only be assigned as 
>> scope? I'd say it should be an error.
> 
> I say it should be a 'scope' object. The dynamic type of o is that of a 
> 'scope' class.

Hmm, dynamic scope typing again? If you had that, it'd work, sure, but 
since we surely won't have that, this isn't an option.

>> This however could be made legal without too much difficulty:
>> 
>>     scope Object o;
>>     if (/* random value */)
>>         o = new C; // C is a scope class
>>     else
>>         o = new Object; // Object is the base class of C but isn't scope
>> 
>> Basically, you're declaring a scope Object. While Object isn't 
>> necessarily a scope class, you are telling the compiler to treat it as 
>> scope, and thus an instance of C, which must be scope, *can* be put in 
>> this variable. If o wasn't scope, it'd be an error to put an instance 
>> of a scope class in it.
> 
> But then the end-user is required to know that the C is a scope class. 
> I do not think that should be necessary.

Perhaps not, I don't have a strong opinion on that. But I firmly believe 
scope should be enforced statically, not dynamically, and that's what 
I'm arguing for.

> The whole point of 'scope' ( RAII ) in GC is that, for the most part, 
> an end-user should instantiate and use 'scope' classes just as he would 
> normal GC classes, with the language taking care to automatically 
> destruct an object of a 'scope' class just as soon as the last 
> reference to that object goes out of scope.

Well, perhaps there's a solution that would do what you want while 
still keeping it compile-time only. It's some sort of compromise. Take 
these three classes:

	class A {}
	scope class B : A {}
	scope class C : B {}

B and C are scope, A isn't. Now, what if writing "B" was equivalent to 
writing "scope B" (since B is scope) and "C" was equivalent to writing 
"scope C"? Obviously, writing "A" wouldn't be equivalent to "scope A" 
(because A is not scope). Then you could have:

	A a1 = new A;
	A a2 = new B; // illegal: B is scope, cannot be assigned to non-scope A
	scope A a3 = new B; // legal: B is scope and scope A is (explicitly) scope

	B b1 = new B;
	B b2 = new C; // legal: C is scope and B is (implicitly) scope
	scope B b3 = new C; // same as above

That would mean that you'd only have to explicitly write scope if you're 
using the non-scope base class as a type to hold a reference to your 
scope object.
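
To make the rule concrete, here is a minimal sketch that encodes it as compile-time checks. It does not use a real "scope class" declaration: the ScopeClass marker and the isScopeClass, isScopeVar and canHold templates are hypothetical names invented for this illustration, and the explicitScope flag stands in for writing "scope" in front of the variable's type.

```d
import std.traits : hasUDA;

// Hypothetical marker standing in for a "scope class" declaration.
struct ScopeClass {}

class A {}
@ScopeClass class B : A {}
@ScopeClass class C : B {}

// True when T itself was declared with the marker. Attributes are not
// inherited, so each scope class carries the marker explicitly.
enum isScopeClass(T) = hasUDA!(T, ScopeClass);

// A variable typed Dest is scope if Dest is a scope class (implicit)
// or if the declaration says "scope" (explicit).
enum isScopeVar(Dest, bool explicitScope) =
    explicitScope || isScopeClass!Dest;

// The rule: a scope object may only be stored in a scope variable.
enum canHold(Dest, Src, bool explicitScope = false) =
    is(Src : Dest)
    && (!isScopeClass!Src || isScopeVar!(Dest, explicitScope));

static assert( canHold!(A, A));        // A a1 = new A;
static assert(!canHold!(A, B));        // A a2 = new B;        illegal
static assert( canHold!(A, B, true));  // scope A a3 = new B;  legal
static assert( canHold!(B, B));        // B b1 = new B;
static assert( canHold!(B, C));        // B b2 = new C;        legal
static assert( canHold!(B, C, true));  // scope B b3 = new C;  same

void main() {}
```

Everything is decided from the static types, so nothing has to be checked when a reference is copied at runtime.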

>> But there are still many holes in this scheme in which scope now means 
>> reference-counted. Take this example:
>> 
>>     class A {
>>         void doSomething() {
>>             globalReferences ~= this;
>>         }
>>     }
>>     scope class B : A { }
>> 
>>     A[] globalReferences;
>> 
>>     // Scope could be made implicit here, but it's irrelevant to my example
>>     scope B b = new B;
>>     b.doSomething();
>> 
>> This last statement would call A.doSomething which would put a 
>> non-scoped reference to globalReferences, which would fail to retain 
>> the object. There are two ways around that: ignore the problem and let 
>> the programmer handle these cases (basically, that is what 
>> boost::shared_ptr would do in such a situation), or introduce a new 
>> keyword to decorate parameters for functions that do not keep any 
>> reference beyond their own call so that you don't need to duplicate 
>> all your functions for a scope and non-scope parameter (much like const 
>> is the middle ground between mutable and invariant).
> 
> No, A.doSomething would put a 'scoped' reference in a non-scope array. 
> However if we specify 'scope A[] globalReferences;' we can solve that 
> problem.

Sure, you're solving the problem nicely. But how does the compiler 
find out there's a problem in the first place? It needs to know that 
the this parameter is scope, and thus the member function should be 
decorated scope (just like you'd do with invariant). So you'd need to 
duplicate every member function so that it can be used either as scope 
or non-scope, and that's not very interesting unless you can declare 
that the function does not need to know if the parameter is typed scope 
or not (just like const means you don't know if it's invariant or 
mutable).
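
For comparison, here is a small sketch of the const analogy as it works in D today, where invariant is spelled immutable. The total function is a made-up example; the point is only that a single function taking a const parameter serves both mutable and immutable callers, which is what a "don't care whether it's scope" parameter qualifier would have to do for scope and non-scope references.

```d
// One function for both cases: const promises not to modify the data,
// without knowing whether the caller's copy is mutable or immutable.
int total(const(int)[] values)
{
    int sum = 0;
    foreach (v; values)
        sum += v;
    return sum;
}

void main()
{
    int[] mutableData = [1, 2, 3];
    immutable(int)[] fixedData = [4, 5, 6];

    // Both conversions to const(int)[] are implicit.
    assert(total(mutableData) == 6);
    assert(total(fixedData) == 15);
}
```

Without const, total would have to be written twice, once per qualifier, which is the duplication problem described above.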

-- 
Michel Fortin
michel.fortin at michelf.com
http://michelf.com/