Newbie initial comments on D language - scope

Tue Feb 5 20:45:42 PST 2008

Michel Fortin wrote:
> On 2008-02-04 22:43:21 -0500, Edward Diener 
> <eddielee_no_spam_here at tropicsoft.com> said:
> 
>> Michel Fortin wrote:
>>> On 2008-02-03 10:42:03 -0500, Edward Diener 
>>> <eddielee_no_spam_here at tropicsoft.com> said:
>>>
>>>> Michel Fortin wrote:
>>>>> On 2008-02-03 08:20:32 -0500, Edward Diener 
>>>>> <eddielee_no_spam_here at tropicsoft.com> said:
>>>>>
>>>>>> I am fully cognizant of a dynamically typed language since I 
>>>>>> program in Python also. I agree there is no fixed dividing line. 
>>>>>> But the difference between static typing and dynamic typing is 
>>>>>> well defined in a statically typed language like D. My argument 
>>>>>> was that for 'scope' to be really effective it needs to consider 
>>>>>> the dynamic type at run-time and not just the static type as it 
>>>>>> exists at compile time.
>>>>>
>>>>> Considering the dynamic type at runtime means you need to check if 
>>>>> you're dealing with a reference-counted object each time you copy a 
>>>>> reference to that object to see if the reference count needs 
>>>>> adjusting. This is significant overhead over the "just copy the 
>>>>> pointer" thing you can do in a GC. Basically, just checking this 
>>>>> will increase by two or three times the time it takes to copy an 
>>>>> object reference... I can see why Walter doesn't want that.
>>>>
>>>> I am not knowledgeable about the actual low-level difference between 
>>>> the compiler statically checking the type of an object or 
>>>> dynamically checking the type of an object, and the run-time costs 
>>>> involved.
>>>>
>>>> Yet clearly D already has to implement code when scopes come to an 
>>>> end in order to destroy stack-based objects, since structs ( 
>>>> user-defined value types ) are already supported and can have 
>>>> destructors.
>>>
>>> Yes, and this is implemented in a simple and naive way: by adding an 
>>> explicit call to the destructor at the end of the scope. The scope 
>>> object cannot exist outside the scope, and thus no reference counting 
>>> is needed in the way it's implemented currently.
>>
>> The reference counting would be implemented only for a 'scope' object. 
>> The main overhead at the end of each scope is going through all 
>> the objects to determine which is a 'scope' object. Perhaps this is 
>> too expensive, but it would at least be interesting to see if it is or 
>> not.
>>
>>>
>>>> So the added overhead goes from having to identify structs which 
>>>> must have their destructor called at the end of each scope to having 
>>>> to also identify 'scope' objects which must have their reference 
>>>> count decremented at the end of each scope and have their destructor 
>>>> called if the reference count reaches 0.
>>>
>>> Well, identifying structs can be done at compile time since you know 
>>> exactly the type of the struct at that time. Classes are polymorphic, 
>>> so it'd be a costly runtime check to know that, and that check is 
>>> almost as costly as doing the reference counting itself. Given that, 
>>> you should probably not bother at runtime and decide at compile time 
>>> to just treat any class which has the potential to be a scope class 
>>> as if it were one and actually do the reference counting.
>>
>> Your point is well taken, but I still would like to see if the check 
>> for a 'scope' object would be that expensive. It could be as easy as 
>> checking an extra 'int' for reference counting for each object and 
>> seeing whether it is 0 ( normal GC object ) or not 0 ( 'scope' object ).
> 
> Basically, you need to:
> 
> 1. Load the object's pointer in a register
> 2. Load the "scope" flag from memory by offsetting the object's pointer
> 3. Branch depending on that flag:
>   a. if not scope, go to 4.
>   b. if scope, do whatever is needed to increment the reference count 
> atomically, then go to 4
> 4. Write the pointer to its new location.
> 
> That's a lot of extra work you'd have to do at every copy of an object's 
> pointer to perform that check. That branch operation could become very 
> expensive if the processor can't predict it right, and loading from an 
> additional, possibly far away, memory block could mean missing the 
> memory cache more often too.
> 
> Steps 1 and 4 are all you need if you don't care about scope.

I love it when people such as you carry on about all the work that must 
be done to implement X. Implementing any new feature in any language 
takes work. There are NO free rides. But that never means that the new 
feature should not be done. Who cares if some program is slowed down by 
some number of microseconds each time, if the feature enables a better 
and much easier programming paradigm that could otherwise only be 
handled in a clumsy and inefficient manner?

> 
> 
>>> The compiler isn't knowledgeable of what happens within every 
>>> function call. So it can only check at runtime if the function 
>>> returned a C or a D.
>>
>> Fully agreed.
>>
>>>
>>>> If you do, then a much simpler, and to the point, example would be 
>>>> based on my initial OP:
>>>>
>>>> scope class C { ... }
>>>>
>>>> scope C c = new C(...);
>>>>
>>>> I specified that the scope keyword for creating the object is 
>>>> redundant. The compiler can figure it out. The major difference in 
>>>> opinion is that I think the compiler should figure it out from the 
>>>> dynamic type of the object at run-time and not from the static type 
>>>> of the object.
>>>
>>> You're perfectly right: it is redundant in *this* case, and you could 
>>> have the compiler implicitly understand that C is a scope class in 
>>> *this* case. But consider this example:
>>>
>>>     Object o;
>>>     if (/* random value */)
>>>         o = new C; // C is a scope class
>>>     else
>>>         o = new Object; // Object is the base class of C but isn't scope
>>>
>>> Now, should o be automatically reference-counted because you *could* 
>>> later create a C object and assign it to o, or should line 3 give an 
>>> error since the type Object isn't scope and C must only be assigned 
>>> as scope? I'd say it should be an error.
>>
>> I say it should be a 'scope' object. The dynamic type of o is that of 
>> a 'scope' class.
> 
> Hum, dynamic scope typing again? If you had that it'd work, sure, but 
> since we surely won't have that this isn't an option.

A brilliant conclusion. You decide that "we surely won't have that" so 
it will not work. Another candidate for a course in predicate logic 101 
shows up.

> 
> 
>>> This however could be made legal without too much difficulty:
>>>
>>>     scope Object o;
>>>     if (/* random value */)
>>>         o = new C; // C is a scope class
>>>     else
>>>         o = new Object; // Object is the base class of C but isn't scope
>>>
>>> Basically, you're declaring a scope Object. While Object isn't 
>>> necessarily a scope class, you are telling the compiler to treat it as 
>>> scope, and thus an instance of C, which must be scope, *can* be put 
>>> in this variable. If o wasn't scope, it'd be an error to put an 
>>> instance of a scope class in it.
>>
>> But then the end-user is required to know that the C is a scope class. 
>> I do not think that should be necessary.
> 
> Perhaps not, I don't have a strong opinion on that. But I firmly believe 
> scope should be enforced statically, not dynamically, and that's what 
> I'm arguing for.

I understand your argument based on the simplicity of the solution, and 
the relative speed of the code compared to the alternative of 
determining 'scope' at run-time. I respect your argument but I think 
that it is an incomplete solution from the end-user's perspective 
because he must be aware of the 'scope'-ness of the objects he uses and 
notate the objects accordingly. I think this is an imposition although I 
could live with it. But I would like to see the dynamic solution at 
least attempted.
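
To make the trade-off concrete, here is a minimal sketch of what a run-time 'scope' check might look like. It is only an illustration under stated assumptions, not how D implements anything: the Resource class, its isScope flag, the Handle struct and the releasedAfterScope function are all hypothetical names, the count is not atomic (so single-threaded only), and setting a "released" flag stands in for running the destructor. The retain() helper is the per-copy flag check and branch described in the quoted steps.

```d
// Hypothetical sketch: none of these names exist in D or its libraries.
class Resource
{
    bool isScope;   // assumed per-object flag: true for a 'scope' object
    int refCount;   // only meaningful when isScope is true
    bool released;  // stands in for "the destructor has run"

    this(bool isScope) { this.isScope = isScope; }
}

// A handle that plays the role of an object reference.
struct Handle
{
    Resource target;

    this(Resource r)
    {
        target = r;
        retain();
    }

    // Postblit: runs on every copy of the handle.
    this(this) { retain(); }

    // Runs when the handle goes out of scope.
    ~this()
    {
        if (target !is null && target.isScope && --target.refCount == 0)
            target.released = true;
    }

    // The run-time check: load the flag, branch, maybe count.
    private void retain()
    {
        if (target !is null && target.isScope)
            ++target.refCount;  // not atomic; single-threaded only
    }
}

// Creates an object, copies a handle to it inside a nested scope,
// and reports whether the object was released when that scope ended.
bool releasedAfterScope(bool isScope)
{
    auto r = new Resource(isScope);
    {
        auto first = Handle(r);   // count 0 -> 1 if isScope
        auto second = first;      // copy: flag check, count 1 -> 2
    }                             // both handles die: count 2 -> 0
    return r.released;
}
```

Called from a main function, releasedAfterScope(true) should report that the object was released at the end of the inner scope, while releasedAfterScope(false) should leave the object to the GC. Every copy pays for the flag load and the branch either way, which is the cost being argued about.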

> 
>> The whole point of 'scope' ( RAII ) in GC is that, for the most part, 
>> an end-user should instantiate and use 'scope' classes just as he 
>> would normal GC classes, with the language taking care to 
>> automatically destruct an object of a 'scope' class just as soon as 
>> the last reference to that object goes out of scope.
> 
> Well, perhaps there's a solution that would do what you want while still 
> keeping it compile-time only. It's some sort of compromise. Take these 
> three classes:
> 
>     class A {}
>     scope class B : A {}
>     scope class C : B {}
> 
> B and C are scope, A isn't. Now, what if writing "B" was equivalent to 
> writing "scope B" (since B is scope) and "C" was equivalent to writing 
> "scope C". Obviously, writing "A" wouldn't be equivalent to "scope A" 
> (because A is not scope). Then you could have:
> 
>     A a1 = new A;
>     A a2 = new B; // illegal: B is scope, cannot be assigned to non-scope A
>     scope A a3 = new B; // legal: B is scope and scope A is (explicitly) scope
> 
>     B b1 = new B;
>     B b2 = new C; // legal: C is scope and B is (implicitly) scope
>     scope B b3 = new C; // same as above
> 
> That would mean that you'd only have to explicitly write scope if you're 
> using the non-scope base class as a type to hold a reference to your 
> scope object.

Yes, I understand your example completely.

> 
> 
>>> But there are still many holes in this scheme in which scope now 
>>> means reference-counted. Take this example:
>>>
>>>     class A {
>>>         void doSomething() {
>>>             globalReferences ~= this;
>>>         }
>>>     }
>>>     scope class B : A { }
>>>
>>>     A[] globalReferences;
>>>
>>>     scope B b = new B; // Scope could be made implicit here, but it's irrelevant to my example
>>>     b.doSomething();
>>>
>>> This last statement would call A.doSomething which would put a 
>>> non-scoped reference to globalReferences, which would fail to retain 
>>> the object. There are two ways around that: ignore the problem and 
>>> let the programmer handle these cases (basically, that is what 
>>> boost::shared_ptr would do in such a situation), or introduce a new 
>>> keyword to decorate parameters for functions that do not keep any 
>>> reference beyond their own call so that you don't need to duplicate 
>>> all your functions for a scope and non-scope parameter (much like 
>>> const is the middle ground between mutable and invariant).
>>
>> No, A.doSomething would put a 'scoped' reference in a non-scope array. 
>> However if we specify 'scope A[] globalReferences;' we can solve that 
>> problem.
> 
> Sure, you're solving the problem nicely. But how does the compiler find 
> out there's a problem in the first place? It needs to know that the this 
> parameter is scope, and thus the member function should be decorated 
> scope (just like you'd do with invariant). So you'd need to duplicate 
> every member function so that it can be used either as scope or 
> non-scope, and that's not very interesting unless you can declare that 
> the function does not need to know if the parameter is typed scope or 
> not (just like const means you don't know if it's invariant or mutable).

I am lost about what you are saying above. Member functions have nothing 
to do with 'scope'.