New linked list

Fri May 12 13:35:36 PDT 2006

Walter Bright wrote:
> Sean Kelly wrote:
>> *sigh*  I can see the reason for this, but it dramatically reduces the 
>> utility of foreach for me, as it's a very common idiom to want to 
>> remove elements during an iteration.  In fact, I already use this 
>> technique quite a bit in Thread and ThreadGroup.  That aside, does 
>> this restriction also exist for processing AA elements?
> 
> Yes - everything used as an aggregate in a foreach.
> 
>> I ask because this is done on a dynamically allocated list of 
>> references rather than against the AA itself, though I suppose that's 
>> implementation defined.
> 
> The idea is to enable the compiler to do aggressive loop optimizations, 
> regardless of the aggregate type used, and regardless of whether it is 
> user-defined or built-in.

To play the Devil's advocate... doesn't this imply that the body of an 
opApply function is also not allowed to make any modifications to the 
underlying data structure?  As a contrived example, suppose I have a 
SqlCursor object that performs lazy reads and caches this data once read 
so the read must only occur once, and that this data is stored in an 
SList which is used for the actual iteration.  From a code optimization 
standpoint this seems entirely reasonable.  However, it does seem to 
violate the general rule you've outlined for what is legal in foreach loops.

You've said that the point of foreach is to allow the compiler to treat 
the aggregate as a loop invariant, but my understanding of this term 
implies that only optimizations on the referent of v may be optimized. 
For example:

     int[] v;
     ...
     foreach( int i; v )
     {
         ...
     }

may be transformed into:

     t1 = v.ptr;
     t2 = v.len;
     t3 = 0;
L1: if( t2 <= t3 ) goto L2;
     ...
     t3 += 1;
     goto L1;
L2: ...

This would explain Derek's experience where sequence modifications were 
not observed within the loop.  However, without a viable 'const' 
signifier, I don't think any optimization regarding the sequence itself 
may be performed:

     t1 = v.ptr;
     t2 = v.len;
     t3 = 0;
     if( t2 <= 0  ) goto L2;
     t4   = alloca( t2 )
     t4[] = t1[0 .. t2]
L1: if( t2 <= t3 ) goto L2;
     ...
     t3 += 1;
     goto L1;
L2: ...

Is this correct?  The first case would explain why rehashing an AA 
within a foreach loop would likely result in undefined behavior, as 
would an append-type operation on dynamic arrays.  But more structurally 
cohesive data structures (ie. those for which modifications don't result 
in iterator invalidation in C++: map, list, etc) should be fairly 
compatible with foreach even in the face of such changes regardless of 
whether such detailed rules would be advisable within a language spec.

Regarding the 'const' label.  The only way I can see such an idea 
working in a way that supports compiler optimizations is to make it a 
dynamic storage attribute of sorts.  That is, the const-ness must be 
applied to the data, not to the reference.  I simply can't think of any 
option where the traditional approach would work:

     char[]       ptr = "";
     const char[] ref = ptr;
     ...
     foreach( char c; ref )
     {
         ...
     }

If ptr is ever passed by reference to an opaque function then there is 
no way to determine whether the data referenced by ref may change within 
the life of the program.  In fact, the same could be said if ref were 
ever passed by reference to an opaque function implemented in any 
language other than D, as one must assume that the const attribute could 
be ignored.  Frankly, this has me wondering whether it's possible to 
enforce any sort of const behavior without data movement in the face of 
aliasing.  For example, in instances where const data is being passed to 
an opaque code segment it should be possible to instead make a copy in a 
write-protected memory page and pass a reference to that instead, 
similar to how passing structs by value to functions works now.  This 
would result in a run-time error if a modification occurred to such 
data, similar to what happens for modifications to const data in D now. 
  But as far as I know it isn't possible to mark arbitrary memory ranges 
as read-only.

In the long-term, if transactional memory ever becomes a reality then it 
may be possible to exploit this as a lightweight way of enforcing 
const-ness: cache the data somewhere and if the cache is ever 
invalidated then signal an error.  But this may not be an option for 
const data that has a long lifetime.

Sean