Repost: make foreach(i, a; range) "just work"

Thu Feb 20 03:15:13 PST 2014

I am posting this again because I didn't get any feedback on my idea,  
which may be TL;DR or because people think it's a dumb idea and they were  
politely ignoring it :p

My original thought was that things like this should "just work"..

auto range = input.byLine();
while(!range.empty)
{
   range.popFront();
   foreach (i, line; range.take(4))  //Error: cannot infer argument types
   {
     ..etc..
   }
   range.popFront();
}

The reason it fails was best expressed by Steven:

> This is only available using opApply style iteration. Using range  
> iteration does not give you this ability.
> It's not a permanent limitation per se, but there is no plan at the  
> moment to add multiple parameters torange iteration.
>
> One thing that IS a limitation though: we cannot overload on return  
> values. So the obvious idea ofoverloading front to return tuples of  
> various types, would not be feasible. opApply can do that becausethe  
> delegate is a parameter.

And Jakob pointed me to this proposed solution:
[1] https://github.com/D-Programming-Language/phobos/pull/1866

Which is a great idea, but, I still feel that this should "just work" as I  
have written it.  I think this is what people will intuitively expect to  
work, and having it fail and them scrabble around looking for enumerate is  
sub-optimal.  I think we can solve it without negatively impacting future  
plans like what bearophile wants, which is built-in tuples (allowing  
foreach over AA's etc).

So, the solution I propose for my original problem above is:

Currently the 'i' value in a foreach on an array is understood to be an  
index into the array.  But, ranges are not always indexable.  So, for us  
to make this work for all ranges we would have to agree to change the  
meaning of 'i' from being an "index" to being a "counter, which may also  
be an index".  This counter would be an index if the source object was  
indexable.  Another way to look at it is to realise that the counter is  
always an index into the result set itself, and could be used as such if  
you were to store the result set in an indexable object.

To implement this, foreach simply needs to keep a counter and increment it  
after each call to the foreach body - the same way (I assume) it does for  
arrays and objects with opApply.

Interestingly, if this had been in place earlier, then the byKey() and  
byValue() members of AA's would not have been necessary.  Instead  
keys/values could simply have changed into indexable ranges, and no code  
breakage would have occurred (AFAICS).

So, to address bearophile's desire for built-in tuples, and iteration over  
AA's and how this change might affect those plans.  It seems to me we  
could do foreach over AAs/tuples in one of 2 ways or even a combination of  
both:

Scheme 1) for AA's/tuples the value given to the foreach body is a  
voldemort (unnamed) type with a public property member for each component  
of the AA/tuple.  In the case of AA's this would then be "key" and  
"value", for tuples it might be a, b, .., z, aa, bb, .. and so on.

foreach(x; AA) {}        // Use x.key and x.value
foreach(i, x; AA) {}     // Use i, x.key and x.value
foreach(int i, x; AA) {} // Use i, x.key and x.value

Extra/better: For non-AA tuples we could allow the members to be named  
using some sort of syntax, i.e.

foreach(i, (x.bob, x.fred); AA) {} // Use i, x.bob and x.fred
or
foreach(i, x { int bob; string fred }; AA) {} // Use i, x.bob and x.fred
or
foreach(i, new x { int bob; string fred }; AA) {} // Use i, x.bob and  
x.fred

Lets look at bearophile's examples re-written for scheme #1

foreach (v; AA) {}
foreach (x; AA) { .. use x.value .. } // better? worse?

foreach (k, v; AA) {}
foreach (x; AA) { .. use x.key, x.value .. } // better? worse?

foreach (k; AA.byKeys) {}
same // no voldemort reqd

foreach (i, k; AA.byKeys.enumerate) {}
foreach (i, k; AA.byKeys) {}   // better. note, no voldemort reqd

foreach (i, v; AA.byValues.enumerate) {}
foreach (i, v; AA.byValues) {} // better. note, no voldemort reqd

foreach (k, v; AA.byPairs) {}
foreach (x; AA) { .. use x.key, x.value .. } // better

foreach (i, k, v; AA.byPairs.enumerate) {}
foreach (i, x; AA) { .. use i and x.key, x.value .. } // better

This is my preferred approach TBH, you might call it foreach on "packed"  
tuples.

Scheme 2) the tuple is unpacked into separate variables given in the  
foreach.

When no types are given, components are assigned to variables such that  
the rightmost is the last AA/tuple component and subsequent untyped  
variables get previous components up and until the N+1 th which gets  
index/count.

foreach (v; AA) {}        // v is "value" (last tuple component)
foreach (k, v; AA) {}     // k is "key"   (2nd to last tuple component),  
...
foreach (i, k, v; AA) {}  // i is "index/count" because AA only has 2  
tuple components.

So, if you have N tuple components and you supply N+1 variables you get  
the index/count.  Supplying any more would be an error.

However, if a type is given and the type can be unambiguously matched to a  
single tuple component then do so.

double[string] AA;
foreach (string k; AA) {} // k is "key"

.. in which case, any additional unmatched untyped or 'int' variable is  
assigned the index/count. e.g.

foreach (i, double v; AA) {} // i is index, v is "value"
foreach (i, string k; AA) {} // i is index, k is "key"

If more than one typed variable is given, match each unambiguously.

foreach (string k, double v; AA) {} // i is index, k is "key", v is "value"

.. and likewise any unmatched untyped or 'int' variable is assigned  
index/count. e.g.

foreach (i, string k, double v; AA) {}     // i is index, k is "key", v is  
"value"
foreach (int i, string k, double v; AA) {} // i is index, k is "key", v is  
"value"

Any ambiguous situation would result in an error requiring the use of one  
of .keys/values (in the case of an AA), or to specify types (where  
possible), or to specify them all in the tuple order, e.g.

Using a worst case of..
int[int] AA;

// Error: cannot infer binding of k; could be 'key' or 'value'
foreach (int k; AA) {}

// Solve using .keys/byKey()/values/byValue()
foreach (k; AA.byKeys) {}            // k is "key"
foreach (i, k; AA.byKeys) {}         // i is index/count, k is "key"

// Solve using tuple order
foreach (k, v; AA) {}    // k is "key", v is "value"
foreach (i, k, v; AA) {} // i is index/count, k is "key", v is "value"

So, to bearophile's examples re-written for scheme #2

foreach (v; AA) {}
same

foreach (k, v; AA) {}
same

foreach (k; AA.byKeys) {}
same

foreach (i, k; AA.byKeys.enumerate) {}
foreach (i, k; AA.byKeys) {} // better

foreach (i, v; AA.byValues.enumerate) {}
foreach (i, v; AA.byValues) {} // better

foreach (k, v; AA.byPairs) {}
foreach (k, v; AA) {} // better

foreach (i, k, v; AA.byPairs.enumerate) {}
foreach (i, k, v; AA) {} // better

This scheme is more complicated than #1 so it's not my preferred  
solution.  But, it does name the components better than #1.

Scheme 3) Combination.  If we were to combine these ideas we would need to  
prefer one scheme by default, if we select scheme #1, for example, then  
any time a foreach is specified we default to assuming #1 where possible,  
and #2 otherwise.

In which case there are clear cases where scheme #2 is required/used:
  - when more than 2 variables are given, or
  - a specific type is given for the final variable.

Note that #1 only has 3 possible forms (for AA or tuples):

double[string] AA;
foreach (v; AA) {}        // #1 v is voldemort(key, value)/tuple
foreach (i, v; AA) {}     // #1 i is index/count, v is voldemort(key,  
value)/tuple
foreach (int i, v; AA) {} // #1 i is index/count, v is voldemort(key,  
value)/tuple

#2 would take effect in these forms..

foreach (i, double v; AA) {} // #2 (type given)  i is index/count, v is  
value
foreach (i, string k; AA) {} // #2 (type given)  i is index/count, k is key
foreach (i, k, v; AA) {}     // #2 (3 variables) i is index/count, k is  
key, v is value

Bearophile's examples re-written for scheme #3

With..
(*) any                 // voldemort scheme #1 works with any AA/tuple  
types even worst case "all one type"
(A) int[int] AA;        // worst case
(B) double[string] AA;

foreach (v; AA) {}
(*) foreach (x; AA)       { .. use x.value .. } // better?
(A) foreach (i, k, v; AA) { }  // worse?
(B) foreach (double v; AA) { } // worse?

foreach (k, v; AA) {}
(*) foreach (x; AA)       { .. use x.key, x.value .. } // better?
(A) foreach (k, double v; AA) { }  // force scheme #2, worse?
(B) foreach (double v; AA) { }     // force scheme #2, worse?

foreach (k; AA.byKeys) {}
same // note, no voldemort reqd

foreach (i, k; AA.byKeys.enumerate) {}
(*) foreach (i, k; AA.byKeys) {}   // better, note; no voldemort reqd

foreach (i, v; AA.byValues.enumerate) {}
(*) foreach (i, v; AA.byValues) {} // netter, note; no voldemort reqd

foreach (k, v; AA.byPairs) {}
(*) foreach (x; AA) { .. use x.key, x.value .. } // better

foreach (i, k, v; AA.byPairs.enumerate) {}
(*) foreach (i, x; AA) { .. use i and x.key, x.value .. } // better
(A) foreach (i, k, v; AA) { } // better
(B) foreach (i, k, v; AA) { } // better

This is a trade off between #1 and #2 but on balance I feel it is worse  
than #1 so is not my preferred solution.

***********

Thoughts?

Regan

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/