Ranges, constantly frustrating

Regan Heath regan at netmail.co.nz
Mon Feb 17 09:47:21 PST 2014


This turned into a bit of a full spec so I would understand if you TL;DR  
but it would be nice to get some feedback if you have the time..

On Fri, 14 Feb 2014 17:34:46 -0000, bearophile <bearophileHUGS at lycos.com>  
wrote:
> Regan Heath:
>
>> In my case I didn't need any of these.
>
> I don't understand.

What I meant here is that I don't need the "advantages" provided by  
enumerate like the starting index.

One thing I am unclear about from your response is what you mean by  
implicit in this context?  Do you mean the process of inferring things  
(like the types in foreach)?

(taken from subsequent reply)
> Isn't this discussion about adding an index to a range?

No, it's not.  The counter I want would only be an index if the range was  
indexable, otherwise it's a count of foreach iterations (starting from  
0).  This counter is (if you like) an "index into the result set" which is  
not necessarily also an index into the source range (which may not be  
indexable).

What we currently have with foreach is an index and only for indexable  
things.  I want to instead generalise this to be a counter which is an  
index when the thing being enumerated is indexable, otherwise it is a  
count or "index into the result set".

Lets call this change scheme #0.  It solves my issue, and interestingly  
also would have meant we didn't need to add byKey or byValue to AA's,  
instead we could have simply made keys/values indexable ranges and not  
broken any existing code.

Further details of scheme #0 below.

(taken from subsequent reply)
> If you want all those schemes built in a language (and to use them  
> without adding .enumerate) you risk making
> a mess. In this case "explicit is better than implicit".

Have a read of what I have below and let me know if you think it's a  
"mess".  Scheme #2 has more rules, and might be called a "mess" perhaps.   
But, scheme #1 is fairly clean and simple and I think better overall.  The  
one downside is that without some additional syntax it cannot put tuple  
components nicely in context with descriptive variable names, so there is  
that.

To be fair to all 3 schemes below, they mostly "just work" for simple  
cases and/or cases where different types are used for key/values in AA's  
and tuples.  The more complicated rules only kick in to deal with the  
cases where there is ambiguity (AA's with the same type for key and value  
and tuples with multiple components of the same type).

Anyway, on to the details..

***********

Scheme 0) So, what I want is for foreach to simply increment a counter  
after each call to the body of the foreach, giving me a counter from 0 to  
N (or infinity/wrap).  It would do this when prompted to do so by a  
variable being supplied in the foreach statement in the usual way (for  
arrays/opApply)

This counter would not be defined/understood to be an "index" into the  
object being enumerated necessarily (as it currently is), instead if the  
object is indexable then it would indeed be an index, otherwise it's a  
count (index into the result set).

I had not been considering associative arrays until now, given current  
support (without built in tuples) they do not seem to be a special case to  
me.  Foreach over byKey() should look/function identically to foreach over  
keys, likewise for byValue().  The only difference is that in the  
byKey()/byValue() case the counter is not necessarily an index into  
anything, though it would be if the underlying byKey() range was indexable.

The syntax for this, is the same as we have for arrays/classes with  
opApply today.  In other words, "it just works" and my example would  
compile and run as one might expect.

This seems to me to be intuitive, useful and easy to implement.  Further,  
I believe it leaves the door open to having built in tuples (or using  
library extensions like enumerate()), with similarly clean syntax and no  
"mess".

***********

So, what if we had built in tuples?  Well, seems to me we could do foreach  
over AAs/tuples in one of 2 ways or even a combination of both:

Scheme 1) for AA's/tuples the value given to the foreach body is a  
voldemort (unnamed) type with a public property member for each component  
of the AA/tuple.  In the case of AA's this would then be "key" and  
"value", for tuples it might be a, b, .., z, aa, bb, .. and so on.

foreach(x; AA) {}        // Use x.key and x.value
foreach(i, x; AA) {}     // Use i, x.key and x.value
foreach(int i, x; AA) {} // Use i, x.key and x.value

Extra/better: For non-AA tuples we could allow the members to be named  
using some sort of syntax, i.e.

foreach(i, (x.bob, x.fred); AA) {} // Use i, x.bob and x.fred
or
foreach(i, x { int bob; string fred }; AA) {} // Use i, x.bob and x.fred
or
foreach(i, new x { int bob; string fred }; AA) {} // Use i, x.bob and  
x.fred


Lets look at your examples re-written for scheme #1

> foreach (v; AA) {}
foreach (x; AA) { .. use x.value .. } // better? worse?

> foreach (k, v; AA) {}
foreach (x; AA) { .. use x.key, x.value .. } // better? worse?

> foreach (k; AA.byKeys) {}
same // no voldemort reqd

> foreach (i, k; AA.byKeys.enumerate) {}
foreach (i, k; AA.byKeys) {}   // better. note, no voldemort reqd

> foreach (i, v; AA.byValues.enumerate) {}
foreach (i, v; AA.byValues) {} // better. note, no voldemort reqd

> foreach (k, v; AA.byPairs) {}
foreach (x; AA) { .. use x.key, x.value .. } // better

> foreach (i, k, v; AA.byPairs.enumerate) {}
foreach (i, x; AA) { .. use i and x.key, x.value .. } // better

This is my preferred approach TBH, you might call it foreach on "packed"  
tuples.


Scheme 2) the tuple is unpacked into separate variables given in the  
foreach.

When no types are given, components are assigned to variables such that  
the rightmost is the last AA/tuple component and subsequent untyped  
variables get previous components up and until the N+1 th which gets  
index/count.

foreach (v; AA) {}        // v is "value" (last tuple component)
foreach (k, v; AA) {}     // k is "key"   (2nd to last tuple component),  
...
foreach (i, k, v; AA) {}  // i is "index/count" because AA only has 2  
tuple components.

So, if you have N tuple components and you supply N+1 variables you get  
the index/count.  Supplying any more would be an error.

However, if a type is given and the type can be unambiguously matched to a  
single tuple component then do so.

double[string] AA;
foreach (string k; AA) {} // k is "key"

.. in which case, any additional unmatched untyped or 'int' variable is  
assigned the index/count. e.g.

foreach (i, double v; AA) {} // i is index, v is "value"
foreach (i, string k; AA) {} // i is index, k is "key"

If more than one typed variable is given, match each unambiguously.

foreach (string k, double v; AA) {} // i is index, k is "key", v is "value"

.. and likewise any unmatched untyped or 'int' variable is assigned  
index/count. e.g.

foreach (i, string k, double v; AA) {}     // i is index, k is "key", v is  
"value"
foreach (int i, string k, double v; AA) {} // i is index, k is "key", v is  
"value"

Any ambiguous situation would result in an error requiring the use of one  
of .keys/values (in the case of an AA), or to specify types (where  
possible), or to specify them all in the tuple order, e.g.

Using a worst case of..
int[int] AA;

// Error: cannot infer binding of k; could be 'key' or 'value'
foreach (int k; AA) {}

// Solve using .keys/byKey()/values/byValue()
foreach (k; AA.byKeys) {}            // k is "key"
foreach (i, k; AA.byKeys) {}         // i is index/count, k is "key"

// Solve using tuple order
foreach (k, v; AA) {}    // k is "key", v is "value"
foreach (i, k, v; AA) {} // i is index/count, k is "key", v is "value"

So, to your examples re-written for scheme #2

> foreach (v; AA) {}
same

> foreach (k, v; AA) {}
same

> foreach (k; AA.byKeys) {}
same

> foreach (i, k; AA.byKeys.enumerate) {}
foreach (i, k; AA.byKeys) {} // better

> foreach (i, v; AA.byValues.enumerate) {}
foreach (i, v; AA.byValues) {} // better

> foreach (k, v; AA.byPairs) {}
foreach (k, v; AA) {} // better

> foreach (i, k, v; AA.byPairs.enumerate) {}
foreach (i, k, v; AA) {} // better

This scheme is more complicated than #1 so it's not my preferred  
solution.  But, it does name the components better than #1.


Scheme 3) Combination.  If we were to combine these ideas we would need to  
prefer one scheme by default, if we select scheme #1, for example, then  
any time a foreach is specified we default to assuming #1 where possible,  
and #2 otherwise.

In which case there are clear cases where scheme #2 is required/used:
  - when more than 2 variables are given, or
  - a specific type is given for the final variable.

Note that #1 only has 3 possible forms (for AA or tuples):

double[string] AA;
foreach (v; AA) {}        // #1 v is voldemort(key, value)/tuple
foreach (i, v; AA) {}     // #1 i is index/count, v is voldemort(key,  
value)/tuple
foreach (int i, v; AA) {} // #1 i is index/count, v is voldemort(key,  
value)/tuple

#2 would take effect in these forms..

foreach (i, double v; AA) {} // #2 (type given)  i is index/count, v is  
value
foreach (i, string k; AA) {} // #2 (type given)  i is index/count, k is key
foreach (i, k, v; AA) {}     // #2 (3 variables) i is index/count, k is  
key, v is value

Your examples re-written for scheme #3

With..
(*) any                 // voldemort scheme #1 works with any AA/tuple  
types even worst case "all one type"
(A) int[int] AA;        // worst case
(B) double[string] AA;

> foreach (v; AA) {}
(*) foreach (x; AA)       { .. use x.value .. } // better?
(A) foreach (i, k, v; AA) { }  // worse?
(B) foreach (double v; AA) { } // worse?

> foreach (k, v; AA) {}
(*) foreach (x; AA)       { .. use x.key, x.value .. } // better?
(A) foreach (k, double v; AA) { }  // force scheme #2, worse?
(B) foreach (double v; AA) { }     // force scheme #2, worse?

> foreach (k; AA.byKeys) {}
same // note, no voldemort reqd

> foreach (i, k; AA.byKeys.enumerate) {}
(*) foreach (i, k; AA.byKeys) {}   // better, note; no voldemort reqd

> foreach (i, v; AA.byValues.enumerate) {}
(*) foreach (i, v; AA.byValues) {} // netter, note; no voldemort reqd

> foreach (k, v; AA.byPairs) {}
(*) foreach (x; AA) { .. use x.key, x.value .. } // better

> foreach (i, k, v; AA.byPairs.enumerate) {}
(*) foreach (i, x; AA) { .. use i and x.key, x.value .. } // better
(A) foreach (i, k, v; AA) { } // better
(B) foreach (i, k, v; AA) { } // better

This is a trade off between #1 and #2 but on balance I feel it is worse  
than #1 so is not my preferred solution.

***********

Thoughts?

Regan

-- 
Using Opera's revolutionary email client: http://www.opera.com/mail/


More information about the Digitalmars-d-learn mailing list