News and problems about foreach loops

bearophile bearophileHUGS at lycos.com
Sat Nov 3 07:04:44 PDT 2012


(This post is not crystal clear because unfortunately now I don't 
have a lot of time to make it better, but I think it's 
acceptable.)

In DMD 2.061 there will be some interesting changes, one of them 
regards the foreach loops. Foreach loops are everywhere in D 
code, it's one of the most used D constructs, so this is an 
important topic; and avoiding problems and corner cases, and 
making it not bug-prone is quite important.

Hara, Walter and others have started to implement a change I have 
suggested time ago (from Python loops):


import std.stdio;
void main() {
     foreach (i; 0 .. 10)
         i++; // line 4
     auto array = [10, 20, 30, 40, 50];
     foreach (i, item; array) {
         writeln(item);
         i++; // line 8
     }
}



With warnings this now gives:

test.d(4): Warning: variable modified in foreach body requires 
ref storage class
test.d(8): Warning: variable modified in foreach body requires 
ref storage class
10
30
50



Anoter change is visible here, in past this code required:

struct Foo { int x; }
void main() {
     auto data = [Foo(10), Foo(20), Foo(30)];
     foreach (const Foo f; data) {}
}



Now you can omit the type, this is very handy:

struct Foo { int x; }
void main() {
     auto data = [Foo(10), Foo(20), Foo(30)];
     foreach (const f; data) {}
}




Unfortunately currently this compiles and runs, but there is 
already a patch to forbid it:
https://github.com/D-Programming-Language/dmd/pull/1249


void main() {
     foreach (const i; 0 .. 10)
         i++;
}


See also:
http://d.puremagic.com/issues/show_bug.cgi?id=6214
http://d.puremagic.com/issues/show_bug.cgi?id=4090
http://d.puremagic.com/issues/show_bug.cgi?id=6652



Given how foreach now works on integer ranges, there is one more 
situation worth discussing. In D this is a recognized common 
source of bugs:


struct F { int x; }
void main() {
     auto fs = [F(10), F(20), F(30)];
     foreach (f; fs)
         f.x += 1;
     assert(fs[0].x == 10);
}



I think there are 3 main use cases for foreach loops on 
collections of structs:

Case1: The struct iterated on is constant or it's not modified in 
the loop body:

import std.stdio;
struct F { int x; }
void main() {
     auto data = [F(10), F(20), F(30)];
     foreach (f; data)
         writeln(f);
     foreach (const f; data)
         writeln(f);
     foreach (const ref f; data)
         writeln(f);
     foreach (immutable f; data)
         writeln(f);
     immutable data2 = [F(10), F(20), F(30)];
     foreach (immutable ref f; data2)
         writeln(f);
}



Case2: you want to modify the structs of the array:

import std.stdio;
struct F { int x; }
void main() {
     auto data = [F(10), F(20), F(30)];
     foreach (ref f; data)
         f.x++;
     writeln(data);
}



Case3: you want to modify just the copy of the struct, but you 
don't want to modify the structs inside the array:

import std.stdio, std.range;
void main() {
     auto iotas = [iota(0,3), iota(3,7), iota(7,10)];
     foreach (it; iotas) {
         it.popFront();
         it.popFront();
         writeln(it);
     }
     writeln(iotas);
}




[Note: in foreach ranged loops you only have Case1 and Case2 (and 
in practice Case2 is not common):

import std.stdio;
void main() {
     foreach (const i; 0 .. 10)
         writeln(i);
}

]


Currently in D Case1 requires nothing, but today it's also more 
handy than before to mark the struct loop variable with const or 
immutable or even "const ref", to make sure you will not change 
it inside the loop.

Case2 requires ref. If you see "ref" (but not "const ref") you 
know inside the loop the struct will probably be modified.

Currently Case3 is denoted as Case1, but in my code this is a 
much less common case. It's easy to write by mistake the Case3 
pattern when you want to write the Case2 pattern.


So if there is a warning that warns against code like (later the 
warning will become an error, I think. It's a very small breaking 
change):

import std.stdio;
void main() {
     foreach (i; 0 .. 4)
         i++;
}



then maybe the fake case3 too should become a warning and later 
an error (this is another breaking change):

// Solution1:
void main() {
     auto data = [0, 1, 2, 3];
     foreach (x; data)
         x++; // warning
}

So the compiler always require "const" or "ref" in the loops, so 
you can't write true Case3 any more. You have to copy the struct 
manually:

import std.stdio, std.range;
void main() {
     auto iotas = [iota(0,3), iota(3,7), iota(7,10)];
     foreach (it; iotas) {
         auto it2 = it;
         it2.popFront();
         it2.popFront();
         writeln(it2);
     }
     writeln(iotas);
}


What are the alternative solutions?

A second solution is to require some kind of tagging to tell the 
D compiler that the programmer really wants the Case3:

// Solution2:
void main() {
     auto data = [0, 1, 2, 3];
     foreach (@mutable x; data)
         x++;
     foreach (/*mut*/ x; data)
         x++;
}


A third solution is use idioms, and do not change D. It means 
that on default the programmer puts always a "const" in foreach. 
This avoids most bugs caused by fake Case3, and you don't use it 
in the uncommon true Cases3.

There are few more alternative ways to face this problem. 
Including doing nothing :-)

Bye,
bearophile


More information about the Digitalmars-d mailing list