[Issue 9674] New: std.algorithm.filter problems with non-deterministic _input.front

d-bugmail at puremagic.com d-bugmail at puremagic.com
Sat Mar 9 08:05:15 PST 2013


http://d.puremagic.com/issues/show_bug.cgi?id=9674

           Summary: std.algorithm.filter problems with non-deterministic
                    _input.front
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Phobos
        AssignedTo: nobody at puremagic.com
        ReportedBy: bearophile_hugs at eml.cc


--- Comment #0 from bearophile_hugs at eml.cc 2013-03-09 08:05:14 PST ---
This Python2.6 program:


from itertools import imap, ifilter
from sys import stdout
def fun(x):
    stdout.write("*")
    return x
list(ifilter(lambda x: True, imap(fun, xrange(5))))



Prints:

*****




While this similar D program:


import std.stdio: write;
import std.array: array;
import std.range: iota;
import std.algorithm: map, filter;
void main() {
    iota(5).map!((x) {
        write("*");
        return x;
    }).filter!(_ => true).array;
}


Prints:

**********



This is a problem if the function given to map is not deterministic, or if it's
costly to evaluate.




Removing both the iota() and the array() from the D code, using a counter, but
keeping both map() and filter():


import std.algorithm: map, filter;
int count = 0;
void main() {
    auto r1 = [0, 0, 0, 0, 0];
    auto r2 = map!((int x){ count++; return 0; })(r1);
    auto r3 = filter!((int y){ return true; })(r2);
    foreach (z; r3) {}
    assert(count == 10); // passes.
}




To better show what's going on, this is a shortened version of the filter(), it
calls pred() only n times, and it calls _input.front more often:


struct FilterResult(alias pred, Range) {
    Range _input;

    this(Range r) {
        _input = r;
        while (!_input.empty && !pred(_input.front)) {
            _input.popFront();
        }
    }

    @property bool empty() { return _input.empty; }

    void popFront() {
        do {
            _input.popFront();
        } while (!_input.empty && !pred(_input.front));
    }

    @property auto ref front() {
        return _input.front;
    }
}



Calling _input.front more often means it assumes _input.front is _like_ a pure
function. But generally in D (and Python) this is not true.

I think at least it should be documented in std.algorithm ddocs that filter()
assumes the front() of its argument range to act like a pure function (despite
there isn't type sytem enforcement of this). But maybe it's better to change
std.algorithm.filter.

This modified filter() caches _input.front and it works with not deterministic
_input.front functions too:



import std.stdio, std.range, std.algorithm, std.traits, std.random;

struct FilterResult2(alias pred, Range) {
    import std.traits: ForeachType;

    Range _input;
    ForeachType!Range rFront;

    this(Range r) {
        _input = r;

        while (!_input.empty) {
            this.rFront = _input.front;
            if (pred(this.rFront))
                break;
            _input.popFront();
        }
    }

    @property bool empty() { return _input.empty; }

    void popFront() {
        do {
            _input.popFront();
            if (_input.empty)
                break;
            this.rFront = _input.front;
            if (pred(this.rFront))
                break;
        } while (true);
    }

    @property auto ref front() {
        return this.rFront;
    }
}

FilterResult2!(pred, Range) filter2(alias pred, Range)(Range rs) {
    return FilterResult2!(pred, Range)(rs);
}

// Follows demo code------

auto distinct(Range)(Range r)
if (isInputRange!Range) {
    bool[ForeachType!Range] mySet;

    return r.filter!((k) {
        if (k in mySet)
            return false;
        mySet[k] = true;
        return true;
    });
}


/**
This is a useful function, it yields lazily the distinct items
from the input iterable.
It is similar to uniq, but doesn't need the input range to be sorted.
*/
auto distinct2(Range)(Range r)
if (isInputRange!Range) {
    bool[ForeachType!Range] mySet;

    return r.filter2!((k) {
        if (k in mySet)
            return false;
        mySet[k] = true;
        return true;
    });
}

void main() {
    100.iota.map!(_ => uniform(0, 10))
    .distinct.array.sort.writeln; // Bad output.

    100.iota.map!(_ => uniform(0, 10))
    .distinct2.array.sort.writeln; // Good output.
}



What are the downsides of replacing std.algorithm.filter() with something
similar to this filter2() (with the missing methods, Unqual, etc)?


This topic was discussed at length in this thread:
http://forum.dlang.org/thread/fcbtejylrwxfploajuto@forum.dlang.org

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------


More information about the Digitalmars-d-bugs mailing list