Segmented Ranges?

bearophile bearophileHUGS at lycos.com
Sat Jun 9 03:45:05 PDT 2012


Often enough you have to perform operations on some non-flat 
sequence, like on a 2D matrix:

import std.stdio;
void main() {
     auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];
     foreach (row; m1)
         foreach (x; row)
             writeln(x);
}


Such multi-level ranges are common. But a Range (below an Input 
Range) that works on a generic multi level Range is slower than 
the two nested for loops because it needs to keep attention to 
the exhaustion of the sub-ranges at every iteration (see 
popFront):


import std.stdio, std.traits, std.array, std.range, std.algorithm;

struct BilevelScan(Range) {
     private Range range;
     typeof(range.front) currentSegment;

     this(Range range_) {
         this.range = range_;
         _nextItem();
     }

     @property bool empty() {
         return range.empty && currentSegment.empty;
     }

     @property auto front() {
         return currentSegment.front;
     }

     void popFront() {
         currentSegment.popFront();
         _nextItem();
     }

     private void _nextItem() {
         while (!range.empty && currentSegment.empty) {
             currentSegment = range.front;
             range.popFront();
         }
     }
}

void main() { // some test code
     auto m1 = [[], [2, 5], [], [10, 20], [], [], [30, 40, 50]];

     foreach (x; BilevelScan!(typeof(m1))(m1))
         writeln(x);
     writeln(m1, "\n");

     auto m2 = [[10.5]];
     foreach (x; BilevelScan!(typeof(m2))(m2))
         writeln(x);
     writeln(m2, "\n");

     auto m3 = 6.iota().map!(i => iota(10 ^^ i + i, 10 ^^ i + i + 
3))();
     foreach (x; BilevelScan!(typeof(m3))(m3))
         writeln(x);
     writeln(m3, "\n");
}



In the paper "Segmented Iterators and Hierarchical Algorithms" 
Matthew H. Austern offers a solution to regain the lost 
efficiency:
http://


Some quotations from the paper:

>segmented data structures are common. Within the STL, for 
>example, the deque container is typically implemented as a 
>segmented array, and hash tables [6] are typically implemented 
>as arrays of buckets. Other examples of segmentation include 
>two-dimensional matrices, graphs, files in record-based file 
>systems, and, on modern NUMA (non-uniform memory architecture) 
>multiprocessor machines, even ordinary memory.<

>A segmented iterator has the same associated types as any other 
>iterator (a value type, for example), and, additionally, it has 
>an associated segment iterator type and local iterator type. 
>Writing a generic algorithm that uses segmented iterators 
>requires the ability to name those types. Additionally, a fully 
>generic algorithm ought to be usable with both segmented and 
>nonsegmented iterators.<

>The distinction between segmented and nonsegmented iterators is 
>orthogonal to the distinction between Forward Iterators, 
>Bidirectional Iterators, and Random Access Iterators.<

>Multiple levels of segmentation can be used with no extra 
>effort. Nothing in this discussion, or this implementation of 
>fill, assumes that the local iterator is nonsegmented.<

>Segmented iterators, an extension of ordinary STL iterators, 
>make it possible for generic algorithms to treat a segmented 
>data structure as something other than a uniform range of 
>elements. This allows existing algorithms to operate more 
>efficiently on segmented data structures, and provides a natural 
>decomposition for parallel computation. Segmented iterators 
>enable new kinds of generic algorithms that explicitly depend on 
>segmentation.<


In D the concept of Input Range is defined as a type that 
statically supports (there is also some necessary semantics that 
can't be expressed in the D code of such concepts):

R r;              // can define a range object
if (r.empty) {}   // can test for empty
r.popFront();     // can invoke popFront()
auto h = r.front; // can get the front of the range of non-void 
type



So an idea is to introduce in D the multi-level Ranges:
SegmentedInputRange
SegmentedOutputRange
SegmentedForwardRange
SegmentedBidirectionalRange
SegmentedRandomAccessRange

I think a Segmented Input Range is more complex than an Input 
Range. Inventing such ranges is like finding the miminal set of 
axioms that allow to write down a theorem, that is to efficiently 
implement the normal algorithms. This is not an easy for me, I 
don't know much about this topic.

Bye,
bearophile


More information about the Digitalmars-d mailing list