Function Proposal: std.algorithm.iteration : cumulativeSum

e-y-e via Digitalmars-d digitalmars-d at puremagic.com
Tue Nov 1 14:52:40 PDT 2016


===

TL;DR:
The proposed function takes an input range and an optional seed 
and provides an input range containing the intermediate results 
of the summation of the given range. As of 2.072 this behaviour 
can be emulated by cumulativeSum!((a, b) => a + b)(r, s), but 
cumulativeSum involves a specialization for accurate summation of 
floating point values and redirects to cumulativeFold for 
non-floating point values. The proposed function would be useful 
for those doing basic statistical analysis on data sets, but 
would also have applications in other fields. Please let me know 
in this thread whether you would have a use for this function (or 
if you think it should/shouldn't be in phobos)

===


I'd like to propose the function cumulativeFold as a new addition 
to std.algorithm.iteration. I have already opened a pull request 
[1] for this addition so the full implementation is available 
there. The function signatures are:

auto cumulativeSum(Range)(Range r)
     if (isInputRange!Range && __traits(compiles, r.front + 
r.front))

auto cumulativeSum(Range, Seed)(Range r, Seed s)
     if (isInputRange!Range && __traits(compiles, s = s + r.front))

The function returns an input range that contains the 
intermediate results of a summation on all of the elements of a 
given range. For more details see Prefix Sum [2].


My motivation for adding this function to phobos was originally 
that I came across a need for it. I was looking into genetic 
algorithms and I wanted to implement some variation of the 
Stochastic Universal Sampling [3] selection method, which 
requires the fitness values of the population to be sorted and 
then selected based on the cumulative sum up to that point. 
Phobos was the first place I looked to for an implementation of 
cumulativeSum, as I knew that it had an implementation of sum, 
and cumulativeSum is a great use of D's ranges that I assumed 
would be in phobos. I couldn't find it, but found cumulativeFold 
instead. However, from reading the docs on sum, I knew that sum 
was a specialization of fold for accurate floating point 
summation, and given that I would be summing floating point 
values it would be better if the algorithm I used also involved 
this type of specialization.


With the knowledge that cumulativeFold is now in phobos, I 
realized that this presented quite an obvious gap in this area of 
summation and reduction. This gap is best illustrated by this 
table:

               | Provides no intermediate results | Provides 
intermediate results
  
-------------|----------------------------------|-------------------------------
  Not          |                                  |
  specialized  |               fold               |         
cumulativeFold
  for accurate |                                  |
  summation    |                                  |
  
-------------|----------------------------------|-------------------------------
  Specialized  |                                  |
  for accurate |                sum               |                
X
  summation    |                                  |


So now what would people other than me use this function for, or 
in other words, why should it be in phobos? Firstly, from a 
purely logical point of view, cumulative sums can be a useful way 
of analysing data. Once the data can be converted into a 
cumulative sum, then it is trivial to know what the current 
running total is when consuming the data. IE rather than keeping 
state like so [trivial example]:

void displayDataset(Data)(Data d, double total)
{
     double sum = 0.0;

     while (true)
     {
         if (d.empty) break;
         sum += d.front;
         if (sum > total) break;
         writeln("Data point: ", d.front, ", sum: ", sum);
         d.popFront;
     }
}

Range based code can now be written with ease to perform the same 
job:

void displayDataset(Data)(Data d, double total)
{
     d
         .zip(d.cumulativeSum)
         .until!(t => t[1] > total)
         .each!(t => writeln("Data point: ", t[0], ", sum:", 
t[1]));
}

Some other useful applications are:
  - Graphing an integral of a range of y values without 
calculating the actual integral.
  - Computing a series summation with increasing accuracy (a la 
Fourier).
  - Keeping a running mean of incoming data.


Thanks for reading if you got this far, let me know whether you 
love/hate the idea.
P.S: its late here so I may only be able to read/respond tomorrow.

===

[1] https://github.com/dlang/phobos/pull/4881
[2] https://en.wikipedia.org/wiki/Prefix_sum
[3] https://en.wikipedia.org/wiki/Stochastic_universal_sampling


More information about the Digitalmars-d mailing list