Taking pipeline processing to the next level

Manu via Digitalmars-d <digitalmars-d at puremagic.com>
Sun Sep 4 22:08:53 PDT 2016


I mostly code like this now:
  data.map!(x => transform(x)).copy(output);

It's convenient and reads nicely, but it's generally inefficient:
every element is dragged through the entire pipeline one at a time.
This one-by-one style of design is the core performance problem with
OOP. It seems a shame to be suffering OOP's failures even when there
is no OOP in sight.

A central premise of performance-oriented programming, which I've
employed my entire career, is "where there is one, there are probably
many": if you do something to one, you should do it to many.
With this in mind, the code I write doesn't tend to look like this:
  R manipulate(Thing thing);

Instead:
  void manipulateThings(Thing *things, size_t numThings, R *output,
size_t outputLen);

That's written C-style for clarity; obviously, the D equivalent uses
slices.
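
For concreteness, a minimal sketch of the slice version (Thing, R and
a per-element transform as above; untested):

  // One call handles the whole batch in a single tight loop.
  void manipulateThings(const(Thing)[] things, R[] output)
  {
      assert(output.length >= things.length);
      foreach (i, ref thing; things)
          output[i] = transform(thing);
  }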

All functions are implemented with the presumption that they will
operate on many things, rather than being called once for each one.
This is the single most successful design pattern I have ever
encountered wrt high-performance code; ie, implement the array
version first.

The problem with this API design is that it doesn't plug into
algorithms or generic code well; the best you can manage is to call
the batch function with a batch of one:
  data.map!(x => transformThings(&x, 1)).copy(output);
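
The closest I can get with Phobos today is to chunk the stream by
hand; here's a rough sketch, assuming a hypothetical slice-in,
slice-out variant of transformThings and an array source:

  import std.algorithm : copy, joiner, map;
  import std.range : chunks;

  // Hypothetical batch variant: slice in, transformed slice out.
  R[] transformThings(const(Thing)[] things)
  {
      auto result = new R[things.length];
      foreach (i, ref thing; things)
          result[i] = transform(thing);
      return result;
  }

  // Feed the batch function 64 elements at a time, then flatten.
  data.chunks(64)
      .map!(c => transformThings(c))
      .joiner
      .copy(output);

That at least gets a real batch into each call, but the chunk size
leaks into the call site and every chunk allocates; it's anything but
transparent.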

I often wonder: how can we integrate this design principle
conveniently (ie, seamlessly) into the design of algorithms, such
that they can make use of batching functions internally and
transparently?
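
To illustrate the shape of what I'm imagining, here's a hypothetical
(untested) batchMap that dispatches on whether the callable accepts a
slice, reusing the slice-in, slice-out transformThings from above:

  import std.algorithm : joiner, map;
  import std.array : array;
  import std.range : chunks, ElementType, isInputRange;

  // Hypothetical: call fun on whole blocks when it accepts a slice,
  // fall back to element-at-a-time otherwise. The dispatch test is
  // crude (a generic lambda always takes the batch path), and the
  // per-block .array allocation should really be a reused buffer.
  auto batchMap(alias fun, R)(R range, size_t blockSize = 64)
      if (isInputRange!R)
  {
      static if (is(typeof(fun(ElementType!R[].init))))
          return range.chunks(blockSize)
                      .map!(c => fun(c.array))
                      .joiner;
      else
          return range.map!fun;
  }

  // The call site stays pipeline-shaped:
  data.batchMap!(things => transformThings(things)).copy(output);

Even then, the buffering policy is baked into one algorithm rather
than negotiated across the whole pipeline.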

Has anyone done any work in this area?

Ideas?

