[OT] Keeping algorithm and optimisations separated

Guillaume Chatelet chatelet.guillaume at gmail.com
Sun Sep 9 23:25:25 PDT 2012


On 09/09/12 19:36, renoX wrote:
> Hello,
> 
> One common issue when you optimize code is that it becomes difficult
> to read and maintain, but if you are processing images there may be
> hope: Halide is a DSL (currently embedded in C++) which keeps the
> algorithm and the "optimization recipe" (the schedule) separated, AND
> its performance can be similar to hand-optimized C++ code.
> 
> You can read more about Halide here: http://halide-lang.org/
> 
> Regards,
> renoX
> 
> PS: I'm not affiliated with Halide's developers at all, but I thought
> this was an interesting topic.

I was about to bring up Halide on the D forums: perfect timing :)

Halide was announced just before SIGGRAPH 2012 (August). I think they
really spotted the shortcomings of current image-processing frameworks.
But I'm unsure it can address all image-processing problems (graph-based
images are out, for instance), or how the optimization part composes
with the actual composition of functions.

As bearophile said, the automatic optimization problem is still quite
open, and it is not tackled in this paper. So they decided to rely on an
expert, who specifies the optimization part in a dedicated language.
They call this phase _the schedule_. The schedule language is very terse
and lets experts try a lot of different designs without rewriting the
algorithm, quickly converging on an efficient solution for a particular
piece of hardware.
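
To give a flavor of the separation, here is roughly what the 3x3 blur
example from their paper looks like in the C++ embedding (a sketch,
untested; exact API spellings may have changed across releases):

    #include "Halide.h"
    using namespace Halide;

    int main() {
        ImageParam input(UInt(16), 2);       // 2D 16-bit input image
        Var x("x"), y("y"), xi("xi"), yi("yi");

        // The algorithm: a separable 3x3 box blur, written once.
        Func blur_x("blur_x"), blur_y("blur_y");
        blur_x(x, y) = (input(x-1, y) + input(x, y) + input(x+1, y)) / 3;
        blur_y(x, y) = (blur_x(x, y-1) + blur_x(x, y) + blur_x(x, y+1)) / 3;

        // The schedule: a few terse lines that can be rewritten at
        // will without touching the algorithm above.
        blur_y.tile(x, y, xi, yi, 256, 32)
              .vectorize(xi, 8)
              .parallel(y);
        blur_x.compute_at(blur_y, x)
              .vectorize(x, 8);
        return 0;
    }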

The optimizations proposed by the framework range from 'compute this
subdomain once and reuse it' to 'inline everything'. The former is good
when there are redundant calculations (as with spatial filters); the
latter is better for a purely pixel-wise pipeline. Also, because of
bandwidth limitations it is often worthwhile to trade computation for
locality: better performance can be achieved by actually recomputing
data.
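
Reusing blur_x and blur_y from the sketch above, the two extremes are
one scheduling call apart (again a sketch):

    // Extreme 1: compute blur_x once over its whole domain, store it,
    // and reuse it from blur_y. Good when values are read several
    // times (spatial filters), at the cost of writing and re-reading
    // the full intermediate through memory.
    blur_x.compute_root();

    // Extreme 2: inlining is Halide's default. With no schedule on
    // blur_x, its expression is substituted directly into blur_y: each
    // blur_x value is recomputed three times, but the intermediate
    // never touches memory, trading computation for locality.

The compute_at() call in the first sketch sits between the two
extremes: blur_x is recomputed per tile of blur_y, keeping the working
set in cache.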

I do recommend that anyone interested in image processing or data
crunching look at this project. For the moment it lets you express and
optimize data pipelines of up to four dimensions. The only issue is that
optimal performance is achieved only on particular hardware (GPU and
CPU schedules are completely different); there is no automatic
optimization at runtime yet. But I'm sure the schedules produced by
experts will soon lead to heuristics enabling self-optimizing
algorithms (genetic algorithms, anyone?)
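
For example, using today's API names (the 2012 release spelled GPU
scheduling differently), the same blur_y stage would get entirely
different schedules per target:

    // CPU: tile, vectorize the inner loop, parallelize across rows.
    blur_y.tile(x, y, xi, yi, 256, 32).vectorize(xi, 8).parallel(y);

    // GPU: map the tiles onto blocks and threads instead.
    blur_y.gpu_tile(x, y, xi, yi, 16, 16);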

There is already a Python binding, and I'm sure it would be very easy to
add a D one. The DSL basically builds an AST at runtime, which is
JIT-compiled to machine code.
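
A D binding would mostly mean wrapping that AST-building API. From C++
the JIT path looks like this (a sketch against a recent release; the
2012 API differed in details such as Image<> vs Buffer<>):

    #include "Halide.h"
    using namespace Halide;

    int main() {
        // Defining the Func only builds an expression tree (the AST)
        // at runtime; nothing is compiled yet.
        Var x, y;
        Func gradient;
        gradient(x, y) = x + y;

        // realize() JIT-compiles that AST to machine code for the
        // host and runs it, filling an 800x600 buffer of int32.
        Buffer<int32_t> output = gradient.realize({800, 600});
        return 0;
    }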

--
Guillaume


