Commercial video processing app in D (experience report)

thedeemon via Digitalmars-d-announce digitalmars-d-announce at puremagic.com
Thu Apr 28 01:20:08 PDT 2016


On Thursday, 28 April 2016 at 06:22:18 UTC, Relja Ljubobratovic 
wrote:

> Can you share with us some of your experience working on the 
> image and video processing modules in the app, such as the 
> filters listed here:
> http://www.infognition.com/VideoEnhancer/filters.html
>
> If I may ask, was that part implemented in D, C++, or was some 
> 3rd party library used?

Thanks!

The filters listed there are third-party plugins originally created for VirtualDub ( http://virtualdub.org/ ) by different people, in C++. We made just 2-3 of them ourselves, like the motion-based temporal denoiser (Film Dirt Cleaner) and the Intelligent Brightness filter for automatic brightness/contrast correction. Our most interesting and distinctive piece of tech is our Super Resolution engine for video upsizing; it's not in that list, it's built into the app (and also available separately as plugins for some other hosts). All this image processing code is written in C++ and works directly with raw image bytes, no special libraries involved.

When video processing starts, our filters usually launch a bunch of worker threads, and these threads work in parallel, each on its own part of the video frame (usually divided into horizontal stripes). Inside they often work block-wise, and we have a bunch of template classes for different blocks (RGB or monochrome) parameterized by pixel data type and often by block size, so the size is often known at compile time and the compiler can unroll the loops properly. When doing motion search we use our vector class parameterized by precision, so we have vectors of different precision (low-res pixel, high-res pixel, half-pixel, quarter-pixel etc.), and the type system makes sure I don't add or mix vectors of different precision and don't pass a half-pixel-precise vector to a block reading routine that expects quarter-pixel-precise coordinates. Where it makes sense and is possible, we use SIMD classes like F32vec4 and/or SIMD intrinsics for pixel operations.
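
Just to illustrate the idea for the D crowd: our real classes are C++ and all the names below are invented, but the compile-time block size and the precision-tagged vectors translate directly into D templates. A rough sketch:

import std.math : abs;

// Tag for motion-vector precision; one coordinate grid per tag.
enum Precision { lowRes, highRes, halfPel, quarterPel }

// A 2-D vector tagged with its precision. Vectors of different
// precision are distinct types, so mixing them is a compile error.
struct Vec(Precision P)
{
    int x, y;

    Vec!P opBinary(string op : "+")(Vec!P rhs) const
    {
        return Vec!P(x + rhs.x, y + rhs.y);
    }
}

// A monochrome block whose size is a compile-time constant,
// so the inner loops can be fully unrolled by the compiler.
struct Block(T, int W, int H)
{
    T[W * H] pixels;

    // Sum of absolute differences against another block of the same shape.
    int sad(const ref Block!(T, W, H) other) const
    {
        int total = 0;
        foreach (i; 0 .. W * H)
            total += abs(cast(int)pixels[i] - cast(int)other.pixels[i]);
        return total;
    }
}

unittest
{
    Vec!(Precision.halfPel)    a = { 2, 3 };
    Vec!(Precision.quarterPel) b = { 4, 6 };
    auto c = a + a;          // fine: same precision
    // auto d = a + b;       // compile error: different precision tags
    Block!(ubyte, 8, 8) blk; // 8x8 block of 8-bit pixels
}

The commented-out line is the whole point: mixing precisions simply doesn't compile.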

Video Enhancer allows chaining several VD filters and our SR rescaler instances into a pipeline, and that is also parallelized: when the first filter finishes with frame X, it can immediately start working on frame X+1 while the next filter is still working on frame X. Previously this was organized as a chain of DirectShow filters with a special Parallelizer filter inserted between the video processing ones; this Parallelizer had a frame queue inside and separate receiving and sending threads, allowing the connected filters to work in parallel.

In version 2 it's trickier, since we need to be able to seek to different positions in the video, and some filters may request a few frames before and after the current one, so a sequential pipeline doesn't suffice anymore. Now we build a virtual chain inside one big DirectShow filter; each node in that chain has its own worker thread, and the nodes communicate by message passing. In the end we have a big DirectShow filter of about 11K lines of C++ that does Super Resolution resizing, invokes VirtualDub plugins (imitating VirtualDub for them), does colorspace conversions where necessary, and organizes it all into a pipeline that is pull-based inside but behaves as a push-based DirectShow filter outside.
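
For the curious, here is the node-per-thread, pull-based idea in miniature. This is not our code (ours is C++ on top of DirectShow samples); it's just a toy D sketch with std.concurrency, and FrameRequest, Frame, sourceNode and filterNode are invented names:

import std.concurrency;

// Toy message types; the real messages carry DirectShow samples.
struct FrameRequest { int index; }                    // "give me frame N"
struct Frame { int index; immutable(ubyte)[] pixels; }

// A source node that fabricates blank frames on request.
void sourceNode()
{
    try
    {
        for (;;)
            receive((FrameRequest req, Tid requester) {
                auto raw = new ubyte[](640 * 480);
                requester.send(Frame(req.index, raw.idup));
            });
    }
    catch (OwnerTerminated) {}   // owner thread is gone: shut down quietly
}

// A processing node: pulls its input from the upstream node, runs a
// stand-in "filter" on it, and replies to whoever asked.
// (A single in-flight request is assumed, to keep the sketch short.)
void filterNode(Tid upstream)
{
    try
    {
        for (;;)
            receive((FrameRequest req, Tid requester) {
                upstream.send(req, thisTid);          // pull from previous node
                auto input = receiveOnly!Frame();

                auto output = new ubyte[](input.pixels.length);
                foreach (i, p; input.pixels)
                    output[i] = cast(ubyte)(255 - p); // stand-in processing

                requester.send(Frame(input.index, output.idup));
            });
    }
    catch (OwnerTerminated) {}
}

void main()
{
    auto src    = spawn(&sourceNode);
    auto filter = spawn(&filterNode, src);

    // The downstream side (here, main) pulls frame 0 through the chain.
    filter.send(FrameRequest(0), thisTid);
    auto frame = receiveOnly!Frame();
}

The pull direction is the important bit: the downstream side asks for frame N, so a node that needs a few neighbouring frames can simply ask its upstream node for N-1 or N+1 as well.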

So the D part uses COM to build and run a DirectShow graph with all the readers, splitters, codecs and of course our big video processing DirectShow filter; it talks to it via COM and some callbacks, but doesn't do much with the video frames apart from copying them.
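
If someone wants to try the same from D, the COM part starts out roughly like this. Again just a minimal Windows-only sketch, not our actual code: it only creates the filter graph manager using druntime's core.sys.windows modules, the GUID values are the standard COM/DirectShow ones, and everything past that point is left as a comment:

import core.sys.windows.basetyps : GUID;
import core.sys.windows.objbase;            // CoInitialize, CoCreateInstance, CoUninitialize
import core.sys.windows.unknwn   : IUnknown;

// Well-known CLSID of the DirectShow filter graph manager,
// {E436EBB3-524F-11CE-9F53-0020AF0BA770}.
__gshared GUID CLSID_FilterGraph =
    GUID(0xe436ebb3, 0x524f, 0x11ce,
         [0x9f, 0x53, 0x00, 0x20, 0xaf, 0x0b, 0xa7, 0x70]);

// IID of IUnknown, {00000000-0000-0000-C000-000000000046}.
__gshared GUID IID_IUnknownGuid =
    GUID(0x00000000, 0x0000, 0x0000,
         [0xc0, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x46]);

enum CLSCTX_INPROC_SERVER = 0x1;   // standard COM constant

void buildGraph()
{
    CoInitialize(null);
    scope (exit) CoUninitialize();

    IUnknown graph;
    auto hr = CoCreateInstance(&CLSID_FilterGraph, null, CLSCTX_INPROC_SERVER,
                               &IID_IUnknownGuid, cast(void**)&graph);
    if (hr < 0)          // FAILED(hr)
        return;
    scope (exit) graph.Release();

    // The real app goes on from here: query IGraphBuilder, add the big
    // video processing filter plus readers/splitters/codecs, connect
    // the pins, hook up callbacks and run the graph.
}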

Btw, if you're interested in an image processing app in pure D, 
I've got one too:
http://www.infognition.com/blogsort/
(sources: https://bitbucket.org/infognition/bsort )

