D for Speech and Signal Processing

Fri Nov 29 08:58:45 PST 2013

On Thursday, 28 November 2013 at 10:30:36 UTC, Chris wrote:
> There are voice analysis and speech processing toolkits like 
> Covarep and Voicebox (see links below) that were coded in 
> Matlab, because they were originally only prototypes. There has 
> been talk of porting them to C++. My first thought, as you 
> might imagine, was why not use D? However, I don't know if 
> there are any performance issues, especially for real time 
> systems (in speech recognition), talking about GC, or in fact 
> any other issues (number grinding etc.).
>
> A lot of the analysis tools are based on some sort of HMM 
> (http://en.wikipedia.org/wiki/Hidden_Markov_model) and I think 
> D could handle that elegantly.
>
> https://github.com/covarep/covarep
> http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html

Hi, I have a little experience in DSP programming using OOP 
languages, so I'll try to give you my opinion, but my experience 
is mostly with entertainment DSP software (ASIO, VST, etc.).

> talking about GC
In "pseudo" real-time (RT) audio (where one or more buffers are 
overlapped) you are in a loop (an interesting example is 
bufferSwitch in ASIO). It's time-critical and 
performance-critical, so you'll never create a class instance or 
allocate a buffer there. The idea is: what triggers the GC? 
Memory allocation and dynamic class instance creation. It's like 
GUI programming: you don't destroy and recreate many objects in 
the "resize/realign" message handler. So the GC problem is 
solved: there is no GC problem, because in RT DSP you won't do 
something stupid that triggers a GC pass.
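
To make that concrete, here is a minimal sketch of the "allocate 
once, then only process" pattern. The names (GainProcessor, 
prepare, process) are made up for illustration and are not part 
of ASIO or any real API, and the loop in main only stands in for 
the driver calling back. It also assumes a compiler that supports 
the @nogc attribute (DMD 2.066 or later), which makes the 
compiler reject GC allocations inside the callback.

```d
import std.stdio : writeln;

// Hypothetical processor: all storage is allocated once, up front.
struct GainProcessor
{
    float[] scratch;    // preallocated work buffer
    float gain = 0.5f;

    // Setup phase: GC allocation is fine here, outside the audio loop.
    void prepare(size_t maxFrames)
    {
        scratch = new float[](maxFrames);
    }

    // Audio callback: @nogc means any GC allocation is a compile error.
    void process(const(float)[] input, float[] output) @nogc nothrow
    {
        assert(input.length == output.length);
        assert(input.length <= scratch.length);
        foreach (i, x; input)
            scratch[i] = x * gain;
        output[] = scratch[0 .. input.length];
    }
}

void main()
{
    GainProcessor p;
    p.prepare(512);

    auto input = new float[](512);
    auto output = new float[](512);
    input[] = 1.0f;

    // Stand-in for the driver invoking the callback once per block.
    foreach (block; 0 .. 4)
        p.process(input, output);

    writeln(output[0]);
}
```

If someone later adds a `new` or an array append inside process, 
the build fails instead of the audio glitching at run time.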

In speech recognition you'll mostly use frequency-domain 
techniques (the FFT, most obviously), so basically, if you don't 
want to trigger a GC pass, don't use built-in arrays; make your 
own array type using alloc/malloc/free. It's the same for 
classes: you can still write your own class 
allocator/deallocator, as described in the manual (even though 
it says they're deprecated). With user-managed classes and arrays 
you'll avoid most GC passes. But that doesn't change the most 
important rule: don't allocate in the audio buffer loop.
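
A rough sketch of what a user-managed array could look like, 
using malloc/free from core.stdc.stdlib. MallocBuffer is a 
hypothetical helper written for this post, not a Phobos type, 
and it again assumes a compiler with @nogc support.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical helper: a fixed-size float buffer outside the GC heap.
struct MallocBuffer
{
    private float* ptr;
    private size_t len;

    @disable this(this); // no copies, so the memory is freed exactly once

    this(size_t n) @nogc nothrow
    {
        ptr = cast(float*) malloc(n * float.sizeof);
        if (ptr is null && n != 0)
            assert(0, "malloc failed");
        len = n;
        ptr[0 .. len] = 0.0f; // malloc'd memory is uninitialized
    }

    ~this() @nogc nothrow
    {
        free(ptr); // free(null) is a no-op for a default-initialized struct
    }

    // View the memory as an ordinary slice; slicing does not allocate.
    float[] opSlice() @nogc nothrow
    {
        return ptr[0 .. len];
    }
}

void main()
{
    auto buf = MallocBuffer(1024); // allocate once, before the audio loop
    float[] samples = buf[];       // plain slice over the malloc'd memory
    samples[] = 0.25f;
    assert(samples.length == 1024);
}
```

The slice can be passed to normal array code, but appending to it 
(`samples ~= x`) would copy the data into GC memory, so that 
still has to be avoided in the audio loop.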