Graillon 1.0, VST effect fully made with D

Ola Fosheim Grøstad via Digitalmars-d-announce digitalmars-d-announce at puremagic.com
Sun Nov 29 09:23:18 PST 2015


On Sunday, 29 November 2015 at 16:15:32 UTC, Guillaume Piolat 
wrote:
> There is also a sample-wise FFT I've come across, which is 
> expensive but avoids chunking.

Hm, I don't know what that is :).

> Looking for similar grains is the idea behind the popular 
> auto-correlation pitch detection methods. It requires two 
> periods, else no autocorrelation peak, though. The rumor says 
> that the non-realtime Autotune works with that, along with 
> many modern pitch detection methods.

I thought they used Laroche and Dolson's FFT-based one combined 
with a peak detector, but maybe that was the real-time version.
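The autocorrelation idea above can be sketched in a few lines. 
This is my own toy illustration in pure Python with made-up 
numbers (220 Hz tone, 8 kHz rate), not how Autotune or Graillon 
actually does it:

```python
import math

# Toy autocorrelation pitch detector (illustration only).
# Assumes a clean periodic input and an analysis window holding
# at least two periods, as discussed above.
SR = 8000                      # sample rate in Hz (made up)
N = 1024                       # window: ~28 periods at 220 Hz
signal = [math.sin(2 * math.pi * 220 * n / SR) for n in range(N)]

def detect_pitch(x, sr, lag_min=20, lag_max=400):
    # r(lag) peaks when the window lines up with a copy of
    # itself shifted by one period
    def r(lag):
        return sum(x[n] * x[n + lag] for n in range(len(x) - lag))
    best = max(range(lag_min, lag_max), key=r)
    return sr / best

pitch = detect_pitch(signal, SR)
print(round(pitch, 1))  # close to 220 Hz
```

The lag resolution is one sample, so the estimate is only close 
to 220 Hz; real detectors interpolate around the peak. And with 
fewer than two periods in the window, the peak is gone, exactly 
as said above.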

There are other full spectral resynthesis methods that throw 
away phase information and represent each spectral component as 
a noise-excited bandpass filter. That is rather expressive, 
since you can do morphing with it (like you can with images). 
But since you throw away phase information, I guess some attacks 
suffer, so you have to special-case the attacks as "residue" 
samples left in the time domain (the difference between what you 
can represent as spectral components and the leftover bits).
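One way to see why throwing away phase smears attacks: 
resynthesize a frame from its magnitude spectrum with random 
phases. The magnitudes survive exactly, the waveform (and hence 
the transient) does not, and the difference is the residue that 
has to be kept in the time domain. A pure-Python DFT sketch of 
my own, not any particular product's method:

```python
import cmath
import random

N = 64
random.seed(1)
# A "transient": an impulse, whose shape lives entirely in phase
frame = [1.0 if n == 0 else 0.0 for n in range(N)]

def dft(x):
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / len(x))
                for n in range(len(x))) for k in range(len(x))]

def idft(X):
    M = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / M)
                for k in range(M)).real / M for n in range(M)]

mags = [abs(c) for c in dft(frame)]

# Rebuild with random phases, keeping conjugate symmetry so the
# result is real (bins 0 and N/2 must stay real).
phases = [0.0] * N
for k in range(1, N // 2):
    phases[k] = random.uniform(-cmath.pi, cmath.pi)
    phases[N - k] = -phases[k]
resynth = idft([m * cmath.exp(1j * p) for m, p in zip(mags, phases)])

# Same magnitude spectrum, very different waveform; the residue
# is the time-domain difference.
new_mags = [abs(c) for c in dft(resynth)]
residue = [a - b for a, b in zip(frame, resynth)]
```

The magnitude spectra match to float precision, while the sharp 
impulse has been smeared out over the whole frame.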

>> I don't know what "voicedness" is? You mean things like 
>> vibrato?
>
> vibrato is the pitch variation that occurs when the larynx is 
> well relaxed.

Yes, so that will generate sidebands in the frequency spectrum, 
like FM synthesis, right? So in order to pick up fast vibrato, I 
assume you would also need to analyse the spectrum?
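For the record, a sinusoidal pitch wobble at rate f_m does put 
sidebands at f_c ± f_m (and ±2 f_m, ...) with Bessel-function 
amplitudes, just as in FM synthesis. A single-bin DFT check, my 
own sketch with made-up frequencies:

```python
import math

SR, N = 1000, 1000              # one second, so bin k = k Hz
FC, FM, BETA = 100, 10, 1.0     # carrier, vibrato rate, mod index
x = [math.sin(2 * math.pi * FC * n / SR
              + BETA * math.sin(2 * math.pi * FM * n / SR))
     for n in range(N)]

def bin_mag(x, k):
    # Magnitude of a single DFT bin k
    re = sum(v * math.cos(2 * math.pi * k * n / len(x))
             for n, v in enumerate(x))
    im = sum(v * math.sin(2 * math.pi * k * n / len(x))
             for n, v in enumerate(x))
    return math.hypot(re, im)

carrier = bin_mag(x, FC)          # ~ J0(beta) * N/2
sideband = bin_mag(x, FC + FM)    # ~ J1(beta) * N/2
empty = bin_mag(x, FC + FM // 2)  # between sidebands: ~ 0
```

The sideband bin carries a big chunk of the energy while the bin 
halfway between is empty, so yes: fast vibrato is visible in the 
spectrum rather than as a moving single peak.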

> voicedness is the difference between sssssss (unvoiced) and 
> zzzzzz (voiced).
> A phoneme is voiced when there are periodic glottal closures 
> and openings.

Ah! In the 90s I read a paper in Computer Music Journal where 
they did singing synthesis by emulating the vocal tract as a 
"physical" filter model. I'm not sure if they used FOF for 
generating the sound. I think there was a vinyl flexi disc with 
it too. :-) I have it somewhere...

You might find it interesting.

> When the sound isn't voiced, there is no period. There isn't a 
> "pitch" there. So pitch detection tends to come with a 
> confidence measure.

So it is a problem for real time, but in non-real time you can 
work your way backwards and fill in the missing parts before 
doing resynthesis? I guess?
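In the simplest schemes that confidence is just the height of 
the normalized autocorrelation peak: near 1 for a clean voiced 
sound, low for noise-like unvoiced sound. A toy sketch of my 
own, not taken from any real detector:

```python
import math
import random

SR, N = 8000, 1024
random.seed(0)
voiced = [math.sin(2 * math.pi * 220 * n / SR) for n in range(N)]
unvoiced = [random.uniform(-1, 1) for _ in range(N)]  # "ssss"-like

def voicing_confidence(x, lag_min=20, lag_max=400):
    # Peak of the autocorrelation, normalized by the energy r(0),
    # so a perfectly periodic signal scores close to 1
    r0 = sum(v * v for v in x)
    def r(lag):
        return sum(x[n] * x[n + lag] for n in range(len(x) - lag))
    return max(r(lag) for lag in range(lag_min, lag_max)) / r0

print(voicing_confidence(voiced))    # close to 1
print(voicing_confidence(unvoiced))  # much smaller
```

A non-realtime tool could indeed look at such a confidence track 
backwards and forwards to decide where a pitch contour can be 
trusted before resynthesis.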

> The devil in that is that voicedness itself is half a lie, or 
> let's say a leaky abstraction; it breaks down for distorted 
> vocals.

Right. You have a lot of these problems in sound analysis, like 
sound separation. The brain is so impressive. I still have 
problems understanding how we can hear in 3D with two ears, like 
distinguishing above from below. I understand the basics of it, 
but it is still impressive when you try to figure out _how_.

>> I guess that's why IRCAM can sell licenses to superVP. :)
>
> Their papers on that topic are interesting; they group 
> spectral peaks by formants and move them together.

I've read the Laroche and Dolson paper in detail, and more or 
less know it by heart now, but maybe you are thinking of some 
other paper? Their paper was good on the science part, but they 
leave the artistic engineering part open to the reader... ;-) 
More insight on the artistic engineering part is most welcome!





More information about the Digitalmars-d-announce mailing list