What's holding ~100% D GUI back?

Gregor Mückl gregormueckl at gmx.de
Fri Nov 29 16:40:01 UTC 2019


On Friday, 29 November 2019 at 15:29:20 UTC, Ola Fosheim Grøstad 
wrote:
> On Friday, 29 November 2019 at 13:27:17 UTC, Gregor Mückl wrote:
>>>> GPUs are vector processors, typically 16 wide SIMD. The 
>>>> shaders and compute kernels for them are written from a
>
> […]
>
>> Where is this wrong? Have you looked at CUDA or compute 
>> shaders? I'm honestly willing to listen and learn.
>
> Out of curiosity, what is being discussed? The abstract 
> machine, the concrete micro code, or the concrete VLSI pipeline 
> (electric pathways)?
>

What I wrote is a very abstract view of GPUs that is useful for 
programming. I may not have done a good job of summarizing it, 
now that I read that paragraph again. This is a fairly recent 
presentation that gives a gentle introduction to that model:

https://aras-p.info/texts/files/2018Academy%20-%20GPU.pdf

This presentation is of course a simplification of what is going 
on in a GPU, but it gets the core idea across. AMD and nVidia do 
have a lot of documentation that goes into some more detail, but 
at some point you're going to hit a wall. A lot of low level 
details are hidden behind NDAs and that's quite frustrating.

> If the latter then I guess it all depends? But I believe a 
> trick to save real estate is to have a wide ALU that is 
> partitioned into various word-widths with gates preventing 
> "carry". I would expect there to be a mix (i.e. I would expect 
> 1/x to be implemented in a less efficient, but less costly 
> manner)
>
> However, my understanding is that VLIW caused too many bubbles 
> in the pipeline for compute shaders and that they moved to a 
> more RISC like architecture where things like branching became 
> less costly. However, these are just generic statements found 
> in various online texts, so how that is made concrete in terms 
> of VLSI design, well... that is less obvious. Though it seems 
> reasonable that they would pick a microcode representation that 
> was more granular (flexible).
>

I don't have good information on that. A lot of the details of 
the actual ALU designs are kept under wraps. But when you want to 
cram a few hundred cores that do 16 wide floating point SIMD 
processing each onto a single die, simpler is better. And 
throughput trumps latency for graphics.

>> Last weekend, in fact. I'm bootstrapping a Vulkan/RTX 
>> raytracer as pet project. I want to update an OpenGL based 
>> real time room acoustics rendering method that I published a 
>> while ago.
>
> Cool!  :-D Maybe you do some version of overlap add convolution 
> in the frequency domain, or is it in the time domain?  Reading 
> up on Laplace transforms right now...
>

The convolutions for aurealization are done in the frequency 
domain. Room impulse responses are quite long (up to several 
seconds), so time domain convolution is barely feasible even 
offline. The only feasible way is to use the convolution theorem: 
transform everything into frequency space, multiply it there, and 
transform things back... while encountering the pitfalls of FFT 
in a continuous signal context along the way. There are a lot of 
pitfalls. I'm doing all of the convolution on the CPU because the 
output buffer is read from main memory by the sound hardware. 
Audio buffer updates are not in lockstep with screen refreshes, 
so you can't reliably copy the next audio frame to the GPU, 
convolve it there and read it back in time, because the GPU is on 
its own schedule.
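
For illustration only, a minimal one-shot version of that 
frequency-domain convolution could look like the sketch below in 
D, using Phobos's std.numeric fft/inverseFft (which want 
power-of-two sizes). A real aurealization pipeline would use a 
partitioned overlap-add scheme on streaming blocks instead of one 
big transform; names here are made up.

import std.array : array;
import std.complex : Complex;
import std.numeric : fft, inverseFft;
import std.range : chain, repeat;

// Convolve a dry signal block with an impulse response via the
// convolution theorem: FFT both, multiply bin-wise, inverse FFT.
double[] fftConvolve(double[] signal, double[] ir)
{
    // Linear convolution length, rounded up to a power of two,
    // since std.numeric's FFT works on power-of-two sizes.
    size_t n = 1;
    while (n < signal.length + ir.length - 1) n <<= 1;

    // Zero-pad both inputs to length n to avoid circular wrap-around.
    auto a = chain(signal, repeat(0.0, n - signal.length)).array;
    auto b = chain(ir, repeat(0.0, n - ir.length)).array;

    auto fa = fft(a);
    auto fb = fft(b);

    // Bin-wise complex multiplication in the frequency domain.
    auto prod = new Complex!double[n];
    foreach (i; 0 .. n)
        prod[i] = fa[i] * fb[i];

    // Back to the time domain; keep the real part of the valid
    // portion of the linear convolution.
    auto res = inverseFft(prod);
    auto outLen = signal.length + ir.length - 1;
    auto result = new double[outLen];
    foreach (i; 0 .. outLen)
        result[i] = res[i].re;
    return result;
}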

The OpenGL part of my method is for actually propagating sound 
through the scene and computing the impulse response from that. 
That is typically so expensive that it's also run asynchronously 
to the audio processing and mixing. Only the final impulse 
response is moved to the audio processing thread. Perceptually, 
it seems that you can get away with a fairly low update rate for 
the reverb in many cases.
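
The hand-off itself can be as simple as a mutex-guarded "latest 
impulse response" slot that the propagation thread overwrites and 
the audio thread polls at its own rate. A rough sketch (names and 
structure are hypothetical, error handling omitted):

import core.sync.mutex : Mutex;

// Hand-off of freshly computed impulse responses to the audio thread.
// The propagation thread overwrites the pending slot whenever a new
// IR is ready; the audio thread picks it up at its own update rate.
final class ImpulseResponseSlot
{
    private Mutex mtx;
    private float[] pending;   // newest IR, not yet taken
    private bool hasNew;

    this() { mtx = new Mutex; }

    // Called from the propagation/IR thread when a new IR is done.
    void publish(float[] ir)
    {
        mtx.lock();
        scope (exit) mtx.unlock();
        pending = ir;          // replace any IR the audio thread missed
        hasNew = true;
    }

    // Called from the audio thread; returns null if nothing new.
    float[] take()
    {
        mtx.lock();
        scope (exit) mtx.unlock();
        if (!hasNew) return null;
        hasNew = false;
        return pending;
    }
}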

> I remember when the IRCAM workstation was state-of-the-art, a 
> funky NeXT cube with lots of DSPs. Things have come a long way 
> in that realm since the 90s, at least on the hardware side.

Yes, they have! I suspect that GPUs could make damn fine DSPs 
with their massive throughput. But they aren't linked well to 
audio hardware in Intel PCs. And those pesky graphics programmers 
want every ounce of GPU performance all to themselves and never 
share! ;)

