Whats holding ~100% D GUI back?
gregormueckl at gmx.de
Fri Nov 29 16:40:01 UTC 2019
On Friday, 29 November 2019 at 15:29:20 UTC, Ola Fosheim Grøstad wrote:
> On Friday, 29 November 2019 at 13:27:17 UTC, Gregor Mückl wrote:
>>>> GPUs are vector processors, typically 16 wide SIMD. The
>>>> shaders and compute kernels for them are written from a
>> Where is this wrong? Have you looked at CUDA or compute
>> shaders? I'm honestly willing to listen and learn.
> Out of curiosity, what is being discussed? The abstract
> machine, the concrete micro code, or the concrete VLSI pipeline
> (electric pathways)?
What I wrote is a very abstract view of GPUs that is useful for
programming. I may not have done a good job of summarizing it, now
that I read that paragraph again. This is a fairly recent
presentation that gives a gentle introduction to that model:
This presentation is of course a simplification of what is going
on in a GPU, but it gets the core idea across. AMD and nVidia do
have a lot of documentation that goes into some more detail, but
at some point you're going to hit a wall. A lot of low level
details are hidden behind NDAs and that's quite frustrating.
> If the latter then I guess it all depends? But I believe a
> trick to save real estate is to have a wide ALU that is
> partitioned into various word-widths with gates preventing
> "carry". I would expect there to be a mix (i.e. I would expect
> 1/x to be implemented in a less efficient, but less costly
> However, my understanding is that VLIW caused too many bubbles
> in the pipeline for compute shaders and that they moved to a
> more RISC like architecture where things like branching became
> less costly. However, these are just generic statements found
> in various online texts, so how that is made concrete in terms
> of VLSI design, well... that is less obvious. Though it seems
> reasonable that they would pick a microcode representation that
> was more granular (flexible).
I don't have good information on that. A lot of the details of
the actual ALU designs are kept under wraps. But when you want to
cram a few hundred cores that do 16 wide floating point SIMD
processing each onto a single die, simpler is better. And
throughput trumps latency for graphics.
>> Last weekend, in fact. I'm bootstrapping a Vulkan/RTX
>> raytracer as pet project. I want to update an OpenGL based
>> real time room acoustics rendering method that I published a
>> while ago.
> Cool! :-D Maybe you do some version of overlap add convolution
> in the frequency domain, or is it in the time domain? Reading
> up on Laplace transforms right now...
The convolutions for auralization are done in the frequency
domain. Room impulse responses are quite long (up to several
seconds), so time domain convolutions are barely feasible
offline. The only feasible way is to use the convolution theorem,
transform everything into frequency space, multiply it there, and
transform things back... while encountering the pitfalls of FFT
in a continuous signal context along the way. There are a lot of
pitfalls. I'm doing all of the convolution on the CPU because the
output buffer is read from main memory by the sound hardware.
Audio buffer updates are not in lockstep with screen refreshes,
so you can't reliably copy the next audio frame to the GPU,
convolve it there and read it back in time because the GPU is on
its own schedule.
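
To make the convolution theorem approach above a bit more concrete, here is a minimal pure-Python sketch of overlap-add convolution. It is not the code from my method; the names fft, next_pow2, overlap_add and direct, the block size and the toy FFT are all assumptions made for illustration. Each block is zero-padded before the transform so that the circular convolution implied by the FFT does not wrap around, which is one of the pitfalls mentioned above.

```python
import cmath

def fft(x, inverse=False):
    """Recursive radix-2 FFT; len(x) must be a power of two. The inverse is unscaled."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def next_pow2(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def overlap_add(signal, ir, block=4):
    """Convolve signal with a non-empty impulse response ir, block by block."""
    # Pad to at least block + len(ir) - 1 so each block's linear convolution fits.
    size = next_pow2(block + len(ir) - 1)
    ir_f = fft(list(ir) + [0.0] * (size - len(ir)))  # transformed only once
    out = [0.0] * (len(signal) + len(ir) - 1)
    for start in range(0, len(signal), block):
        chunk = list(signal[start:start + block])
        chunk_f = fft(chunk + [0.0] * (size - len(chunk)))
        # Multiplication in frequency space equals convolution in time.
        y = fft([a * b for a, b in zip(chunk_f, ir_f)], inverse=True)
        # The tail of each block overlaps the next block and is summed in.
        for i in range(len(chunk) + len(ir) - 1):
            out[start + i] += y[i].real / size
    return out

def direct(signal, ir):
    """Reference time domain convolution, O(len(signal) * len(ir))."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out

signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
ir = [1.0, 0.5, 0.25]
fast = overlap_add(signal, ir)
slow = direct(signal, ir)
```

For an impulse response several seconds long, the direct version does one multiply per input sample per response sample, which is where the time domain approach stops being practical.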
The OpenGL part of my method is for actually propagating sound
through the scene and computing the impulse response from that.
That is typically so expensive that it's also run asynchronously
to the audio processing and mixing. Only the final impulse
response is moved to the audio processing thread. Perceptually,
it seems that you can get away with a fairly low update rate for
the reverb in many cases.
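
The handoff described above can be sketched roughly like this in Python. This is an assumption about the general shape, not my actual implementation: LatestImpulseResponse, publish, snapshot and simulate_propagation are hypothetical names, and a real-time audio callback would normally avoid a blocking lock. The lock only keeps the sketch short.

```python
import threading

class LatestImpulseResponse:
    """Holds only the most recently published impulse response."""

    def __init__(self, initial):
        self._lock = threading.Lock()
        self._current = initial
        self._version = 0

    def publish(self, ir):
        # Called by the slow propagation side whenever a new response is done.
        with self._lock:
            self._current = ir
            self._version += 1

    def snapshot(self):
        # Called by the audio side; intermediate responses it missed are skipped.
        with self._lock:
            return self._version, self._current

def simulate_propagation(shared, updates):
    """Stand-in for the expensive sound propagation pass (hypothetical)."""
    for ir in updates:
        shared.publish(ir)

shared = LatestImpulseResponse([1.0])
worker = threading.Thread(
    target=simulate_propagation,
    args=(shared, [[1.0, 0.5], [1.0, 0.4, 0.1]]),
)
worker.start()
worker.join()
version, ir = shared.snapshot()
```

The audio side never waits for a propagation pass to finish. It keeps convolving with whatever response it saw last, which matches the observation that a low reverb update rate is often acceptable.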
> I remember when the IRCAM workstation was state-of-the-art, a
> funky NeXT cube with lots of DSPs. Things have come a long way
> in that realm since the 90s, at least on the hardware side.
Yes, they have! I suspect that GPUs could make damn fine DSPs
with their massive throughput. But they aren't linked well to
audio hardware in Intel PCs. And those pesky graphics programmers
want every ounce of GPU performance all to themselves and never