Whats holding ~100% D GUI back?
Gregor Mückl
gregormueckl at gmx.de
Fri Nov 29 13:27:17 UTC 2019
On Friday, 29 November 2019 at 10:08:59 UTC, Ethan wrote:
> On Friday, 29 November 2019 at 02:42:28 UTC, Gregor Mückl wrote:
>> They don't concern themselves with how the contents of these
>> quads came to be.
>
> Amazing. Every word of what you just said is wrong.
>
I doubt this, but I am open to discussion. Let's try to remain
civil and calm.
> What, you think stock Win32 widgets are rendered with CPU code
> with the Aero and later compositors?
>
Win32? Probably still are. WPF and later? No. That has always had
a DirectX rendering backend. And at least WPF has a reputation of
being sluggish. I haven't had performance issues with either so
far, though.
> You're treating custom user CPU rasterisation on pre-defined
> bounds as the entire rendering paradigm. And you can be assured
> that your code is reading to- and writing from- a quarantined
> section of memory that will be later composited by the layout
> engine.
>
> If you're going to bring up examples, study WPF and UWP.
> Entirely GPU driven WIMP APIs.
>
> But I guess we still need homework assignments.
>
OK, I'll indulge you in the interest of a civil discussion.
> 1) What is a Z buffer?
>
OK, back to basics. When rendering a 3D scene with opaque
surfaces, the resulting image only contains the surfaces nearest
to the camera. The rest is occluded. Solutions like depth sorting
the triangles and rendering back to front are possible (see e.g.
the DOOM engine and its BSP traversal for rendering), but they
have drawbacks. For example, even a set of three triangles can
mutually overlap in such a way that no consistent z ordering of
the whole primitives exists. You need to split primitives to make
that work, and you still need to guarantee sorted input.
A z buffer solves that problem by storing, for each pixel, the
minimum z value encountered so far. When a new primitive is drawn
over that pixel, its z value is first compared to the stored
value, and if it is further away, the fragment is discarded.
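The depth test above can be sketched in a few lines. This is a toy
software z-buffer, not how the hardware actually implements it; all
names are made up for illustration:

```python
# Minimal software z-buffer sketch (illustrative only).
# Each write compares the fragment's z against the stored minimum and
# keeps the nearer one, so draw order no longer matters for opaque geometry.

W, H = 4, 4
color = [[None] * W for _ in range(H)]
depth = [[float("inf")] * W for _ in range(H)]  # cleared to "far"

def write_fragment(x, y, z, c):
    # z test: keep the fragment only if it is nearer than what is stored
    if z < depth[y][x]:
        depth[y][x] = z
        color[y][x] = c

# Two "primitives" covering the same pixel, submitted in both orders:
write_fragment(1, 1, 0.8, "red")    # far surface first
write_fragment(1, 1, 0.3, "blue")   # near surface wins
write_fragment(2, 2, 0.3, "blue")   # near surface first
write_fragment(2, 2, 0.8, "red")    # far surface is discarded

print(color[1][1], color[2][2])  # both pixels end up "blue"
```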
Of course, a hardware z buffer can be configured in various other
interesting ways. E.g. restricting the z value range to half of
the NDC space, alternating half spaces and simultaneously
flipping between min and max tests is an old trick to skip
clearing the z buffer between frames.
There's still more to this topic: transformation of stored z
values to retain precision on 24 bit integer z buffers,
hierarchical z buffers, early z testing... I'll just cut it short
here.
> 2) What is a frustum? What does "orthographic" mean in relation
> to that?
>
The view frustum is the volume that is mapped to NDC. For
perspective projection, it's a truncated four-sided pyramid. For
orthographic projection, it's a cuboid. Fun fact: for correct
stereo rendering to a flat display, you need asymmetrical
perspective frustums; doing it with symmetric frustums rotated
towards the vergence point leads to distortions.
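The difference between the two frustum types shows up directly in how a
view-space point lands in NDC. A minimal 1D sketch (made-up function
names, right-handed view space with the camera looking down -z):

```python
# Sketch: orthographic vs. perspective mapping of view-space x to NDC x.

def ortho_ndc_x(x, left=-2.0, right=2.0):
    # Orthographic: linear remap of [left, right] to [-1, 1],
    # completely independent of depth -> no foreshortening.
    return 2.0 * (x - left) / (right - left) - 1.0

def persp_ndc_x(x, z, half_width_at_unit_depth=1.0):
    # Perspective: divide by distance along -z; the same lateral offset
    # lands closer to the center the farther away it is.
    return x / (-z * half_width_at_unit_depth)

# Same lateral offset of 1.0 at two depths:
print(ortho_ndc_x(1.0))              # 0.5, regardless of depth
print(persp_ndc_x(1.0, -1.0))        # 1.0 at depth 1
print(persp_ndc_x(1.0, -4.0))        # 0.25 at depth 4: shrinks with distance
```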
> 3) Comparing the traditional and Aero+ desktop compositors,
> which one has the advantage with redraws of any kind? Why?
>
I'm assuming that by traditional you mean a CPU compositor. In
that case, the GPU compositor has the full image of all top level
windows cached as textures. All it needs to do is render these to
the screen as textured quads. This is fast and, in simple terms,
it can be synchronized with the vertical scanout to the screen to
avoid tearing. Because the window contents are cached,
applications don't need to redraw their contents when the z order
changes (goodbye, damage events!), and as a side effect, moving
and repositioning top level windows is smooth.
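The core of such a compositor is tiny. A toy sketch (character
"pixels", made-up structure; a real compositor draws the cached
textures on the GPU instead of copying bytes):

```python
# Toy compositor sketch: each top-level window's content is a cached
# buffer ("texture"); compositing just copies those buffers to the screen
# back to front. Raising a window is a reorder plus recomposite; the
# applications never redraw.

W, H = 8, 4

def composite(windows):
    screen = [["." for _ in range(W)] for _ in range(H)]
    for win in windows:  # back-to-front z order
        for dy, row in enumerate(win["pixels"]):
            for dx, px in enumerate(row):
                screen[win["y"] + dy][win["x"] + dx] = px
    return screen

a = {"x": 0, "y": 0, "pixels": [["A"] * 4] * 2}
b = {"x": 2, "y": 1, "pixels": [["B"] * 4] * 2}

front_b = composite([a, b])  # b on top: overlap shows "B"
front_a = composite([b, a])  # z order flipped; window contents untouched
print(front_b[1][3], front_a[1][3])
```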
> 4) Why does ImGui's code get so complicated behind the scenes?
> And what advantage does this present to a programmer who wishes
> to use the API?
>
One word: batching. I'll briefly describe the Vulkan rendering
process of ImGui, as far as I remember it off the top of my
head: it creates a single big vertex buffer for all draw
operations with a pretty uniform vertex layout, regardless of the
primitive involved. All drawing state that doesn't need pipeline
changes goes into the vertex buffer (world space coords, UV
coords, vertex color...). It also retains a memory of the
pipeline state required to draw the current set of primitives.
All high level primitives are broken down into triangles, even
lines and bezier curves. This trick reduces the number of draw
calls later. The renderer retains a list of spans in the vertex
buffer and their associated pipeline state. Whenever the higher
level drawing code does something that requires a state change,
the current span is terminated and a new one for the new pipeline
state is started. As far as I remember, the code only has two
pipelines: one for solid, untextured primitives, and one for
textured primitives that is used for text rendering.
In this model, the higher level rendering code can just emit draw
calls for individual primitives, but these are only recorded and
not executed immediately. In a second pass, the vertex buffer is
uploaded in a single transfer and the list of vertex buffer spans
is processed, switching pipelines, setting descriptors and
emitting the draw call for the relevant vertex buffer range for
each span in order.
The main reason why this works is a fundamental ordering
guarantee given by the Vulkan API: primitives listed in a vertex
buffer must be rendered in such a way that the result is as if
the primitives were processed in the order given in the buffer.
For example, when primitives overlap, the last one in the buffer
is the one that covers the overlap region in the resulting image.
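The span recording described above can be sketched in a few lines. The
names and the two-pipeline split are illustrative, not ImGui's actual
internals:

```python
# Sketch of span batching: primitives append vertices to one big shared
# buffer; a new span is started only when the required pipeline changes.

vertices = []  # one shared vertex buffer for everything
spans = []     # [pipeline, first_vertex, vertex_count]

def record(pipeline, tri_verts):
    if not spans or spans[-1][0] != pipeline:
        spans.append([pipeline, len(vertices), 0])  # state change: new span
    vertices.extend(tri_verts)
    spans[-1][2] += len(tri_verts)

record("solid", [(0, 0), (1, 0), (0, 1)])      # untextured rectangle half
record("solid", [(1, 0), (1, 1), (0, 1)])      # same pipeline: merged
record("textured", [(2, 0), (3, 0), (2, 1)])   # text glyph: pipeline switch

# Replay pass: one buffer upload, then one draw call per span, in order.
for pipeline, first, count in spans:
    print(f"bind {pipeline}; draw {count} vertices starting at {first}")
```

Three recorded primitives collapse into just two draw calls here; in a
real UI with hundreds of widgets the savings are far larger.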
> 5) Using a single untextured quad and a pixel shader, how would
> you rasterise a curve?
>
I'll let Jim Blinn answer that one for you:
https://developer.nvidia.com/gpugems/GPUGems3/gpugems3_ch25.html
I'd seriously mess up the math if I were to try to explain in
detail. Bezier curves aren't my strong suit. I'm solving
rendering equations for a living.
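The gist of that chapter, stripped of all the math I'd rather not
botch: assign texture coordinates (0, 0), (1/2, 0), (1, 1) to the
triangle's vertices and let the pixel shader evaluate the implicit form
u^2 - v of the quadratic Bezier, discarding one side. A sketch of just
the per-pixel classification:

```python
# Loop/Blinn-style sketch: with UVs (0,0), (0.5,0), (1,1) assigned to the
# vertices, the interpolated (u, v) lets a pixel shader classify each
# pixel against the implicit quadratic u^2 - v = 0.

def inside_curve(u, v):
    # f < 0: pixel lies on the "filled" side of the Bezier curve
    return u * u - v < 0.0

# Points on the curve itself satisfy u^2 = v:
print(inside_curve(0.5, 0.25))   # exactly on the curve -> not inside
print(inside_curve(0.5, 0.30))   # just past the curve -> inside
print(inside_curve(0.5, 0.20))   # just before it -> outside
```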
> (I've written UI libraries and 3D scene graphs in my career as
> a console engine programmer, so you're going to want to be
> *very* thorough if you attempt to answer all these.)
>
> On Friday, 29 November 2019 at 08:45:30 UTC, Gregor Mückl wrote:
>> GPUs are vector processors, typically 16 wide SIMD. The
>> shaders and compute kernels for them are written from a
>> single-"threaded" perspective, but this is converted to SIMD
>> with one "thread" really being a single value in the 16 wide
>> register. This has all kinds of implications for things like
>> branching and memory accesses. This forum is not the place to
>> go into them.
>
> No, please, continue. Let's see exactly how poorly you
> understand this.
>
Where is this wrong? Have you looked at CUDA or compute shaders?
I'm honestly willing to listen and learn.
I've talked about GPUs in these terms with other experts (Intel
and nVidia R&D guys, among others) and this is a common model for
how GPUs work. So I'm frankly puzzled by your response.
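To make the model concrete: a sketch of lockstep SIMD execution with
per-lane masking, which is why divergent branches cost time. Lane count
and behavior are illustrative (real hardware uses 32- or 64-wide warps
or wavefronts, predication, etc.):

```python
# Sketch of the SIMT/SIMD model: one "thread" per lane, and a branch
# executes BOTH sides over the whole vector, with a mask selecting the
# result per lane -> divergence wastes lanes.

WIDTH = 16
x = list(range(WIDTH))  # one value per lane

def simd_branch(values, cond, then_fn, else_fn):
    mask = [cond(v) for v in values]
    # Both paths run in lockstep across all lanes:
    then_results = [then_fn(v) for v in values]
    else_results = [else_fn(v) for v in values]
    return [t if m else e for m, t, e in zip(mask, then_results, else_results)]

# "if (x < 8) x *= 2; else x += 1;" executed across 16 lanes:
result = simd_branch(x, lambda v: v < 8, lambda v: v * 2, lambda v: v + 1)
print(result)
```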
> On Friday, 29 November 2019 at 09:00:20 UTC, Gregor Mückl wrote:
>> All of these things can be done on GPUs (most of it has), but
>> I highly doubt that this would be that much faster. You need
>> lots of different shaders for these primitives and switching
>> state while rendering is expensive.
>
> When did you last use a GPU API? 1999?
>
Last weekend, in fact. I'm bootstrapping a Vulkan/RTX raytracer
as a pet project. I want to update an OpenGL-based real time room
acoustics rendering method that I published a while ago.
> Top-end gaming engines can output near-photorealistic complex
> scenes at 60FPS. How many state changes do you think they
> perform in any given scene?
>
As few as possible. They *do* take time, although they have
become cheaper. Batching by shader is still a thing. Don't take
my word for it. See the "Pipelines" section here:
https://devblogs.nvidia.com/vulkan-dos-donts/
And that's with an API that puts pipeline state creation up front!
I don't have hard numbers for state changes and draw calls in
recent games, unfortunately. The only number I remember is
something like 2000 draw calls per frame in Ashes of the
Singularity. While that game shows masses of units, I don't find
the graphics particularly impressive. There's next to no
animation on the units. The glitz is mostly decals and particle
effects. There's also not a lot of screen space post processing
going on. So I don't consider that to be representative.
> It's all dependent on API, driver, and even operating system.
> The WDDM introduced in Vista made breaking changes with XP,
> splitting a whole ton of the stuff that would traditionally be
> costly with a state change out of kernel space code and in to
> user space code. Modern APIs like DirectX 12, Vulkan, Metal etc
> go one step further and remove that responsibility from the
> driver and in to user code.
>
OK, this is some interesting information. I've never had to care
about where user/kernel mode transitions happen in the driver
stack. I guess I've been lucky that I've been able to file that
under generic driver overhead so far.
Phew, this has become a long reply, and it has taken me a lot of
time to write. I hope I've shown that I generally know what I'm
writing about. I could point to my history as additional proof,
but I'd rather let this response stand for what it is.
More information about the Digitalmars-d mailing list