[libre-riscv-dev] Vulkanizing

Thu Feb 20 00:22:14 GMT 2020

> > A conventional GPU uses the SIMT architecture (single instruction, multiple threads). Discrete desktop GPUs from AMD and Nvidia support thousands of hardware threads running simultaneously, each with its own register set. An AMD Ryzen 4000 APU (CPU and GPU in the same package) supports between 320 and 512 hardware threads.
> 
> leaving the numbers aside: you're describing "single instruction,
> multiple data" but gone mad.  it's been recognised in the industry -
> thanks to the billions spent - that SIMT is unmanageable at the
> software level.  Mitch Alsup was only a consultant on the Samsung GPU
> project, and his warnings not to implement SIMT were not heeded.

"it's been recognised in the industry ... that SIMT is unmanageable"
This statement conflicts with the fact that GPU manufacturers have standardized on SIMT.
Since all modern GPUs use SIMT, programmers now write shaders optimized for SIMT.

Nvidia invented SIMT, then AMD and Intel followed.
Mali transitioned from SIMD to SIMT in 2016, when Bifrost replaced Midgard.
GPU manufacturers use SIMT because that's what gives the best performance with modern games.

If your goal is to make Vulkan apps run as quickly as possible, then SIMT will give you that.

I agree that SIMT is not fun to program for.
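
To make the "not fun" part concrete, here is a toy sketch of why shaders have to be written with SIMT in mind. It models a single warp whose lanes share one program counter, so an if/else is handled by running each side with some lanes masked off. The function name `run_branch`, the 32-lane width, and the cycle costs are all assumptions I made up for illustration. This is not how any particular GPU is implemented.

```python
# Toy model of one SIMT warp: every lane shares a single program counter,
# so a branch is handled by running each side with some lanes masked off.
# The cycle counts are made-up numbers, not measurements from any real GPU.

def run_branch(conditions, then_cost, else_cost):
    """Return the cycles one warp spends on an if/else.

    conditions: one boolean per lane (True means the lane takes the 'then' side).
    then_cost, else_cost: hypothetical cycle cost of each side.
    """
    cycles = 0
    if any(conditions):          # at least one lane needs the 'then' side
        cycles += then_cost      # the whole warp steps through it
    if not all(conditions):      # at least one lane needs the 'else' side
        cycles += else_cost      # the whole warp steps through it as well
    return cycles

WARP = 32  # lanes per warp; an assumed width

uniform = [True] * WARP                        # every lane agrees
divergent = [i % 2 == 0 for i in range(WARP)]  # lanes alternate

uniform_cycles = run_branch(uniform, then_cost=10, else_cost=40)
divergent_cycles = run_branch(divergent, then_cost=10, else_cost=40)
print(uniform_cycles, divergent_cycles)
```

In this model the warp pays for only one side of the branch when all lanes agree, and for both sides when they disagree. That is the kind of cost a shader author has to keep in mind on SIMT hardware.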

> > There are way more transistors dedicated to each hardware thread, and therefore there are way fewer hardware threads available. It's a tradeoff. Each hardware thread supports a more general model of computation than the threads in a GPU, which makes it more versatile and easier to program, but you lose a lot of parallelism compared to a GPU.
> 
> except... because the CPU *is* the GPU, we have one *less* set of
> cores to worry about, one *less* entire set of L1 caches, an entire
> memory-to-memory architecture gone from the complexity, and a massive
> swath of insanely complex "userspace-kernelspace-gpuspace-and-back"
> inter-process communication wiped off the map.

The coordination costs that you describe are what kill performance in OpenGL apps.
Vulkan allows you to structure your app so that these coordination costs are no longer
the performance bottleneck. Of course, this comes at the cost of great complexity
in the app. But I don't think that unifying CPU and GPU is a big win for Vulkan apps,
although it would be a win for OpenGL.

The common theme I detect here is that you are prioritizing a sensible
general-purpose programming model over Vulkan performance. The benefits
will go to apps optimized for your architecture.

Thanks for explaining the project goals in such detail. It makes a lot more sense now.