[libre-riscv-dev] Vulkanizing

Hendrik Boom hendrik at topoi.pooq.com
Thu Feb 20 13:40:21 GMT 2020


On Thu, Feb 20, 2020 at 12:22:14AM +0000, Doug Moen wrote:
> > > A conventional GPU uses the SIMT architecture (single instruction, 
> > > multiple threads). Discrete desktop GPUs from AMD and Nvidia 
> > > support thousands of hardware threads running simultaneously, 
> > > each with their own register set. An AMD Ryzen 4000 APU (CPU and 
> > > GPU in the same package) supports between 320 and 512 hardware 
> > > threads.
> > 
> > leaving the numbers aside: you're describing "single instruction,
> > multiple data" but gone mad.  it's been recognised in the industry -
> > thanks to the billions spent - that SIMT is unmanageable at the
> > software level.  Mitch Alsup was only a consultant on the Samsung GPU
> > project, and his warnings not to implement SIMT were not heeded.
> 
> "it's been recognised in the industry .... that SIMT is unmanageable"
> This statement conflicts with the fact that GPU manufacturers have 
> standardized on SIMT.

It doesn't necessarily conflict.  There's such a thing as the curse of 
compatibility.
 
> Since all modern GPUs use SIMT, programmers now write shaders 
> optimized for SIMT.

Exactly.  The programmers have to optimise for compatibility with the 
hardware, and then the hardware has to be organised to match the 
programs.

> 
> Nvidia invented SIMT, then AMD and Intel followed.
> Mali transitioned from SIMD to SIMT in 2016, when Bifrost replaced Midgard.
> The GPU industry has standardized on SIMT, and GPU programmers write 
> shaders optimized for SIMT.
> GPU manufacturers use SIMT because that's what gives the best 
> performance with modern games.
> 
> If your goal is to make Vulkan apps run as quickly as possible, then 
> SIMT will give you that.
> 
> I agree that SIMT is not fun to program for.
> 
> > > There are way more transistors dedicated to each hardware thread, 
> > > and therefore there are way fewer hardware threads available. It's 
> > > a tradeoff. Each hardware thread supports a more general model of 
> > > computation than the threads in a GPU, which makes it more 
> > > versatile and easier to program, but you lose a lot of parallelism 
> > > compared to a GPU.
> > 
> > except... because the CPU *is* the GPU, we have one *less* set of
> > cores to worry about, one *less* entire set of L1 caches, an entire
> > memory-to-memory architecture gone from the complexity, and a massive
> > swath of insanely complex "userspace-kernelspace-gpuspace-and-back"
> > inter-process communication wiped off the map.
> 
> The coordination costs that you describe are what kills performance in 
> OpenGL apps.
> Vulkan allows you to structure your app so that these coordination 
> costs are no longer the performance bottleneck. Of course, this comes 
> at the cost of great complexity in the app. But I don't think that 
> unifying CPU and GPU is a big win for Vulkan apps,
> although it would be a win for OpenGL.
> 
> The common theme I detect here is that you are prioritizing a sensible 
> general purpose programming model over Vulkan performance. The 
> benefits will go to apps optimized for your architecture.

Didn't the original Silicon Graphics machines just use the CPU to write 
directly to graphics memory?  Did they have special instructions for 
this?  Or just rely on having a fast enough CPU?  I know that some of 
the real-time animations on those machines were impressive for the era.

And about OpenGL.  (I don't know Vulkan.)  A huge part of OpenGL 
consists of setting up buffers of data in main memory and then 
transferring them to the GPU.  As far as I know, there's no requirement 
to keep the main-memory copy around once the data have been transferred.
So, even if the GPU *is* the CPU, wouldn't we still have to copy all 
those buffers in case the program destroys the so-called main-memory 
original?
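
For concreteness, the classic upload path looks something like this (a 
minimal sketch: it assumes a current GL context and a loader such as 
GLAD, and the helper name is mine, not part of any real API):

    /* Hypothetical helper: upload vertex data the classic way. */
    #include <glad/glad.h>   /* or any loader exposing GL 1.5+ entry points */
    #include <stddef.h>

    GLuint upload_vertices(const float *verts, size_t n_floats)
    {
        GLuint vbo;
        glGenBuffers(1, &vbo);
        glBindBuffer(GL_ARRAY_BUFFER, vbo);

        /* After glBufferData returns, the application may modify or free
         * `verts`, so the driver has to behave as though it holds its own
         * copy, even on hardware where the CPU and GPU share memory. */
        glBufferData(GL_ARRAY_BUFFER, n_floats * sizeof(float), verts,
                     GL_STATIC_DRAW);
        return vbo;
    }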

Or is there some way around this without requiring a rewrite of all 
application software?  Does OpenGL perhaps have a bit we can query to 
find out if it really copies all the buffers or uses them in situ?
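
The nearest mechanism I'm aware of (a sketch only, and it needs OpenGL 
4.4 or the ARB_buffer_storage extension, so it wouldn't help unmodified 
applications) is persistent mapping: the application writes straight 
into driver-owned storage and never keeps a separate main-memory array, 
so on shared-memory hardware there need be nothing to copy.  Same 
assumptions as the sketch above, and again the helper name is mine:

    /* Hypothetical helper: allocate a buffer the application writes
     * into directly, instead of uploading a separate copy. */
    #include <glad/glad.h>   /* loader must expose GL 4.4 / ARB_buffer_storage */
    #include <stddef.h>

    void *map_buffer_in_situ(GLuint *out_buf, size_t size)
    {
        GLuint buf;
        glGenBuffers(1, &buf);
        glBindBuffer(GL_ARRAY_BUFFER, buf);

        /* Immutable storage that stays mapped while the GPU uses it. */
        GLbitfield flags = GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
                           GL_MAP_COHERENT_BIT;
        glBufferStorage(GL_ARRAY_BUFFER, size, NULL, flags);

        /* The application fills this pointer in place; on unified-memory
         * hardware the driver can plausibly hand back the buffer itself. */
        void *ptr = glMapBufferRange(GL_ARRAY_BUFFER, 0, size, flags);
        *out_buf = buf;
        return ptr;
    }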

By the way, I'm looking forward to being able to write directly to 
graphics memory from the CPU, like I could do back in the early 80's on 
the Amiga.

-- hendrik

> 
> Thanks for explaining the project goals in such detail. It makes a lot more sense now.
> 


