[libre-riscv-dev] Vulkanizing

Wed Feb 19 17:05:56 GMT 2020

The web site "libre-riscv.org/3d_gpu", first line of text, says:
 RISC-V 3D GPU / CPU / VPU
But I understand you are now using the POWER architecture?

This is not a conventional GPU architecture, so it will have different performance characteristics from a GPU, and this is what I would like to understand.

A conventional GPU uses the SIMT architecture (single instruction, multiple threads). Discrete desktop GPUs from AMD and Nvidia support thousands of hardware threads running simultaneously, each with its own register set. An AMD Ryzen 4000 APU (CPU and GPU in the same package) supports between 320 and 512 hardware threads.

Your project uses a conventional CPU architecture. There are multiple cores, and multiple SMT threads within each core. There are way more transistors dedicated to each hardware thread, and therefore there are way fewer hardware threads available. It's a tradeoff. Each hardware thread supports a more general model of computation than the threads in a GPU, which makes it more versatile and easier to program, but you lose a lot of parallelism compared to a GPU.

Because you have specialized GPU instructions, this will be faster than a software GPU (llvmpipe) running on a POWER CPU, but it will still be significantly slower than a conventional GPU for conventional GPU workloads, due to having 10x or 50x fewer hardware threads.

You are providing a Vulkan driver, which is great, but then people will benchmark this processor using conventional Vulkan apps, which are designed around the strengths and limitations of conventional GPUs. It will be too slow to run AAA video games.

You are using the POWER architecture, which is designed for supercomputing and compute-heavy server applications. POWER is not known to be well suited for IoT, mobile or laptops (Apple transitioned the Macintosh from PowerPC to Intel due to power consumption issues). So are you primarily targeting desktop computers, and competing primarily with Intel integrated graphics and AMD APUs?

So what am I missing? What use cases is this new processor being designed for? What are your performance goals for Vulkan applications, and how will you achieve them?

Doug Moen.

On Wed, Feb 19, 2020, at 5:07 AM, Luke Kenneth Casson Leighton wrote:
> 
> 
> On Tuesday, February 18, 2020, Scheming Pony <scheming-pony at protonmail.com> wrote:
>> 
>> Thanks, that helps clarify it. I am still unsure about (1) where on the prototype and (2) subsequently on the production ASIC the Mesa driver is going to be run
> 
> on the hybrid CPU-VPU-GPU.
> 
> http://libre-riscv.org/3d_gpu.
> 
> there is no separate GPU and separate CPU.
> 
> there is only one CPUGPUVPU.
> 
> there are no separate pipelines for CPU and GPU.
> 
> there are no separate caches.
> 
> this means that COS and SIN and ATAN2 etc etc are *actual assembler instructions*, and they are *on the CPU*, as an *actual* CPU opcode.
> 
> thus, basically, the MESA driver which is in c++ which is compiled to POWER assembler will take Vulkan shader programs written in SPIRV and compile them JIT style *at runtime*...
> 
> ... *into POWER ASSEMBLER*.
> 
> that POWER assembly code will on Phase 2 happen to have unusual opcodes such as "completely new ATAN2 opcode" or "completely new YUV2RGB opcode".
> 
> for Phase 1 we do not even want that. we LITERALLY want the MESA driver to LITERALLY compile the Vulkan SPIRV to native assembler with no efforts made at any kind of optimisation.
> 
> this to be done by handing things over to LLVM JIT and telling it to get on with it.
> 
> for convenience we actually want that working first on x86, because it is easier to test.
> 
> 
>> . Could someone clarify this? Sorry, I am just starting out here.
>> 
>>  At a high level though, isn't there going to have to be some engine (e.g. Godot, scene graph) for application developers (mere mortals)? Will that overhead yield decent performance with your design (assuming Vulkan has decent performance)? Are individual graphics developers really learning Vulkan? I have heard *not*.
> 
> if they are screaming rabid performance fanatics as in the game industry yes.
> 
> for everyone else they will go via one of the compatibility APIs which *we are not writing*.
> 
>> State of the art GPU graphics and general programming is kind of a nightmare, IMHO--incompatible drivers, hardware requirements, etc. It's a lot of overhead when trying to solve a problem.
>> 
> 
> the reason for that godawful mess is down to the RPC marshalling and unmarshalling over IPC buses, all of which has to go via kernelspace.
> 
> it is ridiculous and quite insane and we are doing none of it.
> 
> when we want a cosine result we LITERALLY call the cosine frickin assembly opcode, right there, right then.
> 
> no pissing about marshalling up a cosine RPC function request which goes to kernelspace, kernel sends over IPC to GPU, GPU pisses about unmarshalling the RPC call, executes the instruction, then marshalls the result *back* down the same stupid process.
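
To make the contrast above concrete, here is a minimal sketch in Python. It is only an illustration, not the project's driver code: `OP_COS`, the 12-byte wire format, and the `device_side` function (which stands in for the kernel, the bus and the separate GPU) are all hypothetical names invented for this example.

```python
import math
import struct

OP_COS = 1  # hypothetical operation code, invented for this sketch


def direct_cos(x: float) -> float:
    """Hybrid model: the cosine is computed in the caller's own instruction stream."""
    return math.cos(x)


def marshal_request(op: int, x: float) -> bytes:
    """Pack an (op, argument) pair into bytes, as a discrete-GPU driver would."""
    return struct.pack("<Id", op, x)  # 4-byte op code + 8-byte double


def device_side(request: bytes) -> bytes:
    """Stand-in for kernel + bus + GPU: unmarshal, execute, marshal the reply."""
    op, x = struct.unpack("<Id", request)
    if op != OP_COS:
        raise ValueError(f"unknown op {op}")
    return struct.pack("<d", math.cos(x))


def rpc_cos(x: float) -> float:
    """Discrete-GPU model: marshal, hand over, wait, unmarshal."""
    request = marshal_request(OP_COS, x)    # userspace marshals the call
    reply = device_side(request)            # request crosses to the "device"
    (result,) = struct.unpack("<d", reply)  # userspace unmarshals the reply
    return result
```

Both paths return the same value; the difference is that `rpc_cos` does three extra pack/unpack steps and a hand-off for every call, which is the overhead the hybrid design avoids by having the operation available as an ordinary opcode.
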
> 
> 
>> 
>> As a closing thought, at the modeling (and rendering) level many of us are trying to get away from triangles.
> 
> then this processor will be a heck of a lot simpler basis to start that kind of experimentation.
> 
> and if you find you need a special instruction in hardware it will be far simpler to try it out.
> 
>>  There is a technique called f-rep (which my project Tovero and others use) which uses signed distance fields. Currently, we are using a technique to generate an isosurface of triangles in the CPU (dual contouring), then pushing the triangles to the GPU. F-rep has the potential to generate "Turing complete" shapes, if that makes sense. There is the concept of rendering them directly on the GPU (e.g. sphere tracing), but also doing engineering analysis like FEM on the GPU (or other co-processor) using AD.
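
For readers unfamiliar with sphere tracing, a minimal sketch in Python follows. It is a generic textbook-style illustration, not code from Tovero or from this project; the function names, the step limit and the tolerances are assumptions chosen for the example, and `direction` is assumed to be a unit vector.

```python
import math


def sphere_sdf(p, center=(0.0, 0.0, 0.0), radius=1.0):
    """Signed distance from point p to a sphere: negative inside, positive outside."""
    return math.dist(p, center) - radius


def sphere_trace(origin, direction, sdf, max_steps=64, eps=1e-4, max_dist=100.0):
    """March along a ray, stepping by the SDF value each time.

    Returns the distance along the ray to the surface, or None on a miss.
    The SDF value is a safe step size: no surface can be closer than that.
    """
    t = 0.0
    for _ in range(max_steps):
        p = tuple(o + t * d for o, d in zip(origin, direction))
        dist = sdf(p)
        if dist < eps:       # close enough to the surface: report a hit
            return t
        t += dist            # advance by the largest provably safe step
        if t > max_dist:     # ray has left the region of interest
            break
    return None
```

A ray starting at `(0, 0, -3)` and pointing along `+z` reaches the unit sphere at distance 2 along the ray, while the same origin pointing along `+y` misses and yields `None`. Every pixel runs this loop independently, which is why the technique maps naturally onto parallel hardware.
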
> 
> very cool.
> 
> 
> 
> -- 
> ---
> crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68