[libre-riscv-dev] whole stack of vulkan llvm spirv stuff

Sun Sep 15 13:57:32 BST 2019

On Fri, Sep 13, 2019 at 1:38 PM Luke Kenneth Casson Leighton
<lkcl at lkcl.net> wrote:
>
> The question of whether there is something unusual about either SV or SPIRV
> which *requires* whole function vectorisation at the SPIRV level is
> critical because:
>
> 1. scalar IR also needs to be an option.  This for ultra small embedded
> systems that do not have vectorisation at all

I was planning on scalar being implemented as vectorizing to length=1 vectors.
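
As a rough illustration of "scalar as the length=1 case", here is a
minimal Python sketch. It is not Kazan code: `vectorized_madd` and the
one-list-element-per-invocation layout are hypothetical, chosen only to
show that the same vectorized routine covers both targets.

```python
from typing import List


def vectorized_madd(a: List[float], b: List[float], c: List[float]) -> List[float]:
    """Compute a*b + c for every lane; one list element per shader invocation."""
    assert len(a) == len(b) == len(c)
    return [x * y + z for x, y, z in zip(a, b, c)]


# Vector target: four invocations processed together.
wide = vectorized_madd([1.0, 2.0, 3.0, 4.0], [2.0] * 4, [0.5] * 4)

# Scalar target: the same code path with the vector length fixed at 1.
narrow = vectorized_madd([3.0], [2.0], [0.5])
```

The point of the design is that no separate scalar code path has to be
written or tested; the scalar build is just the degenerate vector width.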
>
> 2. The test environment, [ARM and/or] x86 assembler is one of the
> milestones, this so we can incrementally move to RISCV assembler
>
> 3. The next phase is *scalar* RISCV assembler (related to 1) so that,
> again, we do not have to do too much at once. This phase to integrate
> lowRISC LLVM backend work

I was planning on just basing the code on the upstream LLVM backend
for RISC-V (which recently graduated from experimental).

I was planning on relying on LLVM's vector-to-scalar lowering pass for
handling vectorized code on just RV64GC (which we need anyway due to
SPIR-V having its own vectors).
>
> 4. The next phase, given that Robin Kruppe is being paid to do RVV, and it
> almost *certainly* includes whole function vectorisation, and given that
> RVV and SV are conceptually similar, is to replace RVV assembly code with
> SV equivalents in a by-the-numbers fashion.

One difference that there could be is that I've been designing Kazan
around shaders that are always vectorized (needed for implementing
control barriers and screen-space derivatives, which are required by
Vulkan), whereas I would expect that Robin's code does opportunistic
vectorization rather than guaranteed vectorization.
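
To show why screen-space derivatives push towards guaranteed
vectorization, here is a small Python sketch. It assumes fragment
invocations are grouped into 2x2 pixel quads and that a derivative is
formed by differencing values held by neighbouring lanes; the lane
ordering and the helper names (`ddx`, `ddy`, `run_quad`) are
hypothetical and not taken from Kazan.

```python
from typing import Callable, List, Tuple

# Lane order within one 2x2 pixel quad (an assumption for this sketch):
#   lane 0 = (x, y)      lane 1 = (x+1, y)
#   lane 2 = (x, y+1)    lane 3 = (x+1, y+1)


def ddx(values: List[float]) -> List[float]:
    """Horizontal derivative: right neighbour minus left neighbour, per row."""
    top = values[1] - values[0]
    bottom = values[3] - values[2]
    return [top, top, bottom, bottom]


def ddy(values: List[float]) -> List[float]:
    """Vertical derivative: lower neighbour minus upper neighbour, per column."""
    left = values[2] - values[0]
    right = values[3] - values[1]
    return [left, right, left, right]


def run_quad(
    expr: Callable[[float, float], float],
    coords: List[Tuple[float, float]],
) -> Tuple[List[float], List[float]]:
    # All four lanes must have evaluated expr before any lane can take a
    # derivative, which is why the quad is executed together.
    values = [expr(x, y) for x, y in coords]
    return ddx(values), ddy(values)


quad = [(10.0, 20.0), (11.0, 20.0), (10.0, 21.0), (11.0, 21.0)]
dx, dy = run_quad(lambda x, y: 3.0 * x + 5.0 * y, quad)
```

Each lane's result depends on values computed by the other lanes, so a
compiler that only vectorizes when it happens to be profitable cannot
provide this; the lanes have to exist in every case.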
>
> Right back at the beginning of this, the functions (vulkan compliance) that
> AMD put in which ultimately go through to PAL, these are all REPLACED with
> DIRECT functions written in straight c/c++/whatever, no messing about
> sending RPC calls to a separate GPU.
>
> These as a separate static/dynamic precompiled library that is linked to
> the SPIRV compiled shader at runtime.
>
> This approach I would be really genuinely surprised if we could not have
> something up and running (scalar) in about 8 weeks flat, starting from the
> amdgpu source.

Adapting amdgpu would probably be even more work than writing it from
scratch, since amdgpu's backend is tightly integrated into their ISA.
>
> Two unknowns are: how many functions need to be written which replace
> libPAL with straight c/c++/whatever
>
> and how much of llvm's other backends AMD left untouched from the cherry
> picking that they have been doing.

Note that Mesa's AMD drivers usually use upstream LLVM, which works
fine: RadeonSI (the Mesa OpenGL driver) is often faster than AMD's
proprietary driver.
>
> They have added immediates for example which were only proposed in february
> 2019.
>
> So that question is extremely important Jacob as it could cut the time to
> first prototype literally by about 12 months.

I think it would be much smaller than that, maybe 6-7 months once I
resume working on Kazan full time.

Part of why I think it's better to have the vectorization in Kazan's
codebase rather than LLVM, is that SPIR-V has some structural
invariants that the vectorizer requires that LLVM doesn't maintain on
non-GPU targets (x86 and RISC-V in particular). Rebuilding those
structural invariants could easily double the amount of work required.
(the AMDGPU backend tells the rest of LLVM to maintain those
invariants). I don't think LLVM would look kindly on doubling the
number of x86 targets just because I want a GPU variant, and Debian
definitely wouldn't want to use a non-upstream version of LLVM.

One other factor is that having the vectorization in Kazan allows
vectorized code to be generated with different compilers as backends,
such as GCC or Cranelift. Cranelift in particular I think is important
to have as a backend down the road because it compiles code easily
tens of times faster than LLVM, simply because LLVM is more targeted
as an ahead-of-time compiler and therefore runs lots of expensive
optimizations as part of code generation. Cranelift, by contrast, is
more for fast compilation when you want to quickly get a shader binary
that works and a slower compiler can be used in the background to make
a better shader binary. This is important for program start-up time
reduction (important enough that Valve(?) is currently developing
another AMDGPU shader compiler specifically because LLVM is slow).
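
The fast-then-better scheme described above can be sketched roughly as
follows. This is only an illustration in Python: `compile_fast` and
`compile_optimized` are hypothetical stand-ins for a quick backend and
a slow optimizing backend, and `ShaderSlot` is an invented name, not
part of Kazan or of any real compiler API.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Optional


def compile_fast(source: str) -> str:
    # Hypothetical stand-in for a quick, lightly optimizing backend.
    return f"fast({source})"


def compile_optimized(source: str) -> str:
    # Hypothetical stand-in for a slower, heavily optimizing backend.
    return f"optimized({source})"


class ShaderSlot:
    """Holds whichever shader binary is currently the best one available."""

    def __init__(self, source: str, executor: ThreadPoolExecutor) -> None:
        # The quick binary is usable as soon as the constructor returns.
        self.binary: str = compile_fast(source)
        # The better binary is built in the background.
        self._pending: Optional[Future] = executor.submit(compile_optimized, source)

    def current(self) -> str:
        # Swap in the optimized binary once the background job has finished.
        if self._pending is not None and self._pending.done():
            self.binary = self._pending.result()
            self._pending = None
        return self.binary

    def wait_for_optimized(self) -> str:
        # Block until the optimized binary exists, then return it.
        if self._pending is not None:
            self.binary = self._pending.result()
            self._pending = None
        return self.binary


with ThreadPoolExecutor(max_workers=1) as pool:
    slot = ShaderSlot("main_shader", pool)
    first = slot.binary                # available immediately
    final = slot.wait_for_optimized()  # replaced once the slow compile is done
```

A renderer would normally call `current()` each frame rather than
block; `wait_for_optimized()` is only there to make the sketch
deterministic.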

Jacob