[libre-riscv-dev] [Mesa-dev] NLNet Funded development of a software/hardware MESA driver for the Libre GPGPU

Mon Jan 13 13:42:02 GMT 2020

On Thu, Jan 9, 2020 at 3:56 AM Luke Kenneth Casson Leighton
<lkcl at lkcl.net> wrote:
>
> On 1/9/20, Jason Ekstrand <jason at jlekstrand.net> wrote:
> >> 2. as a flexible Vector Processor, soft-programmable, then over time if
> >> the industry moves to dropping vec4, so can we.
> >>
> >
> > That's very nice.  My primary reason for sending the first e-mail was that
> > SwiftShader vs. Mesa is a pretty big decision that's hard to reverse after
> > someone has poured several months into working on a driver and the argument
> > you gave in favor of Mesa was that it supports vec4.
>
> not quite :)  i garbled it (jacob spent some time explaining it, a few
> months back, so it's 3rd hand if you know what i mean).  what i can
> recall of what he said was: it's something to do with the data types,
> particularly predication, being maintained as part of SPIR-V (and
> NIR), which, if you drop that information, you have to use
> auto-vectorisation and other rather awful tricks to get it back when
> you get to the assembly level.
>
> jacob perhaps you could clarify, here?

So the major issue with the approach AMDGPU took, where the
SIMT-to-predicated-vector translation is done by the LLVM backend, is
that LLVM doesn't really maintain a reducible CFG, which is needed to
correctly vectorize the code without devolving to a switch-in-a-loop. This
kinda-sorta works for AMDGPU because the backend can specifically tell
the optimization passes to try to maintain a reducible CFG. However,
that won't work for Libre-RISCV's GPU because we don't have a separate
GPU ISA (it's just RISC-V or Power, we're still deciding), so the
backends don't tell the optimization passes that they need to maintain
a reducible CFG. Additionally, the AMDGPU vectorization is done as
part of the translation from LLVM IR to MIR, which makes it very hard
to adapt to a different ISA. Because of all of those issues, I decided
that it would be better to vectorize before translating to LLVM IR,
since that way, the CFG reducibility can be easily maintained. This
also gives the benefit that it's much easier to substitute a different
backend compiler such as gccjit or cranelift, since all of the
required SIMT-specific transformations are already completed before
the code goes to the backend. Both NIR and the IR I'm currently
implementing in Kazan (the non-Mesa Vulkan driver for libre-riscv)
maintain a reducible CFG throughout the optimization process. In fact,
the IR I'm implementing can't express non-reducible CFGs since it's
built as a tree of loops and code blocks where control transfer
operations can only continue a loop or exit a loop or block. Switches
work by having a nested set of blocks and the switch instruction picks
which block to break out of.
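
To make the "tree of loops and code blocks" idea concrete, here is a
minimal Python sketch of such an IR with a tiny interpreter. It is an
illustration only, not Kazan's actual IR: every name in it (Op, Break,
Continue, Switch, Block, Loop, run, and the labels) is hypothetical,
and it assumes that falling off the end of a loop body repeats the
loop. Since the only control transfers are "continue a loop" and
"break out of an enclosing loop or block", an irreducible CFG simply
can't be written down.

```python
from dataclasses import dataclass
from typing import Callable, List

# All node types below are hypothetical, for illustration only.
@dataclass
class Op:            # straight-line operation that mutates the environment
    fn: Callable[[dict], None]

@dataclass
class Break:         # exit the enclosing Block or Loop with this label
    label: str

@dataclass
class Continue:      # jump back to the top of the enclosing Loop
    label: str

@dataclass
class Switch:        # pick which enclosing block/loop to break out of
    selector: Callable[[dict], int]
    labels: List[str]   # labels[i] is the break target for case i
    default: str        # break target when the selector is out of range

@dataclass
class Block:
    label: str
    body: list

@dataclass
class Loop:
    label: str
    body: list

def run_seq(body, env):
    """Run nodes in order; stop at the first pending control transfer."""
    for node in body:
        signal = run(node, env)
        if signal is not None:
            return signal
    return None

def run(node, env):
    """Return None on fall-through, or a ("break"|"continue", label) pair."""
    if isinstance(node, Op):
        node.fn(env)
        return None
    if isinstance(node, Break):
        return ("break", node.label)
    if isinstance(node, Continue):
        return ("continue", node.label)
    if isinstance(node, Switch):
        i = node.selector(env)
        in_range = 0 <= i < len(node.labels)
        return ("break", node.labels[i] if in_range else node.default)
    if isinstance(node, Block):
        signal = run_seq(node.body, env)
        return None if signal == ("break", node.label) else signal
    if isinstance(node, Loop):
        while True:
            signal = run_seq(node.body, env)
            if signal is None or signal == ("continue", node.label):
                continue  # assumption: end of body repeats the loop
            return None if signal == ("break", node.label) else signal
    raise TypeError(f"unknown node: {node!r}")

def run_switch(x):
    """switch (x) { case 0: A; break; case 1: B; break; default: C; }"""
    emit = lambda name: Op(lambda env: env["trace"].append(name))
    tree = Block("end", [
        Block("default", [
            Block("case1", [
                Block("case0", [
                    Switch(lambda env: env["x"], ["case0", "case1"], "default"),
                ]),
                emit("A"), Break("end"),   # reached by breaking out of case0
            ]),
            emit("B"), Break("end"),       # reached by breaking out of case1
        ]),
        emit("C"),                         # reached by breaking out of default
    ])
    env = {"x": x, "trace": []}
    run(tree, env)
    return env["trace"]

def count_to(limit):
    """while (i < limit) i++;  written with only Loop, Block and Switch."""
    tree = Loop("L", [
        # selector 0 leaves "body" and carries on; anything else exits "L"
        Block("body", [Switch(lambda env: int(env["i"] >= limit), ["body"], "L")]),
        Op(lambda env: env.update(i=env["i"] + 1)),
        Continue("L"),
    ])
    env = {"i": 0}
    run(tree, env)
    return env["i"]
```

In run_switch, the code for each case sits directly after the block
that the switch breaks out of for that case, so "picking which block
to break out of" is the same thing as picking which case body to run.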

Hopefully, that all made sense. :)

Jacob Lifshay