[libre-riscv-dev] register requirements of SimpleV

Wed Oct 10 21:45:13 BST 2018

Note that the semantics implied by the Vulkan standard is SIMT, so if I
were to translate that directly to SV, the inner loop would be the entire
shader.
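
As a rough illustration of what "the inner loop would be the entire
shader" means, here is a minimal Python model of the strip-mined loop from
the quoted message below (t0 = vl = min(MVL, t0)). The names
run_shader_simt, shader and mvl are made up for this sketch and are not
part of SV or Vulkan; the point is only that the whole shader becomes the
loop body and VL is recomputed on every pass.

```python
def run_shader_simt(shader, invocations, mvl):
    """Model a strip-mined loop whose body is the entire shader.

    shader      -- hypothetical per-invocation function (the whole shader)
    invocations -- list of per-invocation inputs
    mvl         -- maximum vector length (set once, outside the loop)
    Returns (results, vls), where vls records the VL used on each pass.
    """
    results = []
    vls = []
    start = 0
    remaining = len(invocations)
    while remaining > 0:
        # Models "t0 = vl = min(MVL, t0)": VL is set inside the loop.
        vl = min(mvl, remaining)
        vls.append(vl)
        # The entire shader runs once per element of the current chunk.
        chunk = invocations[start:start + vl]
        results.extend(shader(x) for x in chunk)
        start += vl
        remaining -= vl
    return results, vls
```

With 10 invocations and mvl=4 the model makes three passes, the last one
with only the 2 leftover elements.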

After some more thought, a solution that doesn't involve as much change to
SV as supporting grouped elements would be to use struct-of-arrays
representation instead of array-of-structs representation. That will lead
to inefficiencies when VL is low (reverting to scalar operations for
fixed-length vectors as they effectively have 1 group, which we might want
to special-case in the compiler).
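
To show the two layouts side by side, here is a small Python sketch. The
component names r/g/b/a and the helper names are assumptions picked for
the example, not anything defined by SV or Kazan. In the struct-of-arrays
form each component becomes its own vector with one element per group, so
a vec4 operation turns into four vector operations of length VL (and into
four scalar operations when VL is 1).

```python
COMPONENTS = ("r", "g", "b", "a")  # hypothetical vec4 component names


def aos_to_soa(pixels):
    """Convert [(r, g, b, a), ...] into {'r': [...], 'g': [...], ...}."""
    return {name: [p[i] for p in pixels] for i, name in enumerate(COMPONENTS)}


def scale_soa(soa, factor):
    """Scale every component: one vector operation per component."""
    return {name: [v * factor for v in column] for name, column in soa.items()}
```

For example, aos_to_soa([(1, 2, 3, 4), (5, 6, 7, 8)]) gives two-element
vectors for each of r, g, b and a.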

The reasoning for the sign-extension/1-extension is to allow
register-renaming implementations to not need an additional source register
for the old contents of rd. I picked those particular extension modes as
they match the existing semantics for sub-word operations in RV
(NaN-boxing (filling high bits with ones) for FP, sign-extension for
integer).
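
A minimal Python sketch of the two fill rules, assuming a 64-bit register
and treating register contents as plain unsigned integers. The function
names are invented for this example. Because the upper bits depend only on
the value being written, the old contents of rd are never read.

```python
def sign_extend(value, width, xlen=64):
    """Fill bits [width, xlen) with copies of bit (width - 1)."""
    low_mask = (1 << width) - 1
    value &= low_mask
    if value & (1 << (width - 1)):
        value |= ((1 << xlen) - 1) & ~low_mask
    return value


def ones_extend(value, width, flen=64):
    """Fill bits [width, flen) with ones (1-extension)."""
    low_mask = (1 << width) - 1
    return (value & low_mask) | (((1 << flen) - 1) & ~low_mask)
```

For instance, the 8-bit value 0x80 sign-extends to 0xFFFFFFFFFFFFFF80, and
the 32-bit pattern 0x3F800000 1-extends to 0xFFFFFFFF3F800000.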

On Wed, Oct 10, 2018 at 12:16 AM lkcl <lkcl at libre-riscv.org> wrote:

> On Wed, Oct 10, 2018 at 5:56 AM Jacob Lifshay <programmerjake at gmail.com>
> wrote:
> >
> > On Tue, Oct 9, 2018 at 4:45 AM lkcl <lkcl at libre-riscv.org> wrote:
> >
> > > > Note that each of the 4 pixels doesn't have to be part of the
> > > > same vector, but they do have to be in registers otherwise you
> > > > will end up spilling for the very common case of needing
> > > > derivatives for level-of-detail calculations when using mip-mapped
> > > > textures (probably the most common texturing mode). It would be
> > > > handy to have a group be able to go up to 16 elements, but I think
> > > > we can get away with 4 elements.
> > >
> > >  could you let me know if you think that will fit into either the
> > > packed/non-packed SIMD modes?
> > >
> > I think some of the groups could be represented by packed/non-packed SIMD
> > modes, however not all of them can be. In order to better explain what I
> > meant, and since e-mail doesn't really support tables, I wrote a Markdown
> > document describing my envisioned grouping system:
> >
> > https://git.libre-riscv.org/?p=kazan.git;a=blob;f=docs/SimpleV+Grouping+Proposal.md;hb=HEAD
>
>  ok looks clear
>
> > One critical point is that VL is the number of element groups, not the
> > total number of elements as that makes it easier to use as VL can be set
> > once or a few times rather than needing to be changed every time we need a
> > different vector.
>
>  yyeah this is why i seriously considered having a separate VL per CSR
> entry.  however that would mean 7 bits times 16, or redoing the
> 16-bit x 2 entries per register (8 32-bit CSRs) to 23-bit x1 entries
> (16 23-bit regs, 9 bits unused).
>
>  more than that, however, there is a fundamental misunderstanding of
> how VL is used, here (but not so for MVL, which is a second CSR).  VL
> needs to be set as part of the actual inner loop, as it is equivalent
> to:
>
>  t0 = vl = min(MVL, t0)
>
> so it's not just a "set-and-forget", it saves one extra instruction as
> it covers the job of the "min" operation as well.  in this way, the
> last part of any loop is guaranteed to have between 1 and MVL
> elements.  details including assembly code and walk-through, here:
>
> https://www.sigarch.org/simd-instructions-considered-harmful/
>
> now, MVL on the other hand, which is the RV equivalent of the
> hard-coded (global) vector length, *that* is the one that would (also)
> need to be set, however it can be set *outside* the loop, once and
> only once.
>
> unfortunately, if you think it through, how would it be possible to
> have a loop, where the temporary register used (e.g. t0) records the
> current number of elements being executed?
>
> it would be necessary to have 16 separate CSRs, 16 separate SETVL
> instructions, *and* 16 separate SETMVL instructions, wouldn't it?  32
> additional CSRs, it's too much.
>
> > I'm open to changing the semantics for unused portions of registers as
> > what I proposed may not be the best solution. I chose those particular
> > semantics as that matches RV's behavior for scalar values
> > (sign-extending integers and 1-extending floating-point).
>
>  sign-extension etc. will also have to be part of SV when elwidths are
> different... is there a compelling reason for sign-extending the MSB
> of the highest-indexed element to fill the unused part of the
> register?  likewise FP set to all 1s?
>
> l.