[libre-riscv-dev] register requirements of SimpleV

lkcl lkcl at libre-riscv.org
Wed Oct 10 08:16:40 BST 2018


On Wed, Oct 10, 2018 at 5:56 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> On Tue, Oct 9, 2018 at 4:45 AM lkcl <lkcl at libre-riscv.org> wrote:
>
> > > Note that each of the 4 pixels doesn't have to be part of the
> > > same vector, but they do have to be in registers otherwise you will end
> > > up spilling for the very common case of needing derivatives for
> > > level-of-detail calculations when using mip-mapped textures (probably the
> > > most common texturing mode). It would be handy to have a group be able to
> > > go up to 16 elements, but I think we can get away with 4 elements.
> >
> >  could you let me know if you think that will fit into either the
> > packed/non-packed SIMD modes?
> >
> I think some of the groups could be represented by packed/non-packed SIMD
> modes, however not all of them can be. In order to better explain what I
> meant, and since e-mail doesn't really support tables, I wrote a Markdown
> document describing my envisioned grouping system:
> https://git.libre-riscv.org/?p=kazan.git;a=blob;f=docs/SimpleV+Grouping+Proposal.md;hb=HEAD

 ok looks clear

> One critical point is that VL is the number of element groups, not the
> total number of elements. That makes it easier to use, as VL can be set
> once or a few times rather than needing to be changed every time we need a
> different vector.

 yeah, this is why i seriously considered having a separate VL per CSR
entry.  however that would mean an extra 7 bits times 16, or redoing the
16-bit x 2 entries per register (8 32-bit CSRs) as 23-bit x 1 entries
(16 23-bit regs, 9 bits unused in each).
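
 to make the arithmetic above concrete, here is a rough sketch in
python.  it assumes the per-entry VL field would be 7 bits wide (which
is what takes a 16-bit entry to 23 bits) and that CSRs are 32 bits
wide; the function and constant names are made up for illustration and
are not part of any spec.

```python
# Hypothetical constants: 16 table entries, 32-bit CSRs, and an assumed
# 7-bit per-entry VL field on top of the existing 16-bit entry.
ENTRIES = 16
CSR_WIDTH = 32
ENTRY_BITS = 16
VL_BITS = 7


def csr_layout(entry_bits, entries=ENTRIES, csr_width=CSR_WIDTH):
    """Return (entries per CSR, CSRs needed, unused bits per CSR)."""
    per_csr = csr_width // entry_bits
    csrs_needed = -(-entries // per_csr)  # ceiling division
    unused_bits = csr_width - per_csr * entry_bits
    return per_csr, csrs_needed, unused_bits


current = csr_layout(ENTRY_BITS)            # 16-bit entries, two per CSR
with_vl = csr_layout(ENTRY_BITS + VL_BITS)  # 23-bit entries, one per CSR
extra_bits_total = VL_BITS * ENTRIES        # added state across all entries
```

 under those assumptions the per-entry VL doubles the CSR count from 8
to 16 and leaves 9 bits idle in every one of them.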

 more than that, however, there is a fundamental misunderstanding of
how VL is used, here (but not so for MVL, which is a second CSR).  VL
needs to be set as part of the actual inner loop, as it is equivalent
to:

 t0 = vl = min(MVL, t0)

so it's not just a "set-and-forget", it saves one extra instruction as
it covers the job of the "min" operation as well.  in this way, the
last part of any loop is guaranteed to have between 1 and MVL
elements.  details including assembly code and walk-through, here:

https://www.sigarch.org/simd-instructions-considered-harmful/
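
as a rough model of the above (python standing in for the assembly,
purely for illustration): setvl() here is a hypothetical stand-in for
the SETVL instruction, and vector_add() is an invented example loop.
the point is only that VL is recomputed on every pass, with the min()
folded into the same step.

```python
def setvl(requested, mvl):
    """Model of SETVL: the result is both the new VL and the value
    written back to the temporary, i.e. t0 = vl = min(MVL, t0)."""
    return min(mvl, requested)


def vector_add(a, b, mvl):
    """Strip-mined element-wise add; returns the result and the VL
    used on each pass of the loop."""
    n = len(a)
    out = [0] * n
    vls = []
    i = 0
    remaining = n              # plays the role of t0 before SETVL
    while remaining > 0:
        vl = setvl(remaining, mvl)   # set inside the loop, every pass
        for j in range(vl):          # the "vector" operation on vl elements
            out[i + j] = a[i + j] + b[i + j]
        i += vl
        remaining -= vl
        vls.append(vl)
    return out, vls


result, vls_used = vector_add(list(range(10)), [1] * 10, mvl=4)
```

with 10 elements and MVL=4 the passes use VL of 4, 4 and then 2, so the
tail needs no separate clean-up code.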

now, MVL on the other hand, which is the RV equivalent of the
hard-coded (global) vector length, *that* is the one that would (also)
need to be set, however it can be set *outside* the loop, once and
only once.

unfortunately, if you think it through, how would it be possible to
have a loop, where the temporary register used (e.g. t0) records the
current number of elements being executed?

it would be necessary to have 16 separate CSRs, 16 separate SETVL
instructions, *and* 16 separate SETMVL instructions, wouldn't it?  32
additional CSRs, it's too much.

> I'm open to changing the semantics for unused portions of registers as what
> I proposed may not be the best solution. I chose those particular semantics
> as that matches RV's behavior for scalar values (sign-extending integers
> and 1-extending floating-point).

 sign-extension etc. will also have to be part of SV when elwidths are
different... is there a compelling reason for sign-extending the MSB
of the highest-indexed element to fill the unused part of the
register?  likewise FP set to all 1s?
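
 for reference, this is how i read the proposed fill rule, as a small
python sketch.  it assumes a 64-bit register with element 0 in the
least significant bits, which may not match the proposal's actual
layout; pack_elements() is a made-up helper, not anything from the
spec.

```python
XLEN = 64  # assumed register width


def pack_elements(elements, elwidth, fp=False, xlen=XLEN):
    """Pack elements into one register, element 0 in the low bits.

    Unused upper bits are filled with the sign bit of the
    highest-indexed element (integer case) or with all 1s (FP case).
    """
    if not elements:
        raise ValueError("need at least one element")
    used = len(elements) * elwidth
    if used > xlen:
        raise ValueError("elements do not fit in the register")
    mask = (1 << elwidth) - 1
    reg = 0
    for idx, value in enumerate(elements):
        reg |= (value & mask) << (idx * elwidth)
    unused = xlen - used
    if unused == 0:
        return reg
    all_ones = (1 << unused) - 1
    if fp:
        fill = all_ones                      # 1-extend, as for scalar FP
    else:
        sign = ((elements[-1] & mask) >> (elwidth - 1)) & 1
        fill = all_ones if sign else 0       # sign-extend the top element
    return reg | (fill << used)
```

 so two 8-bit elements 0x01, 0x80 would give 0xFFFFFFFFFFFF8001, and a
single 16-bit FP element 0x3C00 would give 0xFFFFFFFFFFFF3C00.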

l.


