[libre-riscv-dev] uniform instruction format
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Mon Jun 24 14:08:16 BST 2019
On Mon, Jun 24, 2019 at 1:31 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> On Mon, Jun 24, 2019, 05:06 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> wrote:
>
> > On Mon, Jun 24, 2019 at 11:05 AM Jacob Lifshay <programmerjake at gmail.com>
> > wrote:
> > >
> > > One way to handle svlen is to have vstart count to VL*SVLEN instead of
> > > just to VL. That could simplify the CSR state somewhat.
> >
> > grouping the 2 together, in other words. so instead of destoffs (6
> > bits) and subvec-destoffs (2 bits), and having effectively 2 nested
> > loops, it's a single loop.
> >
> one way to handle it is to have a single architecturally-visible vstart,
> with twin predication, the predicate masks are known, so the architecture
> can count set bits to determine the other vstart.
mmmm... you're right about that. what's the hardware cost, though,
on restarting (vstart != 0)? a preliminary algorithm evaluation would
be:
* a truncated (masked) popcount - masked because you mask out the bits
prior to vstart.
* then, it would be necessary to do a bit-gather and bit-scatter...
clifford also names them bit-extract and bit-deposit (bext, bdep)
* followed then by a mask and finally a popcount.
it's *really* complex in other words, and all of that complexity may
be avoided quite simply by keeping that 2nd offset in the STATE CSR.
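A minimal Python model of that restart computation, roughly following the steps listed above. It assumes a simplified twin-predication loop in which each step pairs the next active source element with the next active destination element, and that the single architectural offset is the destination one. `popcount`, `pdep` and `srcoffs_from_destoffs` are hypothetical helper names for illustration, not names from the SV spec:

```python
def popcount(x: int) -> int:
    """Count the set bits of a non-negative integer."""
    return bin(x).count("1")


def pdep(src: int, mask: int) -> int:
    """Bit-deposit (bdep): scatter the low-order bits of src into the set bit positions of mask."""
    result = 0
    k = 0  # next bit of src to deposit
    pos = 0
    while mask >> pos:
        if (mask >> pos) & 1:
            if (src >> k) & 1:
                result |= 1 << pos
            k += 1
        pos += 1
    return result


def srcoffs_from_destoffs(smask: int, dmask: int, destoffs: int) -> int:
    """Recover the source offset from a single architectural (destination) offset.

    Assumes destoffs is the index of the next destination element to process.
    """
    # 1. masked popcount: active destination elements already completed
    done = popcount(dmask & ((1 << destoffs) - 1))
    # 2. bit-deposit a single 1 into the done-th active position of the source mask
    onehot = pdep(1 << done, smask)
    if onehot == 0:
        raise ValueError("source mask has too few active elements")
    # 3. onehot - 1 is a mask of all bits below that position; its popcount is the index
    return popcount(onehot - 1)
```

For example, with smask=0b10110110, dmask=0b11101011 and destoffs=5, three destination elements are already done, so the restart source offset is the fourth active source bit, index 5. Even in this sketch, the restart path needs two popcounts and a bit-deposit, which is the hardware cost being weighed against simply storing the second offset.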
> two vstart values can be
> kept in the microarchitecture (for the common cases) and one can be
> invalidated on a csr write.
> This should work just fine if vstart being written by user software is
> extremely rare (which it should be, since it's just for exception handling
> and context switching). Assuming vstart csr writes are rare, even a
> bit-serial bit counter (2 shift registers and 2 counters) should suffice,
> allowing saving area.
how many cycles would that take? if it's more than 1 or 2, the units
will need to be specially stalled; otherwise they may run ahead,
producing results that are only going to get cancelled.
more on that in the comp.arch link below.
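To get a feel for the cycle count, here is a rough model of the bit-serial alternative. It assumes one mask bit is examined per clock: first the destination mask is shifted out up to destoffs while counting set bits, then the source mask is shifted out until the matching active element is found. The function name and the one-bit-per-cycle assumption are illustrative only:

```python
def bit_serial_srcoffs(smask: int, dmask: int, destoffs: int, vl: int) -> tuple[int, int]:
    """Return (srcoffs, cycles) for a one-bit-per-cycle shift-register walk (hypothetical model)."""
    cycles = 0
    done = 0
    # phase 1: shift out the dest mask, counting set bits below destoffs
    for i in range(destoffs):
        done += (dmask >> i) & 1
        cycles += 1
    # phase 2: shift out the source mask until the done-th (0-based) active element
    seen = 0
    for j in range(vl):
        cycles += 1
        if (smask >> j) & 1:
            if seen == done:
                return j, cycles
            seen += 1
    raise ValueError("source mask has too few active elements")
```

Under this model the latency is destoffs + srcoffs + 1 cycles, so it grows with VL (up to roughly 2*VL), well beyond the 1 or 2 cycles mentioned above.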
> > the issue is, when it comes to predication (and restarting in the
> > middle of a context-switch) it would be necessary to
> >
> not really, basically svlen!=1 is treated mostly like an equivalent
> svlen==1 op with VL multiplied and the predicate bits duplicated/expanded.
> so, for a vl*4 op, vstart=6 means start with the outer loop index at 1
> (6/4) and the subvector element index at 2 (6%4). for vl*3, vstart=16 means
> the outer loop index is 5 (16/3) and the subvector element index is 1
> (16%3).
yes. exactly. sorry, didn't spell it out. thought it, forgot to type it :)
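The "VL multiplied, predicate bits duplicated" view can be sketched as follows, assuming a predicate bitmask with one bit per outer element (VL bits). `expand_predicate` is a hypothetical helper: each predicate bit i is replicated SUBVL times, so that the flattened index k = i*SUBVL + s sees bit i:

```python
def expand_predicate(pred: int, vl: int, subvl: int) -> int:
    """Replicate each of the vl predicate bits subvl times (bit i -> bits i*subvl .. i*subvl+subvl-1)."""
    group = (1 << subvl) - 1  # subvl consecutive ones
    out = 0
    for i in range(vl):
        if (pred >> i) & 1:
            out |= group << (i * subvl)
    return out
```

For instance, pred=0b101 with VL=3 and SUBVL=2 expands to 0b110011: both elements of subvectors 0 and 2 are active, and subvector 1 is skipped as a whole.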
the predication is... complex. i describe it here:
https://groups.google.com/d/msg/comp.arch/yIFmee-Cx-c/0nedQqnEAQAJ
oh wait, i know what happened: i tried installing a vim textarea
editor, it says "press ctrl-enter to activate" and that turns out to
be a gmail shortcut for "send".
doh :)
yes, exactly: an integer divide is needed. whilst SUBVL=1 is a
null-op and SUBVL=2 or 4 is easy (a shift and a mask), SUBVL=3
requires an actual divider to work out the outer loop index.
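A sketch of that decomposition, assuming vstart counts flattened elements from 0 up to VL*SUBVL and that SUBVL is one of 1, 2, 3 or 4. `split_vstart` is a hypothetical name; the point is that only the SUBVL=3 case needs a genuine divide:

```python
def split_vstart(vstart: int, subvl: int) -> tuple[int, int]:
    """Split a flattened vstart into (outer VL index i, SUBVL element index s)."""
    if subvl == 1:
        return vstart, 0  # null-op
    if subvl in (2, 4):
        shift = subvl.bit_length() - 1  # 2 -> 1, 4 -> 2
        return vstart >> shift, vstart & (subvl - 1)  # shift and mask
    if subvl == 3:
        return divmod(vstart, 3)  # needs a real divider
    raise ValueError("SUBVL must be 1, 2, 3 or 4")
```

This reproduces the worked examples quoted above: vstart=6 with SUBVL=4 gives (1, 2), and vstart=16 with SUBVL=3 gives (5, 1).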
it's the outer loop that is the "predicate group point", see
https://libre-riscv.org/simple_v_extension/specification/#subvl-pseudocode
as you can see there, with the VL outer loop index i and the SUBVL
inner loop index s, predication uses "i" (the VL outer loop).
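A simplified Python paraphrase of that loop structure (not the spec pseudocode itself) shows the predicate being tested once per outer index i. It also checks that a single flattened counter restarted at vstart visits the same (i, s) pairs as the nested loops. The function names and the `pred` bitmask (one bit per outer element) are assumptions for illustration:

```python
def nested_elements(vl: int, subvl: int, pred: int, vstart: int = 0) -> list[tuple[int, int]]:
    """Nested form: outer VL loop i, inner SUBVL loop s; predicate bit indexed by i."""
    out = []
    i0, s0 = divmod(vstart, subvl)
    for i in range(i0, vl):
        if not (pred >> i) & 1:
            continue  # whole subvector skipped when its predicate bit is clear
        for s in range(s0 if i == i0 else 0, subvl):
            out.append((i, s))
    return out


def flat_elements(vl: int, subvl: int, pred: int, vstart: int = 0) -> list[tuple[int, int]]:
    """Flattened form: one counter over VL*SUBVL, with vstart counting flattened elements."""
    out = []
    for k in range(vstart, vl * subvl):
        i, s = divmod(k, subvl)
        if (pred >> i) & 1:
            out.append((i, s))
    return out
```

For example, `nested_elements(3, 2, 0b101)` yields `[(0, 0), (0, 1), (2, 0), (2, 1)]`. The two forms agree for every restart point, so a single flattened vstart loses no information; the cost is moved into the divide needed to recover i and s.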
anything that saves gates is, i think, better, unless it results in
madness-levels-of-CSR proliferation. that's the reason for the VBLOCK
format: there were just far too many CSRs in the previous revision.
l.