[libre-riscv-dev] uniform instruction format
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Mon Jun 24 14:08:16 BST 2019
On Mon, Jun 24, 2019 at 1:31 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
> On Mon, Jun 24, 2019, 05:06 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> > On Mon, Jun 24, 2019 at 11:05 AM Jacob Lifshay <programmerjake at gmail.com>
> > wrote:
> > >
> > > One way to handle svlen is to have vstart count to VL*SVLEN instead of just
> > > to VL. That could simplify the CSR state somewhat.
> > grouping the 2 together, in other words. so instead of destoffs (6
> > bits) and subvec-destoffs (2 bits), and having effectively 2 nested
> > loops, it's a single loop.
> one way to handle it is to have a single architecturally-visible vstart,
> with twin predication, the predicate masks are known, so the architecture
> can count set bits to determine the other vstart.
mmmm... you're right about that. what's the hardware cost, though,
on restarting (vstart != 0)? a preliminary algorithm evaluation would need:
* a truncated (masked) popcount - masked because you mask out the bits
prior to vstart.
* then, it would be necessary to do a bit-gather and bit-scatter...
clifford also names them bit-extract and bit-deposit (bext, bdep)
* followed then by a mask and finally a popcount.
it's *really* complex in other words, and all of that complexity may
be avoided quite simply by keeping that 2nd offset in the STATE CSR.
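to make the cost concrete, here is a rough software model of deriving the second offset from the two predicate masks. it is only a sketch: it assumes twin predication pairs the n-th active source element with the n-th active destination element, and the names (derive_dest_offset, nth_set_bit) are made up for illustration. it does not follow the exact bext/bdep sequence listed above, it just computes the same kind of result.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")


def nth_set_bit(mask, n):
    """Index of the n-th (0-based) set bit in mask, or -1 if there is none."""
    idx = 0
    while mask:
        if mask & 1:
            if n == 0:
                return idx
            n -= 1
        mask >>= 1
        idx += 1
    return -1


def derive_dest_offset(src_mask, dest_mask, src_vstart):
    """Recover the destination offset from the single visible vstart.

    Assumption: src_vstart indexes the source predicate, and the elements
    already completed are the set bits of src_mask below src_vstart.
    """
    # masked popcount: only bits prior to vstart are counted
    done = popcount(src_mask & ((1 << src_vstart) - 1))
    # the next destination element is the done-th active bit of dest_mask
    return nth_set_bit(dest_mask, done)


# source elements 0 and 2 are already done, so the third active
# destination bit (index 5) is where execution resumes
print(derive_dest_offset(0b101101, 0b110010, 3))  # 5
```

keeping the second offset in the STATE CSR replaces all of this with a plain register read.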
> two vstart values can be
> kept in the microarchitecture (for the common cases) and one can be
> invalidated on a csr write.
> This should work just fine if vstart being written by user software is
> extremely rare (which it should be, since it's just for exception handling
> and context switching). Assuming vstart csr writes are rare, even a
> bit-serial bit counter (2 shift registers and 2 counters) should suffice,
> saving area.
how many cycles would that take? if it's more than 1 or 2, the units
will need to be specially-stalled because if not, they may run ahead
producing results that are only going to get cancelled.
more on that in the comp.arch link below
> > the issue is, when it comes to predication (and restarting in the
> > middle of a context-switch) it would be necessary to
> not really, basically svlen!=1 is treated mostly like an equivalent
> svlen==1 op with VL multiplied and the predicate bits duplicated/expanded.
> so, for a vl*4 op, vstart=6 means start with the outer loop index at 1
> (6/4) and the subvector element index at 2 (6%4). for vl*3, vstart=16 means
> the outer loop index is 5 (16/3) and the subvector element index is 1 (16%3).
yes. exactly. sorry, didn't spell it out. thought it, forgot to type it :)
the predication is... complex. i describe it here:
oh wait, i know what happened: i tried installing a vim textarea
editor, it says "press ctrl-enter to activate" and that turns out to
be a gmail shortcut for "send".
yes, exactly: an integer divide is needed. whilst SUBVL=1 is a
null-op, SUBVL=2 or 4 is easy, SUBVL=3 requires an actual divider to
work out the outer loop.
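a minimal sketch of that split, assuming a flat vstart that counts up to VL*SUBVL. split_vstart is a hypothetical helper name; the shift-and-mask path stands in for what is cheap in hardware, and divmod stands in for the real divider that SUBVL=3 would need.

```python
def split_vstart(vstart, subvl):
    """Split a flat vstart into (outer loop index, subvector element index)."""
    if subvl in (1, 2, 4):
        # power of two: a shift and a mask, no divider required
        shift = subvl.bit_length() - 1
        return vstart >> shift, vstart & (subvl - 1)
    # SUBVL=3 (or any non-power-of-two): an actual divide and remainder
    return divmod(vstart, subvl)


print(split_vstart(6, 4))   # (1, 2)
print(split_vstart(16, 3))  # (5, 1)
```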
it's the outer loop that is the "predicate group point", see
you see there, the VL outer loop i, the SUBVL inner loop s,
predication uses "i" (VL outer loop).
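the two views of the loop can be compared with a small model. this is a sketch only: the function names and the vstart parameter on the flat version are assumptions for illustration. both versions test the predicate bit for the outer index i, never for s.

```python
def nested_indices(vl, subvl, pred):
    """Two nested loops: i over VL, s over SUBVL; predicate indexed by i."""
    for i in range(vl):
        if not (pred >> i) & 1:
            continue  # whole subvector group is masked out
        for s in range(subvl):
            yield i, s, i * subvl + s


def flat_indices(vl, subvl, pred, vstart=0):
    """Single loop counting to VL*SUBVL, optionally resuming at vstart."""
    for flat in range(vstart, vl * subvl):
        i, s = divmod(flat, subvl)
        if (pred >> i) & 1:
            yield i, s, flat


# with no restart, both forms visit the same elements in the same order
assert list(nested_indices(4, 3, 0b1010)) == list(flat_indices(4, 3, 0b1010))

# resuming mid-subvector: flat index 5 is outer index 1, element 2
print(list(flat_indices(4, 3, 0b1010, vstart=5)))
# [(1, 2, 5), (3, 0, 9), (3, 1, 10), (3, 2, 11)]
```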
anything that saves gates, i think, is better, unless it results in
madness-levels-of-CSR proliferation. that's why the VBLOCK format exists:
there were just far too many CSRs in the previous revision.