[libre-riscv-dev] uniform instruction format

Mon Jun 24 13:05:48 BST 2019

On Mon, Jun 24, 2019 at 11:05 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> One way to handle svlen is to have vstart count to VL*SVLEN instead of just
> to VL. That could simplify the CSR state somewhat.

 grouping the 2 together, in other words.  so instead of destoffs (6
bits) and subvec-destoffs (2 bits), and having effectively 2 nested
loops, it's a single loop.
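
 a rough sketch of the difference, in python.  the function names are
made up for illustration and are not from the spec; it only shows that
one counter running to VL*SUBVL visits the same (element, sub-element)
pairs as the two nested loops, so a single saved value is enough to
restart from:

```python
def nested_indices(vl, subvl):
    """Two nested loops: element offset outside, sub-vector offset inside."""
    return [(el, sub) for el in range(vl) for sub in range(subvl)]


def flat_indices(vl, subvl, start=0):
    """One loop: a single counter runs to vl*subvl and is split on use.

    `start` stands in for a saved counter value (hypothetical), e.g. the
    point at which a context-switch interrupted the loop.
    """
    return [divmod(i, subvl) for i in range(start, vl * subvl)]


# both orderings visit the same pairs in the same order
assert nested_indices(2, 3) == flat_indices(2, 3)
```

 the cost is a divide/modulo by SUBVL (1 to 4 here) whenever the two
parts are needed separately.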

> Though having a separate
> 2-bit field in vstart may be better after all

 the issue is, when it comes to predication (and restarting in the
middle of a context-switch) it would be necessary to

> -- I'd suggest having at
> least 16 bits of space in vstart for the vl subfield to allow later
> expansion.

 it means going to a 2nd CSR (for RV32), because VL (6 bits) plus MVL
(6 bits) plus SUBVL (2 bits) plus srcoffs (6+2) plus destoffs (6+2)
comes to err... 8+8+8+6=30 bits.
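
 the bit-budget, as a quick python check.  the field widths are the
ones listed above; the dict layout and the "widened" case (growing only
the VL field to 16 bits and leaving everything else alone) are
assumptions for illustration, not a proposed encoding:

```python
# field widths in bits, taken from the paragraph above
FIELDS = {
    "VL": 6,
    "MVL": 6,
    "SUBVL": 2,
    "srcoffs": 6 + 2,   # element offset + sub-vector offset
    "destoffs": 6 + 2,  # element offset + sub-vector offset
}


def total_bits(fields):
    """Sum of all field widths."""
    return sum(fields.values())


def fits(fields, xlen):
    """True if every field fits in one xlen-bit CSR."""
    return total_bits(fields) <= xlen


# assumption: only VL grows to 16 bits
widened = dict(FIELDS, VL=16)

assert total_bits(FIELDS) == 30 and fits(FIELDS, 32)
assert total_bits(widened) == 40 and not fits(widened, 32)
```

 so the current layout just squeezes into one RV32 CSR, and any
widening of that kind spills into a second one.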

 expansion could be considered through a second CSR, in a future
version.  noted on the list.

> One idea for expanding VL past XLEN is to use N sequential registers to
> store N*XLEN predicate bits, similar to how a 4x32-bit subvector takes 2
> 64-bit registers.

 yyyeah, i deliberately didn't want to do that as it means that the
predication engine (the scoreboard) gets a lot more logic: it needs to
optionally set 1 extra read-after-write hazard depending on whether
*VL* is greater than XLEN!

 that's slightly scary.  the alternative is, you have to reserve both
registers and then drop one of the dependencies once the value of VL
is known. which... now i think about it, if VL is given as an
immediate in the instruction, that's ok... but if it's set from a
register then yes you have to wait before dropping the 2nd dependency.

 in addition, keeping to XLEN means that a single straight xBitManip
(or other operation) is all that's needed.  going beyond XLEN it
starts to get complicated to do computations (shifts) on the predicate
bits.

> Alternatively, we could use a N x u8 vector to store N predicate bits, as
> if it was `bool pred[N];` -- the lsb of each u8 would be used as the value
> and when storing to the u8 vector each element would be 1 (true) or 0
> (false).

 similar to RVV (except they actually treat the vectors as predicates,
non-zero is true).  my feeling is: this is wasteful.  RVV has to
create an entire suite of almost-like-xBitManip opcodes in order to
deal with that.

 whereas we can use standard RV opcodes (SLL, OR, AND) and xBitManip.
N x u8 would actually prohibit that.
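
 to illustrate, a python sketch with a predicate held as one integer of
at most XLEN bits.  XLEN=64 and all the helper names are assumptions
for illustration only.  each operation on the bitmask is a single
shift/and/or, whereas the N x u8 form needs a loop (or dedicated
opcodes) just to get back to something those operations can work on:

```python
XLEN = 64  # assumed register width


def mask_first_n(n):
    """Predicate with the low n bits set (n <= XLEN): a shift and a subtract."""
    assert 0 <= n <= XLEN
    return (1 << n) - 1


def is_active(pred, i):
    """Test element i of the predicate: one shift and one AND."""
    return (pred >> i) & 1 == 1


def pack_u8(pred_bytes):
    """Convert an N x u8 predicate (lsb of each byte) into a bitmask."""
    mask = 0
    for i, b in enumerate(pred_bytes):
        mask |= (b & 1) << i
    return mask


# combining two predicates is a single AND on the bitmask form
both = mask_first_n(4) & pack_u8([1, 0, 1, 1])
assert both == 0b1101
```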

*sigh* there has clearly been a *lot* of private discussion, no wider
consultation - xbitmanip latest version is alarmingly large, turning
into a kitchen-sink of various different requirements.  yes it's
subsetted... but still, it's quite scary.

l.