[libre-riscv-dev] uniform instruction format
programmerjake at gmail.com
Mon Jun 24 13:31:29 BST 2019
On Mon, Jun 24, 2019, 05:06 Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
> On Mon, Jun 24, 2019 at 11:05 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
> > One way to handle svlen is to have vstart count to VL*SVLEN instead of
> > to VL. That could simplify the CSR state somewhat.
> grouping the 2 together, in other words. so instead of destoffs (6
> bits) and subvec-destoffs (2 bits), and having effectively 2 nested
> loops, it's a single loop.
one way to handle it is to have a single architecturally-visible vstart.
with twin predication, the predicate masks are known, so the implementation
can count set bits to determine the other vstart. two vstart values can be
kept in the microarchitecture (for the common cases), and one can be
invalidated on a CSR write.
This should work just fine if vstart being written by user software is
extremely rare (which it should be, since it's just for exception handling
and context switching). Assuming vstart CSR writes are rare, even a
bit-serial bit counter (2 shift registers and 2 counters) should suffice,
which saves area.
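
A rough Python sketch of the set-bit counting described above. It assumes one
particular reading: the architecturally-visible vstart indexes the source
element, and active source and destination elements pair up in order. The
function names are made up for illustration and are not from any spec.

```python
def popcount_below(mask, index):
    """Count the set bits of mask at positions strictly below index."""
    return bin(mask & ((1 << index) - 1)).count("1")


def nth_set_bit(mask, n):
    """Return the position of the n-th (0-based) set bit, or -1 if absent."""
    pos = 0
    while mask:
        if mask & 1:
            if n == 0:
                return pos
            n -= 1
        mask >>= 1
        pos += 1
    return -1


def derive_dest_vstart(src_mask, dest_mask, src_vstart):
    """Recover the destination-side vstart from the source-side one.

    Assumption: every active source element below src_vstart has already
    been written to the next active destination element, in order.
    """
    done = popcount_below(src_mask, src_vstart)
    return nth_set_bit(dest_mask, done)
```

For example, with src_mask=0b10110 and src_vstart=4, two active source
elements are already done, so the destination side resumes at the third set
bit of dest_mask. The loops here are bit-serial on purpose, mirroring the
shift-register-plus-counter idea rather than a fast popcount.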
> > Though having a separate
> > 2-bit field in vstart may be better after all
> the issue is, when it comes to predication (and restarting in the
> middle of a context-switch) it would be necessary to
not really: basically, svlen!=1 is treated mostly like an equivalent
svlen==1 op with VL multiplied and the predicate bits duplicated/expanded.
so, for a vl*4 op, vstart=6 means start with the outer loop index at 1
(6/4) and the subvector element index at 2 (6%4). for vl*3, vstart=16 means
the outer loop index is 5 (16/3) and the subvector element index is 1
(16%3).
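
As a small Python sketch of that flattening (the names split_vstart and
expand_predicate are hypothetical, and the exact predicate expansion is my
assumption of what "duplicated/expanded" would mean):

```python
def split_vstart(vstart, subvl):
    """Split a flat vstart into (outer loop index, subvector element index)."""
    return divmod(vstart, subvl)


def expand_predicate(pred, vl, subvl):
    """Repeat each of the vl predicate bits subvl times.

    Assumption: bit i of pred covers flat elements i*subvl .. i*subvl+subvl-1.
    """
    out = 0
    for i in range(vl):
        if (pred >> i) & 1:
            out |= ((1 << subvl) - 1) << (i * subvl)
    return out


print(split_vstart(6, 4))    # (1, 2)
print(split_vstart(16, 3))   # (5, 1)
```

With the predicate expanded like this, the flat loop over VL*SVLEN elements
needs only the one counter.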
> > -- I'd suggest having at
> > least 16 bits of space in vstart for the vl subfield to allow later
> > expansion.
> it means going to a 2nd CSR (for RV32), because VL (6 bits) plus MVL
> (6 bits) plus SUBVL (2 bits) plus srcoffs (6+2) plus destoffs (6+2)
> comes to err... 8+8+8+6=30 bits.
> expansion could be considered through a second CSR, in a future
> version. noted on the list.
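
To check the bit budget quoted above, here is a minimal Python sketch that
packs the listed fields into one word. The field widths are the ones from the
thread; the low-to-high ordering and the helper names are hypothetical, not a
proposed CSR layout.

```python
# Widths as listed in the thread; the ordering is an assumption.
FIELDS = [("VL", 6), ("MVL", 6), ("SUBVL", 2), ("srcoffs", 8), ("destoffs", 8)]


def total_bits(fields=FIELDS):
    """Sum of all field widths."""
    return sum(width for _, width in fields)


def pack(values, fields=FIELDS):
    """Pack a dict of field values into one integer, first field lowest."""
    word = 0
    shift = 0
    for name, width in fields:
        value = values[name]
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        word |= value << shift
        shift += width
    return word


def unpack(word, fields=FIELDS):
    """Inverse of pack: split the integer back into named fields."""
    values = {}
    for name, width in fields:
        values[name] = word & ((1 << width) - 1)
        word >>= width
    return values


print(total_bits())  # 30
```

At 30 bits the fields fit in a single RV32 CSR with 2 bits to spare, so any
meaningful widening of the offsets would overflow it.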
> > One idea for expanding VL past XLEN is to use N sequential registers to
> > store N*XLEN predicate bits, similar to how a 4x32-bit subvector takes 2
> > 64-bit registers.
> yyyeah, i deliberately didn't want to do that as it means that the
> predication engine (the scoreboard) gets a lot more logic: it needs to
> optionally set 1 extra read-after-write hazard depending on whether
> *VL* is greater than XLEN!
> that's slightly scary. the alternative is, you have to reserve both
> registers and then drop one of the dependencies once the value of VL
> is known. which... now i think about it, if it's in the instruction,
> that's ok... but if it's set from a register then yes you have to wait
> before dropping the 2nd dependency.
> in addition, keeping to XLEN means that a single straight xBitManip
> (or other operation) is all that's needed. going beyond XLEN it
> starts to get complicated to do computations (shifts) on the predicate
> > Alternatively, we could use a N x u8 vector to store N predicate bits, as
> > if it was `bool pred[N];` -- the lsb of each u8 would be used as the
> > predicate bit, and when storing to the u8 vector each element would be
> > 1 (true) or 0 (false).
> similar to RVV (except they actually treat the vectors as predicates,
> non-zero is true). my feeling is: this is wasteful. RVV has to
> create an entire suite of almost-like-xBitManip opcodes in order to
> deal with that.
> whereas we can use standard RV opcodes (SLL, OR, AND) and xBitManip.
> N x u8 would actually prohibit that.
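
To make the two predicate representations concrete, a short Python sketch
(the helper names are hypothetical). It converts between an N x u8 vector,
where only the lsb of each byte counts, and a packed bitmask, and shows that
combining packed masks is a single AND/OR:

```python
def u8_vector_to_mask(pred_bytes):
    """Pack the lsb of each u8 element into one integer bitmask."""
    mask = 0
    for i, b in enumerate(pred_bytes):
        mask |= (b & 1) << i
    return mask


def mask_to_u8_vector(mask, n):
    """Unpack n predicate bits into a u8 vector of 1 (true) / 0 (false)."""
    return bytes((mask >> i) & 1 for i in range(n))


a = u8_vector_to_mask(bytes([1, 0, 3, 0]))   # lsbs are 1,0,1,0
b = u8_vector_to_mask(bytes([1, 1, 0, 0]))
print(bin(a & b))  # 0b1
print(bin(a | b))  # 0b111
```

In the packed form the combine step is one scalar op per XLEN predicate
bits; in the u8 form it has to be done element by element.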
> *sigh* there has clearly been a *lot* of private discussion, no wider
> consultation - xbitmanip latest version is alarmingly large, turning
> into a kitchen-sink of various different requirements. yes it's
> subsetted... but still, it's quite scary.