[libre-riscv-dev] uniform instruction format
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Wed Jun 19 15:01:02 BST 2019
Still need to understand sv sublen.
https://libre-riscv.org/simple_v_extension/sv_prefix_proposal/
Jacob, I think that if we remove lsk and replace it with vd/vs1 vs2, the
combinations there cover unit stride and gather/scatter and free up 1 bit.
As a result, bit 6 can go back to RESERVED.
That also has the fascinating side effect of making LD/ST pretty much
identical to the other formats.
Orthogonality gooood.
It also has the side effect of making it possible to map to SVOrig. Register
stride just does not fit SVOrig.
Only one major concept is missing from SVPrefix: elwidth per reg (rs1, rs2,
rd).
Oh, that and RVC.
Only one major concept is missing from SVOrig: sv sublen, hence why I asked
about it.
Also, I feel that we are really, really missing out by not having a way in
SVPrefix to specify how long (as an instruction count) the operations
stay vectorised.
If we could use e.g. bit 6 to specify that the 48-bit format is to be
subdivided into blocks of "register specs", then it would be possible to
have even RVC ops be vectorised, for short durations.
That is, a countdown, on expiry of which the register returns to scalar. This is what
Mitch Alsup does in the 88000.
This is a tiny bit like VLIW in effect.
Which, actually, is not a bad analogy.
For simplicity I think it would be a good idea to cancel all countdowns on
any kind of branch or jump.
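To make the countdown idea concrete, here is a minimal sketch of the bookkeeping
it implies. Nothing here is from the SV spec: the class name VectorCountdown and
its methods are hypothetical, and it assumes one counter per register that ticks
once per issued instruction and is wiped on any branch or jump.

```python
from dataclasses import dataclass, field


@dataclass
class VectorCountdown:
    """Hypothetical model: per-register count of instructions left as 'vector'."""

    # register number -> number of further instructions that see it as a vector
    remaining: dict = field(default_factory=dict)

    def mark(self, reg: int, count: int) -> None:
        """Tag `reg` as vectorised for the next `count` instructions."""
        if count > 0:
            self.remaining[reg] = count

    def is_vector(self, reg: int) -> bool:
        """True while the countdown for `reg` has not expired."""
        return self.remaining.get(reg, 0) > 0

    def step(self, is_branch_or_jump: bool = False) -> None:
        """Account for one issued instruction."""
        if is_branch_or_jump:
            # Assumed rule from the paragraph above: cancel every countdown.
            self.remaining.clear()
            return
        for reg in list(self.remaining):
            self.remaining[reg] -= 1
            if self.remaining[reg] == 0:
                # Countdown expired: the register reverts to scalar.
                del self.remaining[reg]
```

A decoder would call is_vector() for each operand before issuing and step()
afterwards, so a register marked with a count of 2 is a vector for exactly the
next two instructions.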
Or, a stricter way to do it: literally define a VLIW format, of length 16
words (to fit in a cache line or half a cache line), to be aligned on a
cache line.
The first part defines the number of registers (N) that may be spec'd as
vectors, defines their elwidth, and so on, and also specs the number of
predicates (M). Another bit could specify whether the length is to be a half
or a full cache line (8 or 16 32-bit words).
This is basically the SVOrig format: up to N 16-bit SV CSR table entries
and up to M 16-bit predicate CSR table entries, with the rest taken up
by however many instructions will fit in the half/full cache line.
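As a rough feel for how much room such a block leaves for instructions, here is
a small budget calculation. It is only a sketch: the function name is
hypothetical, and the 16-bit size of the leading part (the one holding N, M and
the half/full bit) is my assumption, not something decided above.

```python
WORD_BITS = 32    # block length is counted in 32-bit words
ENTRY_BITS = 16   # SV and predicate CSR table entries are 16 bits each


def instruction_bits_available(n_regs: int, m_preds: int, full_line: bool,
                               header_bits: int = 16) -> int:
    """Bits left for instructions after the header and the N + M table entries.

    header_bits is an assumed size for the leading N/M/length field.
    """
    total_bits = (16 if full_line else 8) * WORD_BITS
    used_bits = header_bits + (n_regs + m_preds) * ENTRY_BITS
    if used_bits > total_bits:
        raise ValueError("table entries do not fit in the block")
    return total_bits - used_bits


# Half line, 2 register entries, 1 predicate entry:
half = instruction_bits_available(2, 1, full_line=False)
print(half // 32, "32-bit ops or", half // 16, "RVC ops")
```

Under those assumptions a half-line block with N=2 and M=1 keeps 192 bits, i.e.
six 32-bit instructions or twelve RVC ones.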
As you can tell, I am concerned about the amount of space that SVOrig
takes up (and the time taken to set up and tear down CSRs, because of using
CSRRW), and also about SVPrefix, which cannot express the full power of
SVOrig nor use RVC in any way.
We can do better :)
Never thought I would ever seriously consider implementing a VLIW machine.
Thoughts?
L.
--
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68