[libre-riscv-dev] SV / RVV, marking a register as VL.
luke.leighton at gmail.com
Tue Aug 27 20:45:11 BST 2019
Something that has been bugging me about RVV and SV for some time: the fact that arithmetic on VL requires additional instructions to perform a transfer between VL and the registers used to carry out the necessary arithmetic.
If CSRs were treated orthogonally as actual scalar registers in RISCV, the problem would be moot.
This particularly hits home on use of fail-on-first.
The above pseudocode for strcpy shows it well: in RVV a CSR read is required in order to get at the modifications that the fail-first made to VL.
In SV the effect is even more pronounced, due to the need to increment VL by one after the fail-first, which of course requires first transferring VL to a scalar reg, then performing the arithmetic, then getting the value *back* into VL.
I have been thinking of a solution here which I did not want to share until I was reasonably sure it would be easily implementable in hardware.
The solution is, instead of having a CSR that contains the current VL value, to have the CSR point *to* the scalar register that contains, and will indefinitely continue to contain, the current VL value.
This would have the advantage that, once "linked", fail-on-first would automatically result in *direct* modification of that scalar (standard x1-x31 integer Regfile) register.
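To make the difference concrete, here is a minimal sketch in Python. It is a toy model only, not a description of any real implementation: the names `ToyCore`, `linked_reg` and `load_fail_first` are invented for illustration, the choice of x5 is arbitrary, and fail-first is simplified to "truncate VL at the first zero byte" (the data-dependent behaviour that makes the SV strcpy case need a VL increment).

```python
class ToyCore:
    """Toy model: a scalar regfile plus a VL that is either a CSR value or a link."""

    def __init__(self, linked_reg=None):
        self.x = [0] * 32             # scalar integer regfile
        self.vl_csr = 0               # used only when the CSR itself holds VL
        self.linked_reg = linked_reg  # hypothetical: index of the register holding VL

    def get_vl(self):
        if self.linked_reg is None:
            return self.vl_csr
        return self.x[self.linked_reg]

    def set_vl(self, value):
        if self.linked_reg is None:
            self.vl_csr = value
        else:
            self.x[self.linked_reg] = value

    def load_fail_first(self, data):
        """Simplified fail-first: stop at the first zero byte and truncate VL there."""
        vl = self.get_vl()
        for i in range(min(vl, len(data))):
            if data[i] == 0:
                self.set_vl(i)
                return data[:i]
        return data[:vl]


# Linked scheme: the fail-first result lands directly in x5.
core = ToyCore(linked_reg=5)
core.set_vl(8)
core.load_fail_first(b"abc\0defg")
core.x[5] += 1   # ordinary scalar add; VL now includes the terminator
```

In the unlinked case (`ToyCore()` with no `linked_reg`), the same fail-first updates only `vl_csr`, and the scalar regfile sees nothing until an explicit transfer is made.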
In the pseudocode above that would save 1 instruction in the inner loop in the RVV case: a reduction of around 8%.
In the SV case it would save *three* instructions in what is currently a 14-instruction loop, roughly 21%: a significant saving (even when they're all RVC opcodes).
The hardware challenges are that these are implicit (indirect) modifications to a scalar regfile. Given that VL already has to be modified (in the current revision of RVV), conceptually it is not challenging: it's just that instead of modifying the CSR store, the integer regfile store is to be written.
For an OoO design, which was my primary concern, this makes every vector instruction require one additional read and write register hazard.
For context: some implementations may not have chosen to make VL a read / write dependency hazard, choosing instead to "stall" instruction issue whilst waiting for outstanding vector operations to complete: for such implementations the previous paragraph makes no sense and does not apply.
If on the other hand an OoO engine *has* had support for read and write hazard dependency tracking on VL added (in order to avoid stalling when VL is modified), then changing that to be the scalar register (to which this proposed modification to SETVL points) is not so much of a problem, and might even simplify the microarchitecture.
For in-order systems I am not so concerned about the hardware implications: stall is the go-to "solution" and this situation is no different.
As this is quite a radical design change I have been reluctant to come forward with it, and had to think about it for several months.
Feedback appreciated, will hold off going ahead with this on SV for a while longer.
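As a rough illustration of what the dependency tracking would look like, here is a sketch in Python. Everything in it is an assumption made for the example: `VecOp`, `hazards`, the `may_write_vl` flag and the string register names are not taken from any real design. Setting `may_write_vl` on every operation gives the one-read-plus-one-write cost per vector instruction described above; clearing it on operations that cannot alter VL is one possible refinement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VecOp:
    srcs: frozenset             # registers the operation reads
    dests: frozenset            # registers the operation writes
    may_write_vl: bool = True   # hypothetical flag: can this op modify VL?


def hazards(op, vl_reg):
    """Return (reads, writes) with the scalar register holding VL folded in."""
    reads = set(op.srcs) | {vl_reg}   # every vector op depends on VL
    writes = set(op.dests)
    if op.may_write_vl:
        writes.add(vl_reg)            # e.g. a fail-first op truncating VL
    return reads, writes


# A fail-first load with VL linked to x5 (register choice is arbitrary).
ff_load = VecOp(frozenset({"x10"}), frozenset({"v1"}))
reads, writes = hazards(ff_load, "x5")
```

The point of the sketch is that the VL dependency becomes just another entry in the same scalar-register hazard sets, rather than a separate CSR to be tracked.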
Constructive feedback on its value in RVV is also welcome, as it would save on instruction count in tight loops there as well.