[libre-riscv-dev] SV / RVV, marking a register as VL.

Thu Aug 29 08:26:30 BST 2019

First of all, see Bruce Hoult's remark: the whole issue may be moot, and yet 
another layer of indirection seems meh.

ideas:
* I could imagine CSRRA[I] (CSR read and add [immediate]) instructions 
complementing the "bitwise" CSR instructions. The problem, of course, is 
where to put them, because the CSR number is big. There seems to be room in 
the CSR funct3 == 0b100 minor opcode for an immediate version, but the 
privileged spec is a heavy user of CSR funct3 == 0b000 (albeit all with 
rd = x0), which makes it a bit awkward to also have a CSRRA 
instruction  :-(.

* As above, but just have an R-type instruction that only adds to the VL 
CSR. 

* If you could mmap the CSR file, you could use the AMO ops to manipulate 
the CSRs, in particular add and subtract (and max and min!). 

* Ditch the idea that a VL CSR has to specify a VL register, and simply use 
one register for VL by convention (t1 = x6 or t2 = x7 ???) and use it 
implicitly, just like sp and ra are used implicitly in the C instructions, 
while allowing the VL register to be specified in the 64(?) bit wide "allow 
to specify everything" version of your instructions. This, of course, 
requires specifying that you are in vector mode in ways other than VL != 1 
if you want to use implicit vectorisation.
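
To make the first idea concrete, here is a minimal sketch in C of what 
CSRRA / CSRRAI could do. Everything in it is an assumption for 
illustration: the instructions do not exist in any spec, the hart_t model 
and the CSR_VL index are made up, and the 5-bit zero-extended immediate 
simply mirrors what the existing CSRRWI/CSRRSI/CSRRCI do.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy model: one CSR (VL) and the integer regfile.
   CSR_VL is an array index here, not a real CSR number. */
enum { CSR_VL = 0, NUM_CSRS = 1 };

typedef struct {
    uint64_t x[32];          /* x[0] is expected to stay zero */
    uint64_t csr[NUM_CSRS];
} hart_t;

/* Writes to x0 are discarded, as for any RISC-V instruction. */
static void write_rd(hart_t *h, int rd, uint64_t value) {
    assert(rd >= 0 && rd < 32);
    if (rd != 0) h->x[rd] = value;
}

/* Assumed semantics of CSRRA rd, csr, rs1:
   rd <- old CSR value; CSR <- old CSR value + x[rs1]. */
static void csrra(hart_t *h, int rd, int csr, int rs1) {
    assert(csr >= 0 && csr < NUM_CSRS);
    assert(rs1 >= 0 && rs1 < 32);
    uint64_t old = h->csr[csr];
    uint64_t addend = h->x[rs1];   /* read before rd is written (rd may equal rs1) */
    h->csr[csr] = old + addend;
    write_rd(h, rd, old);
}

/* Assumed semantics of CSRRAI rd, csr, uimm:
   same as above with a 5-bit zero-extended immediate. */
static void csrrai(hart_t *h, int rd, int csr, unsigned uimm) {
    assert(csr >= 0 && csr < NUM_CSRS);
    uint64_t old = h->csr[csr];
    h->csr[csr] = old + (uimm & 0x1f);
    write_rd(h, rd, old);
}
```

With something like this, the "increment VL by one" step would be a single 
csrrai(&h, 0, CSR_VL, 1) instead of a read, an add and a write.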

Ciao,
Rogier

On Tuesday 27 August 2019 21:45:12 UTC+2, lkcl wrote:
>
> https://libre-riscv.org/simple_v_extension/appendix/#strncpy
>
> https://libre-riscv.org/simple_v_extension/specification/sv.setvl/#questions
>
> Something that has been bugging me about RVV and SV for some time: the 
> fact that arithmetic on VL requires additional instructions to perform a 
> transfer between VL and the registers used to carry out the necessary 
> arithmetic.
>
> If CSRs were treated orthogonally as actual scalar registers in RISCV, the 
> problem would be moot.
>
> This particularly hits home on use of fail-on-first.
>
> The above pseudocode for strcpy shows it well: a CSR load is required in 
> RVV in order to get at the modifications to VL that the failfirst actioned.
>
> In SV it is even more pronounced an effect, due to a need to increment VL 
> by one, after the fail-first, which of course requires first transferring 
> VL to a scalar reg, then performing the arithmetic, then getting the value 
> *back* into VL.
>
> I have been thinking of a solution here which I did not want to share 
> until I was reasonably sure it would be easily implementable in hardware.
>
> The solution is, instead of having a CSR that contains the current VL 
> value, to have the CSR point *to* the scalar register that contains, 
> and will indefinitely continue to contain, the current VL value.
>
> This would have the advantage that, once "linked", fail-on-first would 
> automatically result in *direct* modification of that scalar (standard 
> x1-x31 integer Regfile) register.
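
The difference between the two schemes in the quoted text can be sketched 
in C roughly as follows. This is a toy model under my own assumptions: the 
hart_t layout and all function names are hypothetical, fail-on-first is 
reduced to "truncate VL to the number of elements that succeeded", and any 
clamping of VL on write-back is ignored.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t x[32];      /* integer regfile */
    uint64_t vl_value;   /* current scheme: the CSR holds the VL value */
    unsigned vl_reg;     /* proposed scheme: the CSR holds the index (1..31)
                            of the scalar register that holds VL */
} hart_t;

/* Current scheme: fail-on-first truncates the VL CSR. */
static void failfirst_current(hart_t *h, uint64_t ok_elems) {
    if (ok_elems < h->vl_value) h->vl_value = ok_elems;
}

/* Proposed scheme: fail-on-first truncates the linked scalar register. */
static void failfirst_linked(hart_t *h, uint64_t ok_elems) {
    assert(h->vl_reg >= 1 && h->vl_reg <= 31);
    if (ok_elems < h->x[h->vl_reg]) h->x[h->vl_reg] = ok_elems;
}

/* Current scheme: VL += 1 is a three-step round trip through a temporary. */
static void bump_vl_current(hart_t *h, unsigned tmp) {
    assert(tmp >= 1 && tmp <= 31);
    h->x[tmp] = h->vl_value;     /* transfer VL to a scalar register */
    h->x[tmp] = h->x[tmp] + 1;   /* do the arithmetic */
    h->vl_value = h->x[tmp];     /* transfer the result back into VL */
}

/* Proposed scheme: VL += 1 is ordinary arithmetic on the linked register. */
static void bump_vl_linked(hart_t *h) {
    assert(h->vl_reg >= 1 && h->vl_reg <= 31);
    h->x[h->vl_reg] += 1;
}
```

Both paths end up with the same VL; the linked one just never leaves the 
integer regfile.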
>
> In the pseudocode above that would save 1 instruction in the inner loop in 
> the RVV case: a reduction of around 8%.
>
> In the SV case it would save *three* instructions in what is currently a 
> 14 instruction loop: a significant saving (even when they're all RVC 
> opcodes).
>
> The hardware challenges are that these are implicit (indirect) 
> modifications to a scalar regfile. Given that VL already has to be modified 
> (in the current revision of RVV) conceptually it is not challenging, it's 
> just that instead of modifying the CSR store, the integer regfile store is 
> to be written.
>
> For an OoO design, which was my primary concern, this makes every vector 
> instruction require one additional read and write register hazard.
>
> For context: some implementations may not have chosen to make VL a read / 
> write dependency hazard, choosing instead to "stall" instruction issue 
> whilst waiting for outstanding vector operations to complete: for such 
> implementations the previous paragraph makes no sense and does not apply.
>
> If on the other hand an OoO engine *has* had support for read and write 
> hazard dependency tracking on VL added (in order to avoid stalling when VL 
> is modified), then changing that to be the scalar register (to which this 
> proposed modification to SETVL points) is not so much of a problem, and 
> might even simplify the microarchitecture.
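
The extra hazard described in the quoted paragraphs above can be sketched 
like this. It is only an illustration under my assumptions: hazards are 
plain 32-bit masks over x0-x31, the names hazards_t, vector_hazards and 
has_dependency are made up, and the check lumps RAW, WAR and WAW together 
without modelling renaming.

```c
#include <assert.h>
#include <stdint.h>

/* One bit per integer register x0..x31. */
typedef struct {
    uint32_t reads;
    uint32_t writes;
} hazards_t;

/* Every vector instruction additionally reads and (potentially) writes
   the scalar register that VL is linked to. */
static hazards_t vector_hazards(uint32_t src_mask, uint32_t dst_mask,
                                unsigned vl_reg) {
    assert(vl_reg >= 1 && vl_reg <= 31);
    uint32_t vl_bit = UINT32_C(1) << vl_reg;
    hazards_t hz = { src_mask | vl_bit, dst_mask | vl_bit };
    return hz;
}

/* Nonzero if the younger instruction depends on the older one
   through any register (RAW, WAR or WAW). */
static int has_dependency(hazards_t older, hazards_t younger) {
    return (older.writes & younger.reads) != 0
        || (older.reads & younger.writes) != 0
        || (older.writes & younger.writes) != 0;
}
```

Taken literally, this makes any scalar instruction that writes the linked 
register an ordinary dependency of the next vector instruction, which is 
exactly what existing register hazard tracking already handles.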
>
> For in-order systems I am not so concerned about the hardware 
> implications: stall is the go-to "solution" and this situation is no 
> different.
>
> As this is quite a radical design change I have been reluctant to come 
> forward with it, and had to think about it for several months.
>
> Feedback appreciated, will hold off going ahead with this on SV for a 
> while longer.
>
> Constructive feedback on its value in RVV also welcomed as it will save on 
> instruction count in tight loops in RVV, as well.
>
> L.
>
>