[libre-riscv-dev] SV / RVV, marking a register as VL.

Thu Aug 29 09:23:15 BST 2019

On Thursday, August 29, 2019 at 8:26:30 AM UTC+1, Rogier Brussee wrote:
>
> First of all see Bruce Hoult's remark: the whole issue may be moot and yet 
> another layer of redirection seems meh.
>

shaving one instruction off of a 12-instruction loop is not to be sneezed 
at, rogier!  and in SV, it's something like a reduction of 3 in 13, which 
is a whopping 20%-plus reduction!  one of those is on the 
loop-critical-path (an 11% reduction) and the others are on the clean-up 
path.

if the design principles of RISC and RISC-V are to be respected and 
followed, small reductions in code size are significant, and big reductions 
even more so.

> ideas:
> * I could imagine a CSRRA[I] (CSR read and add [immediate]) instructions 
> complementing the "bitwise" CSR instructions. Problem is of course where to 
> put that because the CSR number is big. There seems to be room in the 
> CSR/func3 == 0b100 minor opcode for an immediate version, but the 
> privileged spec seems to be a heavy user of the CSR/func==000 however 
> (albeit all with rd = x0), which makes it a bit awkward to also have a 
> CSRRA instruction  :-(.
>

"here be dragons"... if you have one CSR being allowed this kind of special 
treatment (arithmetic) pretty soon there will be calls for yet more 
arithmetic operations.  at that point the ISA has a duplication of the 
*entire* suite of arithmetic operators.

CSRs were never intended for this kind of close-knit arithmetic tie-in.  
you set them up, you maybe clear a bit or two, do lots of operations, and 
then maybe set or clear a bit or two again.

VL *completely* breaks that rule, right from the SETVLI implementation 
(VL=MIN(rs1, MVL)), and fail-on-first even more so.  fail-on-first not only 
has a read-dependency on the VL CSR, it has a *write* dependency as well.
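
to make the read and write dependencies concrete, here is a minimal sketch in python.  it is an illustrative model only: the names (VectorState, setvl, load_fail_first) are hypothetical, and a "fault" is modelled simply as an address running off the end of a list.

```python
# illustrative model only: names are hypothetical, not from the RVV or SV specs.

class VectorState:
    def __init__(self, mvl):
        self.mvl = mvl   # maximum vector length
        self.vl = 0      # models the VL CSR


def setvl(state, rs1_value):
    """VL = MIN(rs1, MVL): the CSR is written with an arithmetic result."""
    state.vl = min(rs1_value, state.mvl)
    return state.vl


def load_fail_first(state, memory, base):
    """Load up to VL elements, truncating VL at the first faulting element.

    A fault is modelled (assumption) as an address past the end of memory.
    """
    loaded = []
    for i in range(state.vl):            # read-dependency on VL
        addr = base + i
        if addr >= len(memory):          # element i would fault
            if i == 0:
                raise MemoryError("fault on element 0 traps as normal")
            state.vl = i                 # write-dependency on VL
            break
        loaded.append(memory[addr])
    return loaded


state = VectorState(mvl=8)
setvl(state, 13)                          # VL is clamped to MVL
memory = [100, 101, 102, 103, 104]
data = load_fail_first(state, memory, 2)  # runs off the end after 3 elements
```

the point of the sketch: every vector instruction reads state.vl, and the fail-on-first load also writes it, which is far from the "set up and leave alone" usage pattern of an ordinary CSR.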

this is the core of the argument for special-case treatment of VL (and 
making it an actual scalar register): as a CSR its use goes well beyond 
that for which CSRs were originally designed.

whereas... if SETVLI is modified to set up a *pointer* to a scalar 
register, *now* the VL CSR is more along the lines of how CSRs were 
intended to be used.  set them up once to change the behaviour (and leave 
them alone), do some tightly-dependent arithmetic work, then reset them.
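
a rough sketch of what "VL as a pointer to a scalar register" could look like, in python.  everything here is an assumption made for illustration: the class and method names are hypothetical, and so is the convention that a pointer value of 0 (x0) means "no VL register selected".

```python
# hypothetical model: the CSR holds a register *number*; the vector length
# itself lives in the ordinary scalar regfile.

class Core:
    def __init__(self, mvl):
        self.mvl = mvl
        self.x = [0] * 32   # scalar integer regfile; x0 stays zero
        self.vl_ptr = 0     # CSR: which scalar register holds VL (0 = none, assumed)

    def write_reg(self, rd, value):
        if rd != 0:         # writes to x0 are discarded
            self.x[rd] = value

    def setvl_ptr(self, rd, rs1):
        """Assumed semantics: point the CSR at rd, and set x[rd] = MIN(x[rs1], MVL)."""
        self.vl_ptr = rd
        self.write_reg(rd, min(self.x[rs1], self.mvl))

    @property
    def vl(self):
        """The effective VL is whatever the pointed-to scalar register holds."""
        return self.x[self.vl_ptr]

    def sub(self, rd, rs1, rs2):
        """Ordinary scalar SUB; no CSR access involved."""
        self.write_reg(rd, self.x[rs1] - self.x[rs2])


core = Core(mvl=8)
core.x[10] = 13            # elements remaining
core.setvl_ptr(5, 10)      # CSR now points at x5, and x5 = min(13, 8)
core.sub(10, 10, 5)        # remaining -= VL, using x5 directly
```

in this model the CSR is written once by setvl_ptr, and all the tightly-dependent arithmetic on the vector length goes through normal scalar instructions and normal scalar dependency-tracking.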

> *As above, but just have an R-type instruction that only add's to the VL 
> CSR. 
>

again, i'd be concerned at the special treatment.  once you want ADD, 
someone else will want MUL.  and DIV.  and... etc. etc.

> *If you could mmap the CSR file,  you could use the AMO-ops to manipulate 
> them, in particular use add and subtract (and max and min!). 
>

interesting.  i've mulled over the idea of mapping the CSR regfile SRAM 
into the actual global memory space before.  the architectural implications 
(and power consumption due to the load on the L1 cache) had me sliiightly 
concerned.

mind you, for 3D, we need separate pixel buffer memory areas and so on, so 
it's a problem that has to be solved.

worth thinking through, some more, i feel.
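
as a starting point for that thinking, a minimal python sketch of AMO-style operations applied to a memory-mapped CSR file.  the base address, the CSR index and the 4-byte spacing are all made up for illustration; only the read-modify-write behaviour (return the old value, store the combined value) is meant to reflect how AMOs work.

```python
# hypothetical layout: CSR n appears at CSR_BASE + 4*n.  both constants are
# invented for this sketch and are not real addresses or CSR numbers.
CSR_BASE = 0x1000
VL_CSR = 0x8


class Memory:
    def __init__(self):
        self.words = {}

    def amo(self, addr, operand, op):
        """Atomic read-modify-write: store op(old, operand), return old."""
        old = self.words.get(addr, 0)
        self.words[addr] = op(old, operand)
        return old


def amoswap(mem, addr, value):
    return mem.amo(addr, value, lambda old, new: new)


def amoadd(mem, addr, value):
    return mem.amo(addr, value, lambda old, new: old + new)


def amomin(mem, addr, value):
    return mem.amo(addr, value, min)


mem = Memory()
vl_addr = CSR_BASE + 4 * VL_CSR
MVL = 8

amoswap(mem, vl_addr, 13)      # request 13 elements
amomin(mem, vl_addr, MVL)      # clamp: VL = min(13, MVL)
old = amoadd(mem, vl_addr, -3) # adjust VL by -3, getting the old value back
```

note that each of these is a full memory transaction against the CSR storage, which is where the concern about load on the L1 cache comes from.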

> *Ditch the idea that a VLCSR has to specify a VL registers but simply use 
> one register for VL by convention (t1= x6 or t2 = x7 ???) and use it 
> implicitly,  just like sp ra are used implicitly in the C instructions, 
> allowing to specify the VL register in the 64(?)  bit wide "allow to 
> specify everything" version of your instructions. This, of course, 
>  requires specifying you are in vector mode in other ways then VL != 1 if 
> you want to use implicit vectorisation.
>

i kinda like it, however mentally i am rebelling at the lack of 
orthogonality.  allocating one register to VL means it's effectively 
removed from use in all other circumstances...

... and if one register is allocated, you still have to have the 
dependency-tracking on that (one) scalar register, and if you have 
dependency-tracking on one scalar register (as a "hidden" VL) you might as 
well go the whole hog and go orthogonal.

that said: from what i saw of the statistical analysis of register-usage by 
gcc that WD did, many of the registers x1-x31 have near-zero percentage 
utilisation, so something at the high end of the regfile numbering probably 
wouldn't be missed.

however if you do that (x31 for example), use of RVC instructions is out of 
the question.  and if you _do_ allocate one of the registers accessible by 
RVC (x8-15) you just took out a whopping 12.5% of the available registers 
for use by RVC.

with all these things in mind - the VL CSR using the CSR regfile in ways 
for which it was never originally designed being the most crucial - is the 
idea of having VL be a pointer-to-a-scalar-reg starting to make more sense?

l.