[libre-riscv-dev] SV / RVV, marking a register as VL.
luke.leighton at gmail.com
Thu Aug 29 09:23:15 BST 2019
On Thursday, August 29, 2019 at 8:26:30 AM UTC+1, Rogier Brussee wrote:
> First of all see Bruce Hoult's remark: the whole issue may be moot and yet
> another layer of redirection seems meh.
shaving one instruction off of a 12-instruction loop is not to be sneezed
at, rogier! and in SV, it's something like a reduction of 3 in 13, which
is a whopping 20% reduction! one of those is on the loop-critical-path (an
11% reduction) and the others are on the clean-up path.
if the design principles of RISC and RISC-V are to be respected and
followed, small reductions in code size are significant, and big reductions
even more so.
> * I could imagine a CSRRA[I] (CSR read and add [immediate]) instructions
> complementing the "bitwise" CSR instructions. Problem is of course where to
> put that because the CSR number is big. There seems to be room in the
> CSR/func3 == 0b100 minor opcode for an immediate version, but the
> privileged spec seems to be a heavy user of the CSR/func3 == 0b000 however
> (albeit all with rd = x0), which makes it a bit awkward to also have a
> CSRRA instruction :-(.
"here be dragons"... if you have one CSR being allowed this kind of special
treatment (arithmetic) pretty soon there will be calls for yet more
arithmetic operations. at that point the ISA has a duplication of the
*entire* suite of arithmetic operators.
CSRs were never intended for this kind of close-knit arithmetic tie-in.
you set them up, you maybe clear a bit or two, do lots of operations, and
then maybe set or clear a bit or two again.
VL *completely* breaks that rule, right from the SETVLI implementation
(VL=MIN(rs1, MVL)), and fail-on-first even more so. fail-on-first not only
has a read-dependency on the VL CSR, it has a *write* dependency as well.
this is the core of the argument for special-case treatment of VL (and
making it an actual scalar register): as a CSR its use goes well beyond
that for which CSRs were originally designed.
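a rough sketch of the two dependencies described above, as a toy Python model. everything here is an assumption for illustration: the names `setvl` and `load_fail_on_first`, the value of MVL, and the use of an out-of-range address as a stand-in for a page fault. none of it is taken from the RVV or SV specifications.

```python
MVL = 8  # assumed maximum vector length, chosen arbitrarily for the sketch


def setvl(rs1, mvl=MVL):
    """Model of VL = MIN(rs1, MVL)."""
    return min(rs1, mvl)


def load_fail_on_first(memory, base, vl):
    """Load up to vl elements, truncating VL at the first faulting element.

    Reads VL (to know how many elements to attempt) and returns a new VL
    (the write dependency). A fault on element 0 is raised as a normal trap.
    """
    loaded = []
    for i in range(vl):
        addr = base + i
        if addr >= len(memory):  # stand-in for a page fault (assumption)
            if i == 0:
                raise MemoryError("fault on element 0")
            return loaded, i  # VL is written back with the shortened length
        loaded.append(memory[addr])
    return loaded, vl


memory = list(range(10))
vl = setvl(12)                                 # clamped to MVL
data, vl = load_fail_on_first(memory, 6, vl)   # VL read, then rewritten
print(vl, data)                                # prints: 4 [6, 7, 8, 9]
```

the point of the sketch is only that `vl` appears on both sides of the fail-on-first call: it is an input and an output of an ordinary data-path operation, which is not how a CSR is normally used.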
whereas... if SETVLI is modified to set up a *pointer* to a scalar
register, *now* the VL CSR is more along the lines of how CSRs were
intended to be used. set them up once to change the behaviour (and leave
them alone), do some tightly-dependent arithmetic work, then reset them.
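to make the pointer idea concrete, here is a minimal sketch in Python. the `Hart` class, the `vl_reg` field and the method names are all hypothetical, and the exact semantics of a modified SETVLI are assumed rather than specified anywhere.

```python
class Hart:
    """Toy state: 32 scalar registers plus a CSR that names the VL register."""

    def __init__(self, mvl=8):
        self.x = [0] * 32   # scalar integer register file
        self.mvl = mvl      # assumed maximum vector length
        self.vl_reg = 0     # hypothetical CSR: index of the register holding VL

    def setvli(self, rd, rs1_value):
        """Assumed modified SETVLI: point the CSR at rd, put MIN(rs1, MVL) in rd."""
        self.vl_reg = rd
        self.x[rd] = min(rs1_value, self.mvl)

    def addi(self, rd, rs1, imm):
        """Ordinary scalar ADDI; x0 stays hardwired to zero."""
        if rd != 0:
            self.x[rd] = self.x[rs1] + imm

    @property
    def vl(self):
        """The effective VL is whatever the pointed-to scalar register holds."""
        return self.x[self.vl_reg]


hart = Hart()
hart.setvli(5, 12)      # CSR set once: VL now lives in x5
hart.addi(5, 5, -3)     # plain scalar arithmetic changes VL
print(hart.vl_reg, hart.vl)   # prints: 5 5
```

in this model the CSR is written once by `setvli` and then left alone, while all the tightly-dependent arithmetic on VL goes through the ordinary scalar register file.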
> *As above, but just have an R-type instruction that only adds to the VL
again, i'd be concerned at the special treatment. once you want ADD,
someone else will want MUL. and DIV. and... etc. etc.
> *If you could mmap the CSR file, you could use the AMO-ops to manipulate
> them, in particular use add and subtract (and max and min!).
interesting. i've mulled over the idea of mapping the CSR regfile SRAM
into the actual global memoryspace before. the architectural implications
(and power consumption due to the load on the L1 cache) had me sliiightly
mind you, for 3D, we need separate pixel buffer memory areas and so on so
it's a problem that has to be solved.
worth thinking through, some more, i feel.
> *Ditch the idea that a VLCSR has to specify a VL register but simply use
> one register for VL by convention (t1= x6 or t2 = x7 ???) and use it
> implicitly, just like sp and ra are used implicitly in the C instructions,
> allowing to specify the VL register in the 64(?) bit wide "allow to
> specify everything" version of your instructions. This, of course,
> requires specifying you are in vector mode in other ways than VL != 1 if
> you want to use implicit vectorisation.
i kinda like it, however mentally i am rebelling at the lack of
orthogonality. allocating one register to VL means it's effectively
removed from use in all other circumstances...
... and if one register is allocated, you still have to have the
dependency-tracking on that (one) scalar register, and if you have
dependency-tracking on one scalar register (as a "hidden" VL) you might as
well go the whole hog and go orthogonal.
that said: from what i saw of the statistical analysis of register-usage by
gcc that WD did, many of the registers x1-x31 have near-zero percentage
utilisation, so something at the high end of the regfile numbering probably
wouldn't be missed.
however if you do that (x31 for example), use of RVC instructions is out of
the question. and if you _do_ allocate one of the registers accessible by
RVC (x8-15) you just took out a whopping 12.5% of the available registers
for use by RVC.
with all these things in mind - the VL CSR using the CSR regfile for ways
in which it was never originally designed being the most crucial - is the
idea of having VL be a pointer-to-a-scalar-reg starting to make more sense?