[libre-riscv-dev] SV Prefix questions

Wed Jun 26 07:39:20 BST 2019

On Tue, Jun 25, 2019 at 10:57 PM Luke Kenneth Casson Leighton
<lkcl at lkcl.net> wrote:
>  in addition, there is absolutely no way to use a single P48 or P64
> instruction to cover LD/ST.MULTI.  how can you, if the value requested
> to be set into VL is not the *actual* amount that is stored, there?
Not the case: the value written to VL only differs from the requested
value when the request is bigger than VLMAX.
Note the typo fix lower down.
So, for example, on RV64:
if a0 is <= 64 (the max, since VL > XLEN won't work due to predication)
then
sv.setvl a1, a0, 64
sets VL to a0 without modification (by V spec rule 1).

sv.setvl a1, x0, 64
sets VL to 64 (since x0 is treated as rs1=infinity)

in general, to set VL to an immediate value:
sv.setvl rd, x0, NEWVL

in general, to set VL using a register (when the value in rs1 is at most 64):
sv.setvl rd, rs1, 64

in general, when the allocated registers are of size N:
to set VL to a variable (VL differs from rs1 only when rs1 > N):
sv.setvl rd, rs1, N
to set VL to a constant:
sv.setvl rd, x0, N
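
As a rough sketch of the rules above, here is the quoted algorithm from
lower down written as a small Rust function, with the typo fix applied
(the last branch sets vl = vlmax). The function name setvl, the use of
Option to stand in for the trap, and u64::MAX as "infinity" are my own
illustrative choices, not part of any spec.

```rust
const XLEN: u64 = 64;

/// Model of the proposed setvl. Returns the new VL, or None where the
/// quoted algorithm would trap (illustrative stand-in for trap()).
fn setvl(regs: &mut [u64; 128], rd: usize, rs1: usize, vlmax: u64) -> Option<u64> {
    // handle illegal instruction decoding
    if vlmax > XLEN {
        return None;
    }
    // rs1 == x0 means "infinity"; u64::MAX is an assumed encoding of that
    let avl = if rs1 == 0 { u64::MAX } else { regs[rs1] };
    let vl = if avl <= vlmax {
        avl
    } else if avl < 2 * vlmax {
        (avl + 1) / 2 // ceil(avl / 2)
    } else {
        vlmax // the corrected branch
    };
    // writes to x0 are discarded
    if rd != 0 {
        regs[rd] = vl;
    }
    Some(vl)
}

fn main() {
    let mut regs = [0u64; 128];
    regs[10] = 10; // a0 = 10
    // sv.setvl a1, a0, 64: VL is a0, unmodified
    assert_eq!(setvl(&mut regs, 11, 10, 64), Some(10));
    // sv.setvl x0, x0, 64: VL is the immediate
    assert_eq!(setvl(&mut regs, 0, 0, 64), Some(64));
    // a request above VLMAX is the only case where VL differs from rs1
    regs[10] = 100;
    assert_eq!(setvl(&mut regs, 11, 10, 64), Some(50));
    println!("ok");
}
```
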
>
>  instead, LD.ST/MULTI is now forced to be a loop.
not the case

to store all registers using sv.setvl:

// assume VL is already saved somehow
sv.setvl x0, x0, 64 // set VL to 64 and ignore the result
svp.sd.vs x64, 512(a0) // store x64-x127 to *(a0 + 64 * sizeof(u64))
svp.fsd.vs f0, 0(a1) // store f0-f63 to *a1
svp.fsd.vs f64, 512(a1) // store f64-f127 to *(a1 + 64 * sizeof(f64))
sv.setvl x0, x0, 63 // set VL to 63 and ignore the result
// 63 is used since x0 may be a special encoding to mean store all
// zeros instead of storing x0-x63
svp.sd.vs x1, 8(a0) // store x1-x63 to *(a0 + sizeof(u64))
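
To double-check the offsets for the integer registers, here is a
minimal Rust model of the two svp.sd.vs stores. It assumes the store
writes VL consecutive registers, starting at the named one, to
consecutive 8-byte slots (which is what the comments above describe);
store_vs and save_int_regs are hypothetical helper names.

```rust
/// Assumed behaviour of svp.sd.vs: store `vl` consecutive registers,
/// starting at `first_reg`, to consecutive 8-byte slots at `byte_offset`.
fn store_vs(mem: &mut [u64], regs: &[u64; 128], first_reg: usize, byte_offset: usize, vl: usize) {
    let base = byte_offset / 8;
    for i in 0..vl {
        mem[base + i] = regs[first_reg + i];
    }
}

/// Replays the integer-register part of the sequence above.
fn save_int_regs(regs: &[u64; 128]) -> [u64; 128] {
    let mut mem = [u64::MAX; 128]; // sentinel marks slots never written
    store_vs(&mut mem, regs, 64, 512, 64); // VL = 64: x64-x127
    store_vs(&mut mem, regs, 1, 8, 63); // VL = 63: x1-x63
    mem
}

fn main() {
    let mut regs = [0u64; 128];
    for i in 0..128 {
        regs[i] = 1000 + i as u64; // distinct marker per register
    }
    let mem = save_int_regs(&regs);
    // slot 0 (where x0 would go) is never written
    assert_eq!(mem[0], u64::MAX);
    // every other register lands in the slot matching its number
    for i in 1..128 {
        assert_eq!(mem[i], regs[i]);
    }
    println!("ok");
}
```

So the two stores tile a 128-slot save area with no gaps or overlaps,
leaving only the x0 slot alone.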

>
>
> > In SVPrefix, the compiler allocates registers that hold the backing storage for
> > all vectors used, hence the compiler knows the value of VLMAX at compile time
> > for all loops.
> >
> > To avoid needing a separate instruction to set VLMAX for every loop, the unused
> > immediate field of vsetvli is used to encode VLMAX.
> > The final algorithm is as follows:
> >
> > let mut regs = [0u64; 128];
> > let mut vl = 0;
> >
> > // instruction fields:
> > let rd = get_rd_field();
> > let rs1 = get_rs1_field();
> > let vlmax = get_immed_field();
>
>  this was the encoding i experimented with 12 months ago.  it assumes
> the use of an opcode, adding the first (and only) instruction to SV,
> which is a serious precedent that cannot be taken lightly.
>
> > // handle illegal instruction decoding
> > if vlmax > XLEN {
> >     trap()
> > }
> >
> > // calculate AVL
> > let avl;
> > if rs1 == 0 {
> >     // rs1 is x0, so set avl to be infinity
> >     avl = 10000 // or some other integer much larger than vlmax
>
>  not needed.  just set it to MVL.
it just gets converted to vlmax later.
>
> > } else {
> >     avl = regs[rs1]
> > }
> >
> > // calculate VL
> > if avl <= vlmax {
> >     vl = avl
> > } else if avl < 2 * vlmax {
> >     // ceil(avl / 2), since integer div rounds down
> >     vl = (avl + 1) / 2
> > } else {
> >     vl = avl
oops, should have been vl=vlmax (apparently I'm no good with typos)
> > }
>
>  replace with far simpler "vl = MIN(avl, vlmax)".  it is *not* a good
> idea to replicate RVV's flawed design which assumes a heavy-duty
> *vector* Lanes register file.
>
> > // write rd
> > if rd != 0 {
> >     // rd is not x0
> >     regs[rd] = vl
> > }
> >
> > To avoid confusion with the V extension's instruction, the mnemonic
> > svp.setvl is chosen.
> >
> > svp.setvl is an I-type instruction.
> >
> > I think that svp.* should be the prefix for all the new instructions added by
> > SVPrefix (similar to how the C extension adds things like c.addi or c.mv).
>
>  again, to reiterate: i do *not* believe it is a good idea to add
> actual instructions to SVP.
Weren't we going to add SVP instructions anyway for the 32-bit
compressed versions of the 48-bit and 64-bit instructions?!!
We are also adding all the P48 and P64 instructions.
>
>  however, "shoe-horning" instructions which DO NOT in ANY WAY rely on
> the existence of SVP, i.e. they can operate stand-alone through
> brownfield-encoding into the P48 (bit 6) and P64 space (bit 60) as
> their *own separate specification*, as *scalar* instructions, that i
> do not have a problem with.
>
>
>
> > Now for some example code:
> >
> > DAXPY:
> >
> > C:
> > void daxpy(double *x, double *y, double a, size_t count)
> > {
> >     while(count > 0)
> >     {
> >         *y += a * *x;
> >         x++;
> >         y++;
> >         count--;
> >     }
> > }
> >
> > assembly:
> > // this is not the most optimal code, but it works
> > daxpy:
> >     // x is a0, y is a1, a is fa0, count is a2
> > .loop:
> >     svp.setvl a3, a2, 48 // VLMAX is 48, since we have space for 48 registers
>
> i'd like to be able to suggest using the P64 encoding, here, however,
> annoyingly, it's the 3-arg case, and the 3-arg case doesn't fit.
3-arg fits just fine, even 4-arg fits (fmadd).
> which is why i split it out into 2 CSRs.
I'm perfectly fine changing the encoding, I thought I'd just suggest
one that is available.
>
> what, exactly, is "wrong" with having one instruction to set MVL and
> one to set VL?  yes it's one more instruction, what's wrong with that?
>  it's not inside the loop.
>
> breaking the paradigm "there are no new opcodes" is *really* not to be
> taken lightly.
ok, that's fine for SVorig; however, SVprefix is all about adding new
instructions that don't require extensive setup sequences and compiler
pain. Just because something isn't desired for SVorig doesn't mean
that we should leave it out of SVprefix.
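
For reference, the strip-mined loop that the quoted DAXPY assembly
starts (svp.setvl with VLMAX = 48 at the top of .loop) can be modelled
in Rust roughly as below. This is only a sketch: next_vl follows the
quoted VL rule with the typo fix, and returning the list of chunk sizes
is purely for illustration.

```rust
/// VL rule from the quoted algorithm, with the last branch corrected.
fn next_vl(avl: usize, vlmax: usize) -> usize {
    if avl <= vlmax {
        avl
    } else if avl < 2 * vlmax {
        (avl + 1) / 2
    } else {
        vlmax
    }
}

/// y[i] += a * x[i], processed in chunks of VL elements.
/// Returns the VL used on each trip round the loop (illustration only).
fn daxpy(x: &[f64], y: &mut [f64], a: f64) -> Vec<usize> {
    let mut count = x.len().min(y.len());
    let mut done = 0;
    let mut chunks = Vec::new();
    while count > 0 {
        let vl = next_vl(count, 48); // svp.setvl a3, a2, 48
        for i in 0..vl {
            y[done + i] += a * x[done + i];
        }
        done += vl;
        count -= vl;
        chunks.push(vl);
    }
    chunks
}

fn main() {
    let x = vec![1.0; 100];
    let mut y = vec![2.0; 100];
    let chunks = daxpy(&x, &mut y, 3.0);
    // 100 elements: one full chunk of 48, then the remaining 52 split evenly
    assert_eq!(chunks, vec![48, 26, 26]);
    assert!(y.iter().all(|&v| v == 5.0));
    println!("{:?}", chunks);
}
```

Note how the middle branch splits the tail into two equal halves rather
than 48 + 4; with vl = MIN(avl, vlmax) the chunks would be 48, 48, 4.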