[libre-riscv-dev] SV Prefix questions

Wed Jun 26 06:56:41 BST 2019

On Wed, Jun 26, 2019 at 6:27 AM Jacob Lifshay <programmerjake at gmail.com> wrote:

> if AVL <= VLMAX {
>     // rule 1
>     vl = AVL
> } else if AVL < (2 * VLMAX) {
>     // lower bound by rule 2; allows evenly distributing work
>     // over last two iterations as mentioned in note; ceil is selected to make
>     // vl be decreasing to simplify reduction as mentioned in note.
>     vl = ceil(AVL / 2)
> } else {
>     // rule 3
>     vl = VLMAX
> }

 not necessary, at all.  we are not doing a SIMD "Lanes" architecture.
all of this can go, replaced by a single simple "rd = VL = MIN(VLMAX,
rs1)"

 really.
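
 for comparison, here is a minimal sketch of the two rules side by side. the function names `vl_rvv_style` and `vl_min` are made up for illustration and are not part of any spec; the first follows the quoted three-rule pseudocode, the second is the MIN rule suggested above.

```rust
/// RVV-style rule, as in the quoted pseudocode (rules 1 to 3).
/// Hypothetical helper name, for illustration only.
fn vl_rvv_style(avl: u64, vlmax: u64) -> u64 {
    if avl <= vlmax {
        // rule 1: the request fits, grant it exactly
        avl
    } else if avl < 2 * vlmax {
        // rule 2: split the remainder over the last two iterations;
        // (avl + 1) / 2 is ceil(avl / 2) in integer arithmetic
        (avl + 1) / 2
    } else {
        // rule 3: plenty left, use the full vector length
        vlmax
    }
}

/// The simpler rule proposed in the reply: VL = MIN(VLMAX, requested).
/// Hypothetical helper name, for illustration only.
fn vl_min(avl: u64, vlmax: u64) -> u64 {
    avl.min(vlmax)
}
```

 the two only differ when the request lies strictly between VLMAX and 2 * VLMAX: with VLMAX = 8 and a request of 10, the first gives 5 and the second gives 8.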

 in addition, there is absolutely no way to use a single P48 or P64
instruction to cover LD/ST.MULTI.  how can you, if the value requested
to be set into VL is not the *actual* amount that is stored there?

 instead, LD/ST.MULTI is now forced to be a loop.

> In SVPrefix, the compiler allocates registers that hold the backing storage for
> all vectors used, hence the compiler knows the value of VLMAX at compile time
> for all loops.
>
> To avoid needing a separate instruction to set VLMAX for every loop, the unused
> immediate field of vsetvli is used to encode VLMAX.
> The final algorithm is as follows:
>
> let mut regs = [0u64; 128];
> let mut vl = 0;
>
> // instruction fields:
> let rd = get_rd_field();
> let rs1 = get_rs1_field();
> let vlmax = get_immed_field();

 this was the encoding i experimented with 12 months ago.  it assumes
the use of an opcode, adding the first (and only) instruction to SV,
which is a serious precedent that cannot be taken lightly.

> // handle illegal instruction decoding
> if vlmax > XLEN {
>     trap()
> }
>
> // calculate AVL
> let avl;
> if rs1 == 0 {
>     // rs1 is x0, so set avl to be infinity
>     avl = 10000 // or some other integer much larger than vlmax

 not needed.  just set it to MVL (i.e. vlmax).

> } else {
>     avl = regs[rs1]
> }
>
> // calculate VL
> if avl <= vlmax {
>     vl = avl
> } else if avl < 2 * vlmax {
>     // ceil(avl / 2), since integer div rounds down
>     vl = (avl + 1) / 2
> } else {
>     vl = vlmax
> }

 replace with far simpler "vl = MIN(avl, vlmax)".  it is *not* a good
idea to replicate RVV's flawed design which assumes a heavy-duty
*vector* Lanes register file.
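
 putting the suggestions so far together (x0 requests MVL, VL = MIN(avl, vlmax), rd written unless it is x0), a rough sketch of the simplified behaviour could look like the following. `State` and `setvl` are hypothetical names invented for this sketch; it models the semantics only, not any encoding.

```rust
/// Hypothetical machine state, for illustration only:
/// 128 integer registers plus the VL value.
struct State {
    regs: [u64; 128],
    vl: u64,
}

/// Sketch of the simplified semantics (an assumption, not a spec):
/// - rs1 == x0 means "as many as possible", so the request is MVL itself
/// - otherwise the request is the value held in register rs1
/// - VL = MIN(request, MVL)
/// - rd receives VL unless rd is x0
fn setvl(st: &mut State, rd: usize, rs1: usize, mvl: u64) {
    let avl = if rs1 == 0 { mvl } else { st.regs[rs1] };
    st.vl = avl.min(mvl);
    if rd != 0 {
        st.regs[rd] = st.vl;
    }
}
```

 no sentinel "infinity" value and no three-way branch are needed in this form.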

> // write rd
> if rd != 0 {
>     // rd is not x0
>     regs[rd] = vl
> }
>
> To avoid confusion with the V extension's instruction, the mnemonic
> svp.setvl is chosen.
>
> svp.setvl is an I-type instruction.
>
> I think that svp.* should be the prefix for all the new instructions added by
> SVPrefix (similar to how the C extension adds things like c.addi or c.mv).

 again, to reiterate: i do *not* believe it is a good idea to add
actual instructions to SVP.

 however, i do not have a problem with "shoe-horning" in instructions
which DO NOT in ANY WAY rely on the existence of SVP, i.e. instructions
which can operate stand-alone, as *scalar* instructions under their
*own separate specification*, through brownfield-encoding into the
P48 (bit 6) and P64 space (bit 60).

> Now for some example code:
>
> DAXPY:
>
> C:
> void daxpy(double *x, double *y, double a, size_t count)
> {
>     while(count > 0)
>     {
>         *y += a * *x;
>         x++;
>         y++;
>         count--;
>     }
> }
>
> assembly:
> // this is not the most optimal code, but it works
> daxpy:
>     // x is a0, y is a1, a is fa0, count is a2
> .loop:
>     svp.setvl a3, a2, 48 // VLMAX is 48, since we have space for 48 registers

i'd like to be able to suggest using the P64 encoding here.
annoyingly, however, this is the 3-arg case, and the 3-arg case
doesn't fit, which is why i split it out into 2 CSRs.

what, exactly, is "wrong" with having one instruction to set MVL and
one to set VL?  yes it's one more instruction, what's wrong with that?
it's not inside the loop.
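
to illustrate the point, here is a rough software model of DAXPY as a strip-mined loop, assuming MVL is fixed once before the loop and only VL is recomputed per iteration. the function signature, the slice-based arguments and the returned iteration count are all assumptions made for this sketch; it is not SV assembly and does not model any encoding.

```rust
/// Software model of a strip-mined DAXPY (y += a * x).
/// `mvl` stands in for MVL, set once by the caller, outside the loop.
/// Returns the number of loop iterations taken (illustration only).
fn daxpy(x: &[f64], y: &mut [f64], a: f64, mvl: usize) -> usize {
    assert!(mvl > 0, "MVL must be non-zero");
    let count = x.len().min(y.len());
    let mut done = 0;
    let mut iterations = 0;
    while done < count {
        // the only per-iteration "set VL" step: VL = MIN(remaining, MVL)
        let vl = (count - done).min(mvl);
        // one "vector" operation covering vl elements
        for i in done..done + vl {
            y[i] += a * x[i];
        }
        done += vl;
        iterations += 1;
    }
    iterations
}
```

with 100 elements and MVL = 48 this takes three iterations (48, 48, 4); the cost of setting MVL is paid once, however many iterations follow.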

breaking the paradigm "there are no new opcodes" is *really* not to be
taken lightly.

l.