[libre-riscv-dev] SV Prefix questions

Sat Jun 22 13:00:19 BST 2019

https://libre-riscv.org/simple_v_extension/sv_prefix_proposal/#setvl

The proposed format is setvl rd, rs1, imm

I'm having a bit of difficulty understanding the rules, how loops would be
envisaged, why there is no MAXVL, and why CSRRW is not used.

The rules do not make sense, and are not as clean or clear as either RVV or
SVOrig.

Why test rs1 for being larger than the imm, and then set VL to only half of
rs1?

Why set VL to XLEN when ... no, there is just no logic to this at all, and
I suspect that loops will be complex or just plain difficult if not
impossible to understand.

Can you explain and justify it with some pseudocode examples and
explanation?

We need to choose one form or the other; it's not going to work with both.

The CSRRWI format only allows a 5-bit immediate:

CSRRWI rd, VL, #imm

and the CSRRW format only allows a register (a 5-bit rs1 field):

CSRRW rd, VL, rs1

This would have been enough for RVV to set VL from MIN(MAXVL, rs1) and then
return that into rd. However, CSR writes are forced to return the *old*
value, and the loops need the *new* VL to be returned; otherwise you need
an additional CSR read (CSRR), which is a waste of another instruction.
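
The difference between the two return conventions can be sketched in
Python. This is a toy model, not either specification: the `Hart` class,
its `csrrw_vl` and `setvl` methods, and the MVL value of 4 are all
hypothetical, chosen only for illustration.

```python
class Hart:
    """Toy model of one hart's VL state (hypothetical, for illustration)."""

    def __init__(self, mvl):
        self.mvl = mvl  # maximum vector length (assumed value)
        self.vl = 0     # current vector length

    def csrrw_vl(self, rs1):
        """CSRRW-style write: clamps and stores VL, returns the OLD value."""
        old = self.vl
        self.vl = min(self.mvl, rs1)
        return old

    def setvl(self, rs1):
        """Dedicated-instruction style: stores VL, returns the NEW value."""
        self.vl = min(self.mvl, rs1)
        return self.vl


hart = Hart(mvl=4)
old = hart.csrrw_vl(10)   # the previous VL: no use for loop bookkeeping
new = hart.setvl(10)      # the clamped VL: can be subtracted from the count
print(old, new)
```

With the CSRRW-style write, the loop would need a second read to learn
what VL actually became; with the dedicated form it has the value at once.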

Therefore they decided to add VSETVL as an instruction.

I understand why they did it. I disagree with the strict rules, though,
and so, in SV SETVL, rd is set to the *new* value.

The reason for having MAXVL (MVL) as a separate CSR is that it defines the
range of the vectorisation. Where RVV has separate vector regs, the
vectorisation takes place on elements *in* that vector reg, thus MVL is an
architecture-specific feature.

In SV, because the vectorisation is on the actual regfile, MVL needs to be
explicitly set; otherwise, when doing VL-based loops, there is no clue as
to which regs to actually stop writing to.
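
Why the bound has to be set explicitly can be sketched with a toy flat
register file. Everything here is an assumption for illustration: the
32-entry `regfile` list, the hypothetical `vec_add_imm` helper, and the
idea that a vector at `rd` may occupy registers `rd` to `rd + MVL - 1`.

```python
def vec_add_imm(regfile, rd, rs, imm, vl, mvl):
    """Toy vectorised add on a flat regfile (hypothetical helper).

    Writes regfile[rd + i] = regfile[rs + i] + imm for i in range(vl).
    MVL bounds how many consecutive registers the vector may occupy.
    """
    if vl > mvl:
        raise ValueError("VL must not exceed MVL")
    if rd + mvl > len(regfile) or rs + mvl > len(regfile):
        raise ValueError("vector would run off the end of the regfile")
    for i in range(vl):
        regfile[rd + i] = regfile[rs + i] + imm


regfile = list(range(32))   # x0..x31 holding 0..31 (toy values)
vec_add_imm(regfile, rd=8, rs=16, imm=100, vl=3, mvl=4)
print(regfile[8:13])        # registers 8..10 written, 11 and 12 untouched
```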

RVV SETVL was designed very cleverly.  See that DAXPY example in SIMD
Considered Harmful.

No matter what a0 is set to, VL gets set arbitrarily and its value returned
in t0. WITHOUT NEEDING GETVL, t0 can be subtracted from a0, so that a0 will
eventually equal zero.

On the last iteration, t0 and VL will *equal* a0 (VL is never larger than
a0), such that a0 will become exactly zero.
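
The countdown can be sketched as follows, assuming the simple rule
vl = min(mvl, n) given in the comment of the DAXPY listing. The function
name `strip_mine` and the example values are hypothetical.

```python
def strip_mine(n, mvl):
    """Return the VL chosen on each iteration of a setvl-style loop.

    Assumes vl = min(mvl, remaining), so vl never exceeds the remaining
    element count and the count lands on exactly zero.
    """
    vls = []
    a0 = n
    while a0 != 0:          # loop test (the assembly tests at the bottom)
        t0 = min(mvl, a0)   # setvl t0, a0
        vls.append(t0)
        a0 -= t0            # sub a0, a0, t0
    return vls


print(strip_mine(10, 4))    # the last entry equals the leftover count
```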

To preserve this exact functionality, and still fit onto a standard
regfile, we *need* to be able to set MVL.

The rules that you envisage do not make sense (no explanation is given),
and I would be surprised if they were as simple as the RVV loop paradigm.

Note that the only reason why RVV starts setting VL to "half" is that
traditional vector systems have huge latency on element operations, due to
the lane striping and so on. This can result in a "concertina" effect
similar to that which occurs in busy traffic. "Slowing down" at the end of
loops apparently "fixes" that.
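
The "half" behaviour can be sketched as below. This assumes the rule as it
is understood from the RVV drafts, where a request between MVL and 2*MVL
may be answered with ceil(request / 2) so that the final two iterations
are balanced. The exact wording in any given draft may differ, and
`setvl_balanced` and `trace` are hypothetical names.

```python
def setvl_balanced(avl, mvl):
    """One VL choice permitted under the assumed RVV-style rule.

    avl <= mvl         -> vl = avl
    mvl < avl < 2*mvl  -> vl = ceil(avl / 2)  (the "half" case)
    avl >= 2*mvl       -> vl = mvl
    """
    if avl <= mvl:
        return avl
    if avl < 2 * mvl:
        return (avl + 1) // 2
    return mvl


def trace(n, mvl):
    """VL values seen by a strip-mined loop using setvl_balanced."""
    vls = []
    while n:
        vl = setvl_balanced(n, mvl)
        vls.append(vl)
        n -= vl
    return vls


print(trace(10, 8))   # a balanced tail instead of one full and one short pass
```

Even with the halving, the remaining count still reaches exactly zero, so
the loop structure itself is unchanged.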

We are not doing a lane-based, striped vector element system; we are using
a multi-issue OoO execution engine. The behaviour is very different, and I
do not believe RVV's rules, which assume "traditional" Lanes Vector
microarchitecture, are applicable.

# a0 is n, a1 is pointer to x[0], a2 is pointer to y[0], fa0 is a
  li t0, 2<<25
  vsetdcfg t0             # enable 2 64b Fl.Pt. registers
loop:
  setvl  t0, a0           # vl = t0 = min(mvl, n)
  vld    v0, a1           # load vector x
  slli   t1, t0, 3        # t1 = vl * 8 (in bytes)
  vld    v1, a2           # load vector y
  add    a1, a1, t1       # increment pointer to x by vl*8
  vfmadd v1, v0, fa0, v1  # v1 += v0 * fa0 (y = a * x + y)
  sub    a0, a0, t0       # n -= vl (t0)
  vst    v1, a2           # store Y
  add    a2, a2, t1       # increment pointer to y by vl*8
  bnez   a0, loop         # repeat if n != 0
  ret                     # return
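
For readers who do not read vector assembly, the loop structure of the
listing can be modelled in Python. This is only a behavioural sketch: the
`daxpy` function and its `mvl` parameter are hypothetical, element indices
stand in for the byte pointers in a1 and a2, and vl = min(mvl, n) is
assumed as in the listing's comment.

```python
def daxpy(a, x, y, mvl):
    """Strip-mined y = a*x + y, mirroring the listing's loop structure."""
    n = len(x)              # a0
    i = 0                   # stands in for the a1/a2 pointers
    while n != 0:
        vl = min(mvl, n)    # setvl t0, a0
        for k in range(i, i + vl):      # vld, vld, vfmadd, vst
            y[k] = a * x[k] + y[k]
        i += vl             # add a1/a2 (element index, not bytes)
        n -= vl             # sub a0, a0, t0
    return y


result = daxpy(2.0, [1.0, 2.0, 3.0, 4.0, 5.0], [10.0] * 5, mvl=2)
print(result)
```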

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68