[libre-riscv-dev] SV Prefix questions

Wed Jun 26 07:11:58 BST 2019

On Wed, Jun 26, 2019 at 6:51 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> On Tue, Jun 25, 2019 at 10:44 PM Luke Kenneth Casson Leighton
> <lkcl at lkcl.net> wrote:
> >
> > On Wed, Jun 26, 2019 at 6:27 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
> >
> > > Starting from the V extension spec:
> > > https://github.com/riscv/riscv-v-spec/blob/e014590220e7b95b1dfa3c0665277ae1550828c9/v-spec.adoc#vsetvlivsetvl-instructions
> >
> >  "The vl setting rules are designed to be sufficiently strict to
> > preserve vl behavior across register spills and context swaps for AVL
> > ≤ VLMAX, yet flexible enough to enable implementations to improve
> > vector lane utilization for AVL > VLMAX.
> >
> > For example, this permits an implementation to set vl = ceil(AVL / 2)
> > for VLMAX < AVL < 2*VLMAX in order to evenly distribute work over the
> > last two iterations of a stripmine loop. Requirement 2 ensures that
> > the first stripmine iteration of reduction loops uses the largest
> > vector length of all iterations, even in the case of AVL < 2*VLMAX.
> > This allows software to avoid needing to explicitly calculate a
> > running maximum of vector lengths observed during a stripmined loop.
> > "
> >
> > those are *implementation* details.  they're bleeding *implementation*
> > details - hard-coding the assumption that implementors will be doing
> > SIMD "Lanes" - into the *specification*.
> supporting SIMD lanes is not the only reason: having the last 2
> iterations split approximately in half helps when doing reduction,
> since a binary tree reduction is more parallelizable than any other
> sort.

 i'd need to see a worked example.
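
 as a starting point, here is a minimal python sketch of the vl
sequences that the quoted spec text permits in a stripmine loop.  it is
*not* the reduction example asked for above, only the arithmetic behind
"vl = ceil(AVL / 2)".  the function names (setvl_clamp, setvl_balanced,
stripmine) are made up for illustration and do not come from either
spec.

```python
def setvl_clamp(avl, vlmax):
    """vl is simply the requested length, clamped to VLMAX."""
    return min(avl, vlmax)


def setvl_balanced(avl, vlmax):
    """One choice the quoted text permits: for VLMAX < AVL < 2*VLMAX,
    return ceil(AVL / 2) so the last two iterations are evenly split."""
    if vlmax < avl < 2 * vlmax:
        return (avl + 1) // 2  # integer ceil(avl / 2)
    return min(avl, vlmax)


def stripmine(n, vlmax, setvl):
    """Return the list of vl values used to process n elements."""
    lengths = []
    remaining = n
    while remaining > 0:
        vl = setvl(remaining, vlmax)
        lengths.append(vl)
        remaining -= vl
    return lengths


clamped = stripmine(20, 8, setvl_clamp)      # [8, 8, 4]
balanced = stripmine(20, 8, setvl_balanced)  # [8, 6, 6]
```

 in both cases the first iteration uses the largest vl; only the split
of the tail differs.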

> >
> > that is not an appropriate thing to do, and we should not be following
> > their flawed approach.
> >
> > we are *not* doing a "Lanes" (SIMD) specification.  SV sits on top of
> > the *scalar* register file, not on top of vector *lanes*.
> changing the spec in a way that benefits lanes without detracting from
> processors that use multiple scalar ops -- I would think that's a good
> thing, not something to avoid.

 we (you) managed to create a dynamically-splittable SIMD engine: if
people want to do SIMD lanes, they can work out how to split the SIMD
units in half as well.

 they can always monitor the SETVL rs1 argument, see that it's a bit
lower than usual, and have the microarchitecture adjust the SIMD
allocation transparently.

 sacrificing the ability for P48/P64 to do single-instruction
LD/ST-MULTI for the benefit of *one* architectural design is seriously
detrimental, and detracts from the value of SV overall.

 context-switching without LD/ST-MULTI is a whopping batch of 30
instructions for the FP regfile and another 30 for the INT regfile.

 we get to do that with *two*.  two!

 and the same trick can be applied even to function calls.  a single
instruction to save/restore registers from a function!  it can even be
predicated.

 if however VL is not *absolutely* guaranteed to be set from the
*exact* value of rs1 (or the immediate), we *cannot do that*.  it
would require a loop.
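
 a rough python model of that point, under loudly-stated assumptions:
setvl_exact models "VL is exactly what was asked for", while
setvl_may_shrink is a *hypothetical* rule (with an invented hardware
limit of 16) under which fewer elements than requested may be granted.
neither function is taken from the SV or RVV spec, and save_registers
is only a stand-in for a vectorised store-multiple.

```python
def setvl_exact(requested):
    """VL is set to exactly the requested value."""
    return requested


def setvl_may_shrink(requested, hw_max=16):
    """Hypothetical rule: hardware may grant fewer elements than asked.
    hw_max is an invented limit, purely for illustration."""
    if requested <= hw_max:
        return requested
    if requested < 2 * hw_max:
        return (requested + 1) // 2  # integer ceil(requested / 2)
    return hw_max


def save_registers(regfile, first, count, setvl):
    """Copy `count` registers starting at `first`.
    Returns (saved values, number of store-multiple operations issued)."""
    saved = []
    ops = 0
    done = 0
    while done < count:
        vl = setvl(count - done)
        saved.extend(regfile[first + done:first + done + vl])
        ops += 1
        done += vl
    return saved, ops


regfile = list(range(100, 132))  # 32 dummy register values
saved_exact, ops_exact = save_registers(regfile, 1, 30, setvl_exact)
saved_loop, ops_loop = save_registers(regfile, 1, 30, setvl_may_shrink)
```

 with the exact rule the loop body runs once (ops_exact is 1), so the
loop can be dropped and the save becomes a single instruction.  with
the hypothetical shrinking rule it runs twice (ops_loop is 2), so the
loop has to stay.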

 plus, remember: in RVV, unlike SV, the vectors are *NOT* overlaid on
the scalar regfile.  you cannot index (access) vector element 5 except
through another special opcode that extracts it from the *VECTOR*
regfile into the SCALAR regfile.

 in SV, there's the possibility of setting VL to *EXACTLY* the value
that *YOU* require, and you *KNOW*, with *ABSOLUTE* certainty, that
the [scalar] register further up the "vector" *WILL* have been written
to.

changing to RVV's rules BREAKS THAT PARADIGM.
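
 to make the paradigm concrete, a minimal python sketch, assuming a
32-entry scalar regfile and ignoring predication and element widths.
vector_add and the register numbers are hypothetical, chosen only for
illustration: the "vector" is just VL consecutive scalar registers.

```python
def vector_add(regs, rd, rs1, rs2, vl):
    """Element i of the 'vector' is scalar register (base + i)."""
    for i in range(vl):
        regs[rd + i] = regs[rs1 + i] + regs[rs2 + i]


regs = [0] * 32
regs[8:12] = [1, 2, 3, 4]
regs[16:20] = [10, 20, 30, 40]

vector_add(regs, rd=24, rs1=8, rs2=16, vl=4)

# with VL exactly 4, registers 24..27 are known to be written, and
# element 3 is readable as plain scalar register 27.
last_element = regs[27]
```

 if VL could come back smaller than 4, there would be no such
certainty about register 27.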

so we are throwing away at least three, possibly even four *major*
benefits of SV, for what?  to follow RVV's rules that were heavily
optimised for the benefit of supercomputers?

if we had a separate vector regfile, i would say "yes", immediately,
because many of those benefits would not be there anyway.  actually i
would say "screw it let's just use RVV", but that's another story :)

l.