[libre-riscv-dev] [isa-dev] SV / RVV, marking a register as VL.
lkcl
luke.leighton at gmail.com
Thu Aug 29 08:53:53 BST 2019
On Wednesday, August 28, 2019 at 11:00:12 PM UTC+1, Bruce Hoult wrote:
>
> Have a register (or CSR) contain some sort of pointer to another
> register? Just: no way. Micro-architectural nightmare.
>
that's what i thought, initially: it's why i paused for a long time before
raising the idea. then it occurred to me that
(a) there's only one of them (VL is global) so the contents may be cached,
(b) in an efficient OoO design the CSRs *are* a register file which
requires dependency-management anyway, and
(c) the implication of the CSR-register-containing-a-pointer is just
another dependency hazard.
in addition, both predication and MV.X (regs[rd] = regs[regs[rs1]]) require
pretty much exactly the same microarchitectural dependency hardware to be
in place. in the case of "CSR-register-is-a-pointer", the actual vector
length is obtained via "regs[CSRregs[VL]]" which is near-identical to MV.X
[MV.X is the scalar equivalent of the vector-indexed move operation]
so a good vector engine will already *have* the required concepts /
hardware in place and/or have to solve near-identical microarchitectural
design issues anyway.
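to make the comparison concrete, here is a rough python sketch of the two lookups side by side. it is a toy model only: the `regs` list, the `csrs` dict and the helper names are made up for illustration, and "VL holds a register number" is the hypothetical augmentation under discussion, not anything in the current RVV draft.

```python
# toy model: a scalar integer register file plus a CSR file in which the VL
# CSR holds a register *number* rather than the vector length (hypothetical).
regs = [0] * 32          # scalar integer register file
csrs = {"VL": 0}         # CSR file; "VL" stores an index into regs

def mv_x(rd, rs1):
    """MV.X as described above: regs[rd] = regs[regs[rs1]]."""
    regs[rd] = regs[regs[rs1]]

def read_vl():
    """VL-as-pointer: the effective vector length is regs[CSRregs[VL]]."""
    return regs[csrs["VL"]]

T1 = 6                   # t1 is x6 in the standard RISC-V ABI
csrs["VL"] = T1          # "t1 *is* VL"
regs[T1] = 8             # e.g. a fault-first load trimmed the length to 8
vl_now = read_vl()       # one level of indirection through the CSR

regs[10] = T1            # x10 holds a register number...
mv_x(11, 10)             # ...so MV.X performs the same double lookup
```

both `read_vl()` and `mv_x()` do a register-indexed read of the register file, which is the point being made: the hazard tracking needed for one should cover the other.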
in in-order systems, the one-stop-shop solution to everything is, of course,
"stall, stall, stall"... :)
> The scalar instructions in, for example, this strncpy loop do not take
> significant time. In a real version of the code they would be
> interleaved with vector instructions rather than all at the end,
that's *if* the vector engine is a separate one from the scalar engine.
some embedded low-cost solutions may not have a separate ALU, for example
in embedded 3D. you'll meet some of the people for whom such a
microarchitectural design decision will be critical, tomorrow.
that having been said...
> and
> would on almost all machines be completed long before the preceding
> vector instruction is. In particular the move from the VL CSR would
> happen soon after the vlbff.v and the increments to the pointers soon
> after that.
>
...ok, great: so in an in-order system, clashes (dependency hazards) would
be long gone by the time the CSR-pointing-to-the-register had been
established.
as long as the code had been arranged so that the VL CSR pointer-setup was
well in advance of its use.
> Maybe something like:
>
>
i like this example. it's really elegant.
strncpy:
    mv a3, a0               # Copy dst
loop:
    setvli x0, a2, vint8    # Vectors of bytes.
ok so here there's a dependency: VL has a read dependency on a2. x0 is not
written to, so there's no write dependency created.
    vlbff.v v1, (a1)        # Get src bytes
this instruction has both a read *and* write dependency. v1 has a read
dependency on VL, and because VL is written to it creates a second ongoing
write-dependency.
    vseq.vi v0, v1, 0       # Flag zero bytes
... which occurs here. so here, VL's (new value) creates a read dependency
on both v0 and v1.
    csrr t1, vl             # Get number of bytes fetched
here, the scalar register t1 has a read dependency on VL's new value from
vlbff. so there's one potential cycle's "grace" in an in-order
system where stall would not occur. as these are not complex
operations i'd be really surprised if significant latency was required, no
matter what the microarchitecture.
the rest of the assembly code is straightforward apart from the
modification to a2 and looping back to where a2 is used...
    vmfirst a4, v0          # Zero found?
    add a1, a1, t1          # Bump src pointer
    vmsif.v v0, v0          # Set mask up to and including zero byte.
    sub a2, a2, t1          # Decrement count.
    vsb.v v1, (a3), v0.t    # Write out bytes
    add a3, a3, t1          # Bump dst pointer
    bgez a4, exit           # Done
    bnez a2, loop           # Anymore?
... here - and it was set up (written to) over 5 instructions ago as far as
the entrance to the next loop iteration is concerned. that's still a
write-dependency, however, which in a seriously-fast out-of-order design
may result in tripping the dependency hardware.
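as a sanity check on the distances quoted above, here is a small python sketch that tabulates which registers each loop instruction reads and writes, then measures how far apart a producer and its next consumer are. the read/write sets are my own reading of the walkthrough (an assumption, not taken from the RVV spec), and the helper names are invented for this example.

```python
# (text, registers read, registers written). the read/write sets are an
# assumed reading of the commentary above, not taken from the RVV spec.
LOOP = [
    ("setvli x0, a2, vint8", {"a2"},                   {"VL"}),
    ("vlbff.v v1, (a1)",     {"a1", "VL"},             {"v1", "VL"}),
    ("vseq.vi v0, v1, 0",    {"v1", "VL"},             {"v0"}),
    ("csrr t1, vl",          {"VL"},                   {"t1"}),
    ("vmfirst a4, v0",       {"v0", "VL"},             {"a4"}),
    ("add a1, a1, t1",       {"a1", "t1"},             {"a1"}),
    ("vmsif.v v0, v0",       {"v0", "VL"},             {"v0"}),
    ("sub a2, a2, t1",       {"a2", "t1"},             {"a2"}),
    ("vsb.v v1, (a3), v0.t", {"v1", "a3", "v0", "VL"}, set()),
    ("add a3, a3, t1",       {"a3", "t1"},             {"a3"}),
    ("bgez a4, exit",        {"a4"},                   set()),
    ("bnez a2, loop",        {"a2"},                   set()),
]

def next_reader_distance(loop, producer, reg):
    """Instructions from loop[producer] to the next reader of reg,
    wrapping around into the next iteration if necessary."""
    n = len(loop)
    for step in range(1, n + 1):
        _, reads, _ = loop[(producer + step) % n]
        if reg in reads:
            return step
    return None

def wraparound_distance(loop, reg):
    """Instructions from the last write of reg in one iteration to its
    first read in the next iteration."""
    last_write = max(i for i, (_, _, w) in enumerate(loop) if reg in w)
    first_read = min(i for i, (_, r, _) in enumerate(loop) if reg in r)
    return len(loop) - last_write + first_read

t1_to_use = next_reader_distance(LOOP, 3, "t1")   # csrr t1 -> add a1
a2_wrap = wraparound_distance(LOOP, "a2")         # sub a2 -> next setvli
```

with these assumed sets, `t1_to_use` comes out as 2 (one instruction, vmfirst, sits between the csrr and its first consumer) and `a2_wrap` comes out as 5, in line with the "5 instructions" figure for a2.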
so, let's go over it again, this time with the hypothetical
VL-points-to-a-scalar-reg augmentation.
strncpy:
    mv a3, a0               # Copy dst
loop:
    setvli t1, a2, vint8    # Vectors of bytes.
note that t1 is now the target. this says - hypothetically - that t1 *is*
VL.
so here there's a dependency: VL has a read dependency on a2. *t1* has a
write dependency created on whatever is going to use it in the near future
    vlbff.v v1, (a1)        # Get src bytes
this instruction has both read *and* write dependencies. v1 has a read
dependency not on VL, but on *t1*, and because *t1* is written to it
creates a second ongoing write-dependency... *on t1*.
    vseq.vi v0, v1, 0       # Flag zero bytes
... which occurs here. so here, t1's (new value) creates a read dependency
on both v0 and v1.
    # NO LONGER NEEDED: csrr t1, vl   # Get number of bytes fetched
t1 has *already* been set up with the required value [this is the (one)
instruction in the loop that is saved, reducing the instruction count of the
loop in RVV by around... 8% or so (1 instruction out of 12)].
again: the rest of the assembly code is straightforward apart from the
modification to a2 and looping back to where a2 is used...
    vmfirst a4, v0          # Zero found?
    add a1, a1, t1          # Bump src pointer
    vmsif.v v0, v0          # Set mask up to and including zero byte.
    sub a2, a2, t1          # Decrement count.
    vsb.v v1, (a3), v0.t    # Write out bytes
    add a3, a3, t1          # Bump dst pointer
    bgez a4, exit           # Done
    bnez a2, loop           # Anymore?
again, t1 is all "read" here (not written to) so again, the only concern is
that a2 had been written to 5 instructions up, which, on looping back (on
very fast systems), will create a write hazard back at the setvli, just as
with the current revision of RVV.
so, honestly i'm not seeing anything insurmountable, here. if i haven't
missed anything, my feeling is that a good dependency-tracking system will
have the necessary hardware in place, and an in-order system is going to be
using stall, stall, stall anyway.
does that look reasonable?
l.
> exit:
> ret
>
> On Tue, Aug 27, 2019 at 12:45 PM lkcl <luke.l... at gmail.com>
> wrote:
> >
> > https://libre-riscv.org/simple_v_extension/appendix/#strncpy
> >
> https://libre-riscv.org/simple_v_extension/specification/sv.setvl/#questions
>
>