[libre-riscv-dev] [isa-dev] SV / RVV, marking a register as VL.

Wed Aug 28 22:59:57 BST 2019

Have a register (or CSR) contain some sort of pointer to another
register? Just: no way. Micro-architectural nightmare.

The scalar instructions in, for example, this strncpy loop do not take
significant time. In a real version of the code they would be
interleaved with vector instructions rather than all at the end, and
would on almost all machines be completed long before the preceding
vector instruction is. In particular the move from the VL CSR would
happen soon after the vlbff.v and the increments to the pointers soon
after that.

Maybe something like:

strncpy:
    mv a3, a0               # Copy dst
loop:
    setvli x0, a2, vint8    # Vectors of bytes.
    vlbff.v v1, (a1)        # Get src bytes
    vseq.vi v0, v1, 0       # Flag zero bytes
    csrr t1, vl             # Get number of bytes fetched
    vmfirst a4, v0          # Zero found?
    add a1, a1, t1          # Bump src pointer
    vmsif.v v0, v0          # Set mask up to and including zero byte.
    sub a2, a2, t1          # Decrement count.
    vsb.v v1, (a3), v0.t    # Write out bytes
    add a3, a3, t1          # Bump dst pointer
    bgez a4, exit           # Done
    bnez a2, loop           # Any more?

exit:
    ret
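
To make the data flow through VL easier to follow, here is a rough Python model of the loop above. It is only a sketch under stated assumptions: `strncpy_model`, `readable` and `vlmax` are names invented for illustration, setvli is assumed to pick min(AVL, VLMAX), and a "fault" is modelled as reading at or beyond the `readable` offset. It copies up to and including the first zero byte, exactly as the loop does, and records each value that `csrr t1, vl` would return.

```python
def strncpy_model(src, n, readable=None, vlmax=16):
    """Toy model of the loop. Returns (bytes stored to dst, VL values read)."""
    # Offsets >= readable are treated as unmapped memory (an assumption).
    readable = len(src) if readable is None else min(readable, len(src))
    dst = bytearray()
    a1, a2 = 0, n            # a1: src offset, a2: remaining count
    vl_trace = []
    while True:
        vl = min(a2, vlmax)                      # setvli x0, a2, vint8
        if vl > 0:
            if a1 >= readable:
                # A fault on element 0 is not suppressed by fail-first.
                raise MemoryError("fault on first element")
            vl = min(vl, readable - a1)          # vlbff.v trims VL at the fault
        v1 = src[a1:a1 + vl]                     # bytes actually fetched
        v0 = [b == 0 for b in v1]                # vseq.vi v0, v1, 0
        t1 = vl                                  # csrr t1, vl
        vl_trace.append(t1)
        a4 = v0.index(True) if True in v0 else -1    # vmfirst a4, v0
        a1 += t1                                 # add a1, a1, t1
        keep = vl if a4 < 0 else a4 + 1          # vmsif.v: up to and incl. zero
        a2 -= t1                                 # sub a2, a2, t1
        dst += v1[:keep]                         # vsb.v v1, (a3), v0.t
        # "add a3, a3, t1" is implicit: appending advances the dst position.
        if a4 >= 0:                              # bgez a4, exit
            break
        if a2 == 0:                              # bnez a2, loop
            break
    return bytes(dst), vl_trace


if __name__ == "__main__":
    # VL is trimmed from 8 to 4 because only 4 bytes are readable.
    print(strncpy_model(b"abc\0", 100, vlmax=8))
```

The only place the scalar side needs VL is the single `t1 = vl` line; everything after it is ordinary scalar arithmetic on `t1`.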

On Tue, Aug 27, 2019 at 12:45 PM lkcl <luke.leighton at gmail.com> wrote:
>
> https://libre-riscv.org/simple_v_extension/appendix/#strncpy
> https://libre-riscv.org/simple_v_extension/specification/sv.setvl/#questions
>
> Something that has been bugging me about RVV and SV for some time: the fact that arithmetic on VL requires additional instructions to perform a transfer between VL and the registers used to carry out the necessary arithmetic.
>
> If CSRs were treated orthogonally as actual scalar registers in RISC-V, the problem would be moot.
>
> This particularly hits home on use of fail-on-first.
>
> The above pseudocode for strncpy shows it well: a CSR load is required in RVV in order to get at the modifications to VL that the fail-first made.
>
> In SV the effect is even more pronounced, because VL has to be incremented by one after the fail-first, which of course requires first transferring VL to a scalar reg, then performing the arithmetic, then getting the value *back* into VL.
>
> I have been thinking of a solution here which I did not want to share until I was reasonably sure it would be easily implementable in hardware.
>
> The solution is, instead of having a CSR that contains the current VL value, to have the CSR point *to* the scalar register that contains, and will indefinitely continue to contain, the current VL value.
>
> This would have the advantage that, once "linked", fail-on-first would automatically result in *direct* modification of that scalar (standard x1-x31 integer Regfile) register.
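
As a way of picturing the proposal, a toy Python model of a "linked" VL might look like the following. This is a sketch only: `LinkedVLModel`, `setvl_link` and `fail_first_trim` are hypothetical names, not part of any RVV or SV specification, and VL is assumed to be min(AVL, VLMAX).

```python
class LinkedVLModel:
    """Toy model: the VL CSR stores a register *index*, not a value."""

    def __init__(self, vlmax=16):
        self.x = [0] * 32        # integer regfile, x0..x31
        self.vlmax = vlmax
        self.vl_link = None      # hypothetical CSR field: which x-reg holds VL

    def setvl_link(self, rd, avl):
        """Link VL to x[rd] and set it (assumed: VL = min(AVL, VLMAX))."""
        if rd == 0:
            raise ValueError("x0 is hardwired to zero and cannot hold VL")
        self.vl_link = rd
        self.x[rd] = min(avl, self.vlmax)
        return self.x[rd]

    @property
    def vl(self):
        """Reading VL is just a read of the linked scalar register."""
        return self.x[self.vl_link]

    def fail_first_trim(self, elements_ok):
        """A fail-first load writes the trimmed VL straight into the regfile."""
        self.x[self.vl_link] = min(self.vl, elements_ok)


if __name__ == "__main__":
    m = LinkedVLModel(vlmax=8)
    m.setvl_link(5, 100)     # x5 now holds VL
    m.fail_first_trim(3)     # implicit write to x5, no CSR read needed
    m.x[5] += 1              # ordinary scalar arithmetic changes VL directly
    print(m.vl)
```

The `fail_first_trim` method is the implicit regfile write that every vector instruction would have to be tracked against.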
>
> In the pseudocode above that would save 1 instruction in the inner loop in the RVV case: a reduction of around 8%.
>
> In the SV case it would save *three* instructions in what is currently a 14 instruction loop: a significant saving (even when they're all RVC opcodes).
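
The two figures can be checked with a couple of lines of Python. The 12-instruction count for RVV is taken from the loop body shown in this thread (setvli through bnez); the 14-instruction SV count is the one quoted above. `saving` is just a helper name for this sketch.

```python
def saving(removed, loop_len):
    """Fraction of the inner loop saved by removing `removed` instructions."""
    return removed / loop_len


rvv = saving(1, 12)   # drop the single csrr from a 12-instruction loop
sv = saving(3, 14)    # drop three instructions from a 14-instruction loop

if __name__ == "__main__":
    print(f"RVV: {rvv:.1%}")   # about 8%
    print(f"SV:  {sv:.1%}")    # about 21%
```

Note that this counts instructions only; it says nothing about whether those instructions are on the critical path.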
>
> The hardware challenges are that these are implicit (indirect) modifications to a scalar regfile. Given that VL already has to be modified (in the current revision of RVV) conceptually it is not challenging, it's just that instead of modifying the CSR store, the integer regfile store is to be written.
>
> For an OoO design, which was my primary concern, this makes every vector instruction require one additional read and write register hazard.
>
> For context: some implementations may not have chosen to make VL a read / write dependency hazard, choosing instead to "stall" instruction issue whilst waiting for outstanding vector operations to complete: for such implementations the previous paragraph makes no sense and does not apply.
>
> If on the other hand an OoO engine *has* had support for read and write hazard dependency tracking on VL added (in order to avoid stalling when VL is modified), then changing that to track the scalar register instead (the one to which the proposed modification to SETVL points) is not so much of a problem, and might even simplify the microarchitecture.
>
> For in-order systems I am not so concerned about the hardware implications: stall is the go-to "solution" and this situation is no different.
>
> As this is quite a radical design change I have been reluctant to come forward with it, and had to think about it for several months.
>
> Feedback appreciated, will hold off going ahead with this on SV for a while longer.
>
> Constructive feedback on its value in RVV also welcomed as it will save on instruction count in tight loops in RVV, as well.
>
> L.
>