[libre-riscv-dev] [Bug 187] SimpleV vector operation semantics: reading scalar inputs before writing any outputs

Tue Feb 25 00:35:43 GMT 2020

http://bugs.libre-riscv.org/show_bug.cgi?id=187

--- Comment #6 from Jacob Lifshay <programmerjake at gmail.com> ---
First, the reason to change the semantics:

It could increase performance a lot (at least several %) by reducing
compiler-generated copies and reducing register pressure in hot loops (due to
the compiler not needing to copy the scalar inputs out of the way first).

(In reply to Luke Kenneth Casson Leighton from comment #5)
> first, it breaks the design of SV, entirely.  SV is no longer described as
> "a macro unrolling for loop at the hardware level"
> 
> it would have to be described as, "a hardware forloop except for LD which is
> now complicated because it reads the address of the first register and uses
> it as a base address".

Reading scalar/subvector (henceforth just called scalar) inputs before
writing any outputs would apply to all instructions uniformly, not just
LD.

You can consider this conceptually as if all scalar inputs are copied to the
context-switching state CSRs by an additional operation that executes before
the hardware for loop. It can be considered a new operation inserted into
the issue queue before the vector element operations. The vector element
operations then conceptually read all the scalar inputs from the
context-switching state CSRs written by that inserted operation instead of
directly from the input registers.
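
As a rough illustration of the difference, here is a small Python model of a
vector-plus-scalar add where the destination vector overlaps the scalar source
register. The function name, the flat register list and the register numbers
are all made up for this sketch; it is not how SV is specified or implemented.

```python
def vec_add_scalar(regs, rd, rs_vec, rs_scalar, vl, latch_scalar):
    """Toy model: regs[rd+i] = regs[rs_vec+i] + scalar, for i in 0..vl-1.

    latch_scalar=False models re-reading the scalar register per element.
    latch_scalar=True models the proposal: the scalar is copied once
    (conceptually into a state CSR) before any output is written.
    """
    regs = list(regs)              # work on a copy of the register file
    latched = regs[rs_scalar]      # the conceptual pre-loop copy
    for i in range(vl):
        scalar = latched if latch_scalar else regs[rs_scalar]
        regs[rd + i] = regs[rs_vec + i] + scalar
    return regs


# Hypothetical register layout: vector source in r4..r7, scalar source in r8,
# destination vector in r8..r11 (so element 0 overwrites the scalar).
regs = [0] * 16
regs[4:8] = [1, 2, 3, 4]
regs[8] = 10

old = vec_add_scalar(regs, rd=8, rs_vec=4, rs_scalar=8, vl=4, latch_scalar=False)
new = vec_add_scalar(regs, rd=8, rs_vec=4, rs_scalar=8, vl=4, latch_scalar=True)

print(old[8:12])  # [11, 13, 14, 15]: later elements see the clobbered scalar
print(new[8:12])  # [11, 12, 13, 14]: every element sees the original scalar
```

With the per-element re-read, a compiler would have to copy r8 out of the way
first to get the second result; with the latched read it would not.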

This can be implemented in HW without any additional delay by sending the value
of the scalar input registers to the CSRs and to the element operations
simultaneously; the element operations don't have to wait for the CSR write to
complete.

When resuming from a context-switch with a partially-executed SimpleV
instruction (vstart != 0), the copy to the CSR is omitted, and the element
operations just read the scalar inputs from the CSRs.
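
The resume case can be sketched the same way. This is only a toy Python model
under assumed names (run_elements, the csr dict and the stop_at parameter are
invented here to stand in for an interrupt); the register numbers are
arbitrary.

```python
def run_elements(regs, csr, rd, rs_vec, rs_scalar, vl, vstart=0, stop_at=None):
    """Toy model of one (possibly partial) execution of a vector add.

    On a fresh start (vstart == 0) the scalar input is copied into the
    state CSR. On a resume (vstart != 0) that copy is skipped and the
    elements read the scalar from the CSR only.
    Returns the element index at which execution stopped.
    """
    if vstart == 0:
        csr["scalar"] = regs[rs_scalar]
    end = vl if stop_at is None else stop_at
    for i in range(vstart, end):
        regs[rd + i] = regs[rs_vec + i] + csr["scalar"]
    return end


# Hypothetical layout: vector source r4..r7, scalar r8, destination r8..r11.
regs = [0] * 16
regs[4:8] = [1, 2, 3, 4]
regs[8] = 10
csr = {}

# Execute elements 0 and 1, then take a (modelled) context switch.
vstart = run_elements(regs, csr, rd=8, rs_vec=4, rs_scalar=8, vl=4, stop_at=2)

# r8 now holds 11, but the saved CSR still holds the original scalar, 10.
saved_csr = dict(csr)

# Resume with vstart != 0: no re-copy, elements 2 and 3 use the CSR value.
run_elements(regs, saved_csr, rd=8, rs_vec=4, rs_scalar=8, vl=4, vstart=vstart)

print(regs[8:12])  # [11, 12, 13, 14]
```

The point of the sketch is that the CSR has to be part of the saved context,
since the original scalar register may already have been overwritten by the
time the instruction resumes.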

> this is the point at which i say "that is a bad idea", not least because in
> the current design it is completely unnecessary, it also complicates the
> design *and* increases context switch latency *and* increases the number of
> CSRs.
> 
> so, to recap:
> 
> * SV as a concept, the simplicity is destroyed.

It's much less additional conceptual complexity than the register renaming
tables you had wanted to add.

> * context switch latency is increased

Yes, but it could increase performance a lot (at least several %) by reducing
compiler-generated copies and reducing register pressure in hot loops. I'm sure
that's an acceptable tradeoff for increasing context-switch state by 512 bits
(worst-case estimate).

> * hardware complexity is increased

Not by much: a mux to allow substituting the CSR values at the ALU (and other
ops) inputs, plus the logic to write to the CSRs.

> * the number of CSRs - a precious resource - is increased.

Power still has plenty of CSR space left, so that's not as large of a concern
as for RISC-V.
