[libre-riscv-dev] [Bug 187] SimpleV vector operation semantics: reading scalar inputs before writing any outputs

Mon Feb 24 23:53:44 GMT 2020

http://bugs.libre-riscv.org/show_bug.cgi?id=187

--- Comment #5 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #3)
> If we later decide we don't want to change the SimpleV spec as suggested
> (which will help compiler backends), we can close this bug then.

ok, let's discuss how this would be implemented in hardware:

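/* proposed semantics: the scalar address register is read once, up front */
void load1(int address_reg, int dest_reg, int N)
{
    /* base address is latched before any destination register is written */
    int address = regs[address_reg];
    for(int i = 0; i < N; i++)
        regs[dest_reg + i] = *(int *)(address + sizeof(int) * i);
}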

first, it breaks the design of SV entirely.  SV can no longer be described as
"a macro-unrolling for-loop at the hardware level".

it would have to be described as "a hardware for-loop, except for LD, which is
now complicated because it reads the address from the first register and uses
it as a base address".

so that is an immediate red flag.
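
to make the difference concrete, here is a minimal sketch contrasting the two
descriptions.  it is a toy model, not the SV spec: memory is word-indexed
rather than byte-addressed, and the names regs, mem, mem_read, load_latched,
load_per_element and run_overlap are all made up for illustration.  the
scenario assumed is the awkward one, where the address register is also the
first destination register.

```c
#include <assert.h>

#define NREGS    8
#define MEMWORDS 16

static int regs[NREGS];   /* toy register file (assumption) */
static int mem[MEMWORDS]; /* toy word-indexed memory (assumption) */

static int mem_read(int word_index)
{
    assert(word_index >= 0 && word_index < MEMWORDS);
    return mem[word_index];
}

/* proposed semantics: latch the scalar base before writing any output */
static void load_latched(int address_reg, int dest_reg, int n)
{
    int base = regs[address_reg];
    for (int i = 0; i < n; i++)
        regs[dest_reg + i] = mem_read(base + i);
}

/* plain per-element loop: every element re-reads the register file */
static void load_per_element(int address_reg, int dest_reg, int n)
{
    for (int i = 0; i < n; i++)
        regs[dest_reg + i] = mem_read(regs[address_reg] + i);
}

/* r1 holds the base (word 2) and is also the first destination.
 * returns the value that ends up in r2 after a 3-element load. */
static int run_overlap(int latched)
{
    for (int i = 0; i < NREGS; i++)
        regs[i] = 0;
    for (int i = 0; i < MEMWORDS; i++)
        mem[i] = (i * 3) % MEMWORDS;
    regs[1] = 2;

    if (latched)
        load_latched(1, 1, 3);
    else
        load_per_element(1, 1, 3);
    return regs[2];
}
```

in this toy setup run_overlap(1) and run_overlap(0) return different values,
because the per-element loop picks up the overwritten r1 on its second
iteration.  that divergence is the special case that the latched-base
description would have to carve out of the plain for-loop.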

the second factor is that the base address has to be read in one of two ways.
the first is to read it in the decode phase, which requires a full stall of the
entire engine until the source register has no outstanding dependencies.
clearly that is unacceptable.

the second method is to have special shadow logic similar to predication, which
is in a similar position.  the shadow is thrown across all LDs while they wait
for the address reg to be read, at which point that data is broadcast to all
waiting LD Function Units.

once the address is received, the units may drop the address-waiting shadow
(but not the invalid-memory-exception shadow, which must also exist and must
still be held separately).

the problem comes when VL is greater than the number of LD/ST Function Units
(no, we are not doing 64 LD/ST FUs, the practical limit is 4 LDs, 4 STs.)

at that point, if an exception occurs halfway through a VL=16 operation, then
that address value absolutely must be stored on the stack, as part of context
switch state.

failure to do so, given that the loop cannot be backed out of, results in
memory corruption.
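
a rough sketch of why the latched address becomes context-switch state.  again
this is a toy model under stated assumptions: word-indexed memory, a made-up
hart_state struct with a hypothetical saved_base field, and a VL=4 load that
traps after two elements.  r1 is both the address register and the first
destination, so by the time of the trap it has already been overwritten.

```c
#include <assert.h>

#define NREGS    8
#define MEMWORDS 16

/* hypothetical per-hart state: saved_base is the extra word that the
 * latched-base semantics would add to what a context switch must save */
typedef struct {
    int regs[NREGS];
    int next_elem;   /* element index to resume from */
    int saved_base;  /* latched base address (assumption) */
} hart_state;

static int mem[MEMWORDS]; /* toy word-indexed memory (assumption) */

/* execute elements [st->next_elem, stop) using the given base */
static void load_elements(hart_state *st, int base, int dest_reg, int stop)
{
    for (int i = st->next_elem; i < stop; i++) {
        assert(base + i >= 0 && base + i < MEMWORDS);
        st->regs[dest_reg + i] = mem[base + i];
    }
    st->next_elem = stop;
}

/* returns the value that ends up in r3 once the load has been resumed */
static int run_interrupted(int base_was_saved)
{
    hart_state st = {{0}, 0, 0};
    for (int i = 0; i < MEMWORDS; i++)
        mem[i] = (i * 3) % MEMWORDS;
    st.regs[1] = 2;

    /* latch the base once, run elements 0 and 1, then take the trap */
    st.saved_base = st.regs[1];
    load_elements(&st, st.saved_base, 1, 2);

    /* on resume, either restore the latched base from saved state, or
     * (wrongly) re-read r1, which element 0 has already overwritten */
    int base = base_was_saved ? st.saved_base : st.regs[1];
    load_elements(&st, base, 1, 4);
    return st.regs[3];
}
```

with the base restored, r3 receives the word the original instruction asked
for.  without it, the resumed half of the loop reads from an address derived
from already-loaded data.  the sketch only shows a LD fetching wrong data; the
same mistake on a ST would write to the wrong location.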

this is the point at which i say "that is a bad idea": not only is it
completely unnecessary in the current design, it also complicates the design
*and* increases context switch latency *and* increases the number of CSRs.

so, to recap:

* the simplicity of SV as a concept is destroyed
* context switch latency is increased
* hardware complexity is increased
* the number of CSRs - a precious resource - is increased.

in other words, there are no upsides.
