[libre-riscv-dev] [Bug 132] SIMD-like nmigen signal for partitioning

bugzilla-daemon at libre-riscv.org bugzilla-daemon at libre-riscv.org
Thu Aug 15 06:55:24 BST 2019


http://bugs.libre-riscv.org/show_bug.cgi?id=132

--- Comment #23 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #21)

> actually, in that example, 3 bits perform the partitioning, one is just
> defined to be the OR of the other two.
> 
> Any single-bit nmigen Value will work.

I took a quick look late last night and it looks great.  Actually, the
partitioning mask on its own is perfect, and the vectorisation system would
perform the decode/allocation.

Remember that to do 8 bit vectors without a read-modify-write at the 64 bit
regfile level, we need *byte* level write-enable (WEn) signals.

That means EIGHT WEn signals to write a 64 bit register value.
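
As a rough illustration of the byte-level WEn point (a plain Python model,
not nmigen and not taken from the actual regfile code; the function names
byte_wen and masked_write are made up for this sketch), the predicate bits for
a given element width can be expanded into one write-enable bit per byte, and
only the enabled bytes of the register are then overwritten:

```python
def byte_wen(elwidth_bits, predicate, reg_bits=64):
    """Expand a per-element predicate into per-byte write-enable bits.

    Hypothetical helper: bit N of the result enables byte N of the register.
    elwidth_bits is assumed to be a multiple of 8 that divides reg_bits.
    """
    bytes_per_el = elwidth_bits // 8
    n_els = (reg_bits // 8) // bytes_per_el
    wen = 0
    for el in range(n_els):
        if (predicate >> el) & 1:
            # enable every byte belonging to this element
            wen |= ((1 << bytes_per_el) - 1) << (el * bytes_per_el)
    return wen


def masked_write(old, new, wen, reg_bits=64):
    """Overwrite only the bytes of `old` whose write-enable bit is set."""
    result = old
    for b in range(reg_bits // 8):
        if (wen >> b) & 1:
            mask = 0xFF << (8 * b)
            result = (result & ~mask) | (new & mask)
    return result


# 8 bit elements 0 and 2 enabled: only bytes 0 and 2 are written,
# so no read-modify-write of the whole 64 bit register is needed.
wen = byte_wen(8, 0b00000101)
print(hex(masked_write(0x1111111111111111, 0x2222222222222222, wen)))
```

With reg_bits=32 (one bank) and a single 32 bit element, byte_wen returns
0xF, i.e. the 4 WEn wires mentioned for FP32 operations.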

To avoid overloading things, the regfile is split HI/LO (32 bit) and odd/even:
4 banks.

So most FP32 vector operations will be only 4 WEn wires wide.

So the fact that the PartitionPoints can be at the byte level is perfect.

I am still uncertain whether - or how - to do 64 bit "sharing" of ALUs across 2
Reservation Stations.

The PartitionedSignal idea, if set at the 64 bit level, might actually break
the planned 4 bank (32 bit, HI/LO, odd/even) regfile idea.

It might still work by having two RSs per PartitionedALU.  32 bit input
operands would split the PartitionPoint down the middle, and as the two halves
would be independent, there is no timing issue.
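
To make the "split down the middle" case concrete, here is a minimal sketch in
plain Python (again not nmigen, and not the real PartitionedSignal code; the
name partitioned_add and the representation of partition points as a set of
bit positions are assumptions for illustration).  A partition point at bit 32
blocks the carry, so the two 32 bit halves never interact:

```python
def partitioned_add(a, b, partition_points, width=64):
    """Byte-wise add where the carry is dropped at each partition point.

    partition_points: set of bit positions (multiples of 8) at which the
    carry chain is broken.  Hypothetical model only.
    """
    result = 0
    carry = 0
    for byte in range(width // 8):
        lo = byte * 8
        if lo in partition_points:
            carry = 0  # partition boundary: do not propagate the carry
        s = ((a >> lo) & 0xFF) + ((b >> lo) & 0xFF) + carry
        result |= (s & 0xFF) << lo
        carry = s >> 8
    return result


a = 0x00000001FFFFFFFF
b = 0x0000000100000001
# No partition: one 64 bit add, the carry ripples into the top half.
print(hex(partitioned_add(a, b, set())))
# Partition at bit 32: two independent 32 bit adds.
print(hex(partitioned_add(a, b, {32})))
```

In this model the low half's overflow is simply discarded when the partition
is closed, which is why each half could in principle be fed from its own RS.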

Heck it might even work to have *byte* level RSs, as originally envisaged back
in... November / December 2018.

(I scrapped that idea because the cost of the FURegs DM gate count is far too
high. 32 bit is just barely tolerable).

The Vector issue is thus responsible for keeping an eye on resource allocation.
A pair of "unused" 16 bit partitions can, with the predicate masks, be allocated
an operation.

This, however, needs a lot of thought. The DMs were initially supposed to help
at this fine grain level: one DM to cover 32 bit and another to cover 8 bit,
then pairs of 32 bit ReservationStations collaborate to cover 64 bit and pairs
of 8 bit RSs collaborate to cover 16 bit.

It still *might* be ok to have a tiny (narrow) 8 bit FURegs Dependency Matrix.
Only 8 entries wide.  Have to see.
