[libre-riscv-dev] cache SRAM organisation

Wed Mar 25 15:53:02 GMT 2020

On Wed, Mar 25, 2020 at 1:46 PM Staf Verhaegen <staf at fibraservi.eu> wrote:

> Keep the discussion on development of other types until after the prototype tape-out in October.

good idea.

> I know there is other development ongoing for dual-port RAM in the OpenRAM community, so it is likely to become clearer what extra work still has to happen.
> BTW, I think there is also the possibility of building register file blocks from 2R1W RAM blocks.

ah that's good.  we can combine two of them to create 4R1W (we need
minimum 3R1W, however the design does not critically depend on this:
it just takes 2 cycles to read all operands for 3-operand instructions
such as FMAC).
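
a minimal behavioural sketch of the "combine two 2R1W blocks" idea, in
plain Python rather than HDL.  the class names (Ram2R1W, Ram4R1W) and
the 32-entry depth are assumptions made up for illustration, not taken
from OpenRAM: the point is only that every write goes to both copies,
and each copy serves two of the four read ports.

```python
class Ram2R1W:
    """behavioural model of a 2-read, 1-write RAM block (hypothetical)."""

    def __init__(self, depth):
        self.mem = [0] * depth

    def write(self, addr, data):
        self.mem[addr] = data

    def read(self, addr_a, addr_b):
        # two independent read ports onto the same storage
        return self.mem[addr_a], self.mem[addr_b]


class Ram4R1W:
    """two 2R1W blocks kept identical: one write port, four read ports."""

    def __init__(self, depth):
        self.banks = [Ram2R1W(depth), Ram2R1W(depth)]

    def write(self, addr, data):
        # the single write port is broadcast to both copies
        for bank in self.banks:
            bank.write(addr, data)

    def read(self, a0, a1, a2, a3):
        # read ports 0-1 come from copy 0, ports 2-3 from copy 1
        return self.banks[0].read(a0, a1) + self.banks[1].read(a2, a3)


rf = Ram4R1W(depth=32)
rf.write(5, 0x1234)
rf.write(6, 0x5678)
rf.write(7, 0x9ABC)
# all three operands of a 3-operand instruction in a single read
operands = rf.read(5, 6, 7, 5)
```

the cost is a doubling of the storage area in exchange for the extra
read ports.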

> > this because it turns out that asynchronous SRAM can act, when used in a Register File, as if it was a (separate) Register Bypass / Forwarding Port.  with the Out-of-Order Engine being a huge cyclic feedback loop between ALUs and RegFile, clock delays are an impediment, and having completely separate (extra) Regfile Bypass ports dramatically increases the number of wires and Multiplexers.
>
> Could you detail more on how the address, data and output signals of this asynchronous block would be used and switched between synchronous and asynchronous functioning? To me it seems that it would just change the place of the multiplexers, not the amount.

ok i will try to outline it.  there is quite a lot of detail, i apologise.

it's basically the "pass-through" system used in nmigen FIFOs, and the
"synchronous" mode of nmigen Memory.  the requirements are: what is
written has to be available *combinatorially* - i.e. on the same clock
cycle - if simultaneously read via another port.
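
the requirement can be sketched as a cycle-level model in plain Python
(this is not nmigen code, and the class name WriteThroughRegfile and
its cycle() method are hypothetical).  the only thing it shows is the
bypass: a read of the register being written in the same cycle returns
the incoming data, not the stale stored value.

```python
class WriteThroughRegfile:
    """cycle-level model of a 1R1W regfile with write-to-read bypass."""

    def __init__(self, nregs):
        self.regs = [0] * nregs

    def cycle(self, wen, waddr, wdata, raddr):
        """model one clock cycle and return that cycle's read data."""
        if wen and waddr == raddr:
            # combinatorial bypass: the write data is visible immediately
            rdata = wdata
        else:
            rdata = self.regs[raddr]
        if wen:
            # the storage itself is updated at the end of the cycle
            self.regs[waddr] = wdata
        return rdata


rf = WriteThroughRegfile(8)
same_cycle = rf.cycle(wen=True, waddr=3, wdata=42, raddr=3)
next_cycle = rf.cycle(wen=False, waddr=0, wdata=0, raddr=3)
untouched = rf.cycle(wen=True, waddr=1, wdata=7, raddr=2)
```

without the bypass branch, same_cycle would return the old contents of
register 3 and the waiting reader would lose a cycle.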

now, yes i took note that this is not supposed to be permitted: you're
not normally permitted to read *and* write to an SRAM cell at the same
time.  however, that's exactly what we need.

a workaround (fallback position) is, we use DFF latches.  i created a
"bypass latch" function which creates DFF latches with such a
combinatorial bypass: we actually use them quite a lot (including
between pipeline stages so that we can programmatically cut the number
of pipeline stages in half at the flick of a switch).
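
a rough sketch of what such a latch with a combinatorial bypass might
look like, again as a plain Python cycle model.  the name BypassLatch
and the exact behaviour (pass the input straight through while enable
is set, otherwise hold the last captured value) are assumptions for
illustration, not the actual function from the codebase.

```python
class BypassLatch:
    """register whose output follows the input in the capture cycle."""

    def __init__(self):
        self.stored = 0

    def cycle(self, enable, data_in):
        """model one clock cycle and return the output for that cycle."""
        if enable:
            # bypass: output sees the input in the same cycle ...
            out = data_in
            # ... and the value is also captured for later cycles
            self.stored = data_in
        else:
            # hold: output is whatever was captured last
            out = self.stored
        return out


latch = BypassLatch()
outputs = [
    latch.cycle(True, 7),    # captured and visible at once
    latch.cycle(False, 99),  # input ignored, held value shown
    latch.cycle(True, 11),   # new value visible at once
    latch.cycle(False, 0),   # held again
]
```

a plain DFF would only show each captured value one cycle later.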

however for the Register File we would not "switch" between
synchronous / asynchronous mode.  the reason why we need the
synchronous mode is because some Function Units will be sitting idle,
waiting for their input operands, which have to come from other
Function Units as "results".

these "results" of course have to go into the Register File, however
in a large number of cases, there are FUs waiting for a "read" of that
exact register - the one that's just been written - and to make that
FU wait *another* clock cycle is just wasting time.

btw, also: the synchronous "Read" mode also allows us to transfer data
from the Regfile into Function Unit input latches on that same cycle,
rather than (again) waiting yet another cycle, unnecessarily.  in some
cases, where a pipeline has an immediate free slot, we can actually
proceed directly to executing stage 1 of the pipeline, *in* the same
cycle as the actual Regfile Read.

the other aspect of the Dependency Matrices is: each register has
already been given a "single line" (one "enable" bit per Reg #).  the
binary register number is *gone*.

if we use a "standard" binary-addressable Register File, we actually
have to *recreate* that binary register number / address, using
unary-to-binary converters!
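
to illustrate the redundancy, here is a small Python sketch (function
names are hypothetical) of the round trip a binary-addressed Regfile
would force: the one-hot enable is encoded to a binary register number,
only for the Regfile's own address decoder to turn it straight back
into the same one-hot row select.

```python
def unary_to_binary(onehot):
    """encode a one-hot enable vector (as an int) to a register number."""
    if onehot == 0 or onehot & (onehot - 1):
        raise ValueError("expected exactly one bit set")
    return onehot.bit_length() - 1


def binary_decode(regnum):
    """what the Regfile's address decoder does: number -> row select."""
    return 1 << regnum


enable = 1 << 13                  # "register 13" as a single enable line
regnum = unary_to_binary(enable)  # extra logic in front of the Regfile
row_select = binary_decode(regnum)  # the decoder undoes it again
```

row_select ends up identical to enable, so both conversions are logic
and delay that achieve nothing.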

so my point is: if we have all the register numbers *already* encoded
in unary (one "enable" bit per reg #), we actually don't *want* a
binary-addressing mux-map on the Regfile, at all!

we actually just want a Priority-Picker, directly onto the SRAM port
write/read enable lines, where each SRAM row has *no* binary-address
decoder on it.
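
a minimal sketch of that arrangement in plain Python.  the names
(priority_pick, UnaryRegfile) are made up for illustration, and
"lowest-numbered register wins" is an assumed priority order: the
picker reduces a set of pending per-register requests to a single
one-hot line, which drives the row enable directly.

```python
def priority_pick(requests):
    """keep only the lowest set bit of the request vector (one-hot)."""
    return requests & -requests


class UnaryRegfile:
    """regfile whose rows are selected by enable lines, not an address."""

    def __init__(self, nregs):
        self.rows = [0] * nregs

    def write(self, row_enable, data):
        for i in range(len(self.rows)):
            if (row_enable >> i) & 1:
                self.rows[i] = data

    def read(self, row_enable):
        for i, value in enumerate(self.rows):
            if (row_enable >> i) & 1:
                return value
        return None  # no row enabled


rf = UnaryRegfile(8)
pending = 0b00010100              # registers 2 and 4 both want the port
selected = priority_pick(pending)  # only register 2 gets it this cycle
rf.write(selected, 0xBEEF)
remaining = pending & ~selected   # register 4 waits for a later cycle
value = rf.read(0b00000100)
```

no binary register number appears anywhere in the path from request
to row.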

i hope that was enough to explain what is going on?  if not please do
ask to clarify, when you have time.

l.