[libre-riscv-dev] cache SRAM organisation

Thu Mar 26 12:18:53 GMT 2020

Luke Kenneth Casson Leighton wrote on Wed 25-03-2020 at 15:53 [+0000]:
> On Wed, Mar 25, 2020 at 1:46 PM Staf Verhaegen <staf at fibraservi.eu> wrote:
> 
> > > this because it turns out that asynchronous SRAM can act, when used in a Register File, as if it was a (separate) Register Bypass / Forwarding Port.  with the Out-of-Order Engine being a huge cyclic feedback loop between ALUs and RegFile, clock delays are an impediment, and having completely separate (extra) Regfile Bypass ports dramatically increases the number of wires and Multiplexers.
> > 
> > Could you detail more on how the address, data and output signals of this asynchronous block would be used and switched between synchronous and asynchronous functioning? To me it seems that it would just change the place of the multiplexers, not the amount.
> 
> ok i will try to outline it, there is quite a lot of detail, i apologise.
> it's basically the "pass-through" system used in nmigen FIFOs, and the "synchronous" mode of nmigen Memory.  the requirements are: what is written has to be available *combinatorially* - i.e. on the same clock cycle - if simultaneously read via another port.
> now, yes i took note that this is not supposed to be permitted: you're not normally permitted to be able to read *and* write to an SRAM cell at the same time.  however, that's exactly what we need.

You seem to be mixing up two different concepts, namely synchronicity and write-through. Synchronous means signals are synced with an edge of a (clock) signal. SRAM write-through means that after a write operation you also get on the Q output the data you have just written. These two concepts are orthogonal to each other.
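
To illustrate that the two concepts are independent, here is a minimal behavioural sketch in plain Python. The class `SyncSRAM` and its `write_through` flag are hypothetical names made up for this example; it is not a model of OpenRAM or of any real SRAM macro. Both instances are equally synchronous (everything happens in `clock()`); only the behaviour of Q after a write differs.

```python
class SyncSRAM:
    """Toy single-port synchronous SRAM (hypothetical, for illustration only)."""

    def __init__(self, depth, write_through):
        self.mem = [0] * depth
        self.write_through = write_through
        self.q = 0  # registered Q output

    def clock(self, addr, we=False, d=0):
        """One clock edge: sample addr/we/d, then update memory and Q."""
        if we:
            self.mem[addr] = d
            if self.write_through:
                self.q = d  # Q shows the data just written
            # without write-through, Q keeps its previous value
        else:
            self.q = self.mem[addr]
        return self.q


wt = SyncSRAM(depth=4, write_through=True)
no_wt = SyncSRAM(depth=4, write_through=False)
q_wt = wt.clock(addr=1, we=True, d=0xAB)        # Q follows the write
q_no_wt = no_wt.clock(addr=1, we=True, d=0xAB)  # Q unchanged by the write
q_later = no_wt.clock(addr=1)                   # a later read returns the data
```
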
The current synchronous SRAM being developed will most likely have write-through behavior; this will be confirmed before the May test chip tape-out. It will cause a delay on the signal though. I need to check whether it has changed, but in the OpenRAM 0.35um test tape-out I did, the address and data inputs were latched on the rising edge and the Q output was updated on the falling edge of the clock. So the delay on the Q output is half a clock cycle plus the internal delay on the output latch enable signal.
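
As a rough worked example of that delay, the arithmetic can be sketched as below. The function name `q_delay_ns` and the numbers (a 10 ns clock period and a 1 ns latch-enable delay) are assumptions picked purely for illustration; they are not measured values from the test chip.

```python
def q_delay_ns(clock_period_ns, latch_enable_delay_ns):
    """Delay from the rising edge (inputs latched) until Q is updated.

    Q is updated on the falling edge, i.e. half a period later, plus
    the internal delay on the output latch enable signal.
    """
    return clock_period_ns / 2 + latch_enable_delay_ns


# Hypothetical numbers, for illustration only.
example_delay = q_delay_ns(clock_period_ns=10.0, latch_enable_delay_ns=1.0)
```
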
So if the timing of the write-through is critical, it is still best to include MUXs, as said in Jacob's reply, to allow the signal to be bypassed. I have seen SRAM that did include an AWT (asynchronous write-through), but this just moves the MUXs inside the SRAM block and also adds them when you don't need this AWT. So I would like to keep these MUXs external, added only if needed.
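
A minimal sketch of such an external bypass MUX, written as a plain Python function. The name `bypass_mux` and its arguments are hypothetical; the sketch assumes a register file with separate read and write ports, where the MUX forwards the write data when both ports address the same entry in the same cycle.

```python
def bypass_mux(sram_q, rd_addr, wr_en, wr_addr, wr_data):
    """External forwarding MUX placed after the SRAM's Q output.

    If the entry being read is also being written this cycle, forward
    the write data combinatorially; otherwise pass the SRAM output on.
    """
    hit = wr_en and rd_addr == wr_addr
    return wr_data if hit else sram_q


forwarded = bypass_mux(sram_q=0x11, rd_addr=3, wr_en=True, wr_addr=3, wr_data=0x99)
from_sram = bypass_mux(sram_q=0x11, rd_addr=3, wr_en=True, wr_addr=2, wr_data=0x99)
```

Keeping the MUX outside the SRAM block means it is only instantiated on the read ports that actually need forwarding.
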
In theory on a single port SRAM
> a workaround (fallback position) is, we use DFF latches.  i created a "bypass latch" function which creates DFF latches with such a combinatorial bypass: we actually use them quite a lot (including between pipeline stages so that we can programmatically cut the number of pipeline stages in half at the flick of a switch).
> however for the Register File we would not "switch" between synchronous / asynchronous mode.  the reason why we need the synchronous mode is because some Function Units will be sitting idle, waiting for their input operands, which have to come from other Function Units as "results".
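
The "bypass latch" idea in the quoted text above can be sketched behaviourally as follows. This is an assumption about how such a register might behave, not the actual function from the project; `BypassRegister`, `output` and `clock` are hypothetical names.

```python
class BypassRegister:
    """Toy model of a D flip-flop register with a combinatorial bypass."""

    def __init__(self, reset=0):
        self.stored = reset

    def output(self, d, enable):
        # Combinatorial path: while enable is high, the input is visible
        # on the output in the same cycle, without waiting for the clock.
        return d if enable else self.stored

    def clock(self, d, enable):
        # Clock edge: capture the input only while enable is high.
        if enable:
            self.stored = d


reg = BypassRegister()
same_cycle = reg.output(d=7, enable=True)  # bypassed: visible immediately
reg.clock(d=7, enable=True)                # captured on the clock edge
held = reg.output(d=9, enable=False)       # new input ignored, old value held
```
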

I can understand you do this to implement functional units with configurable pipeline length, but I would strongly discourage pipelining register files one after another. If the latter is excluded, would you still need an asynchronous RAM block?

greets,
Staf.