[libre-riscv-dev] [Bug 296] idea: cyclic buffer between FUs and register file

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Fri May 1 19:19:03 BST 2020


https://bugs.libre-soc.org/show_bug.cgi?id=296

--- Comment #4 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
i believe it may be possible to use existing components for this: the
mask-cancellable Pipeline class (where the "mask" is the
GLOBAL_READ_PENDING_VEC or GLOBAL_WRITE_PENDING_VEC) plus the multi-input
pipe:

https://git.libre-soc.org/?p=nmutil.git;a=blob;f=src/nmutil/multipipe.py;h=95279250d005ecda96545c9ac3d5aec74ef4d082;hb=22bb7e4a3ec7d5f6139ed0bd427609af33ddafb3#l255

this allows input from multiple sources, and one of those input sources would
be the *other end* of the pipeline, thus creating a cyclic ring buffer that
has two key additional abilities:

1) the ability to "pause" the data being passed down the pipeline (without
   losing any data).  this is because each stage is protected by ready/valid
   signals.

2) the ability to "accept" extra data, from (multiple) other sources.

examples of (2) include an operand-forwarding capability.  this can be
implemented very simply: the output from the *write* cyclic buffer is also
connected to one of the multi-input ports on the *read* cyclic buffer!


  +------------------------------<-----------<-------+
  v                                                  ^
  |                                                  |
  +-- RD1 --> RD2 --> RD3 ---+    +-- WR1 --> WR2 ---+
  |                          |    |                  |
  ^                          v    ^                  v
  +------<-----------<-------+    +------<------<----+
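to make the two abilities and the forwarding path concrete, here is a minimal behavioural sketch in plain python.  it is not nmigen and does not use the nmutil classes: the CyclicBuffer class, its step() method and the (register, value) tuples are all hypothetical names for illustration, and the single global "paused" flag is a simplification of per-stage ready/valid.

```python
class CyclicBuffer:
    """behavioural model of a ring of cells (plain python, not nmigen)."""

    def __init__(self, ncells):
        self.cells = [None] * ncells  # None marks an empty cell

    def step(self, paused=False, inject=None):
        """advance one clock cycle.

        paused: hold every cell in place, as if "ready" were deasserted
        inject: item offered to cell 0 by an external source, or None
        returns True only if an offered item was accepted.
        """
        if not paused:
            # rotate by one: the last cell wraps round to the first
            self.cells = [self.cells[-1]] + self.cells[:-1]
        if inject is not None and self.cells[0] is None:
            self.cells[0] = inject
            return True
        return False


# hypothetical usage: forward a written-back value into the read ring
wr = CyclicBuffer(2)   # WR1, WR2
rd = CyclicBuffer(3)   # RD1, RD2, RD3
wr.step(inject=("r5", 42))               # result enters WR1
wr.step()                                # ...and moves on to WR2
accepted = rd.step(inject=wr.cells[-1])  # WR2 output feeds an RD input
rd.step(paused=True)                     # a stall: nothing moves or is lost
```

an injected item is only accepted when the first cell is empty after the rotate, which is the behaviour the ready/valid protection is meant to give.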

if we have *multiple* of these cyclic buffers, split to deal with 32-bit
data in 4 banks (HI-32 ODD, LO-32 ODD, HI-32 EVEN, LO-32 EVEN) we can have
the appearance of up to 16R4W for 32-bit vector operations.
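as a rough sketch of how an access might be steered to one of the 4 banks: the code below assumes 64-bit registers, and assumes that ODD/EVEN refers to the lowest bit of the register number.  neither is stated above, and bank_for, split64 and the bank ordering are hypothetical names and choices made only for this example.

```python
# bank order is an arbitrary choice for this sketch
BANKS = ("LO-32 EVEN", "HI-32 EVEN", "LO-32 ODD", "HI-32 ODD")


def bank_for(regnum, hi_half):
    """pick a bank from the register number and the 32-bit half wanted.

    assumption: ODD/EVEN is decided by bit 0 of the register number.
    """
    odd = regnum & 1
    return BANKS[(odd << 1) | (1 if hi_half else 0)]


def split64(value):
    """split a 64-bit value into its (lo32, hi32) halves."""
    return value & 0xFFFFFFFF, (value >> 32) & 0xFFFFFFFF


lo, hi = split64(0x1122334455667788)
```

with this mapping, the two halves of one register always land in different banks, as do the same halves of two adjacent registers.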

not only that, but "crossing over" between these 4 "lanes" is a simple
matter of putting additional inputs onto the front of the buffer.

i'm also going to recommend that we have at least two of these cyclic
buffers per regfile read/write bank.  this is because, note:

* RD1 is *only* connected to the broadcast bus linked to all Function Units'
  Reg-Read1.
* RD2 is *only* connected likewise to all FU's Reg-Read2
* etc.

and if a register was read by RD1 but is needed by RD2, we *have* to pass
it over to RD2 on the next clock cycle... but in doing so we *cannot do a
read on the regfile Port 2* on that next clock cycle, because the Cell for
RD2 is *already going to have data in it* (received from the cyclic shift
of data previously in RD1).

therefore, to solve this, i think we should have two rows.  this gives an
opportunity to always hit the regfile with multiple RDs in every clock cycle,
whilst still also being able to pass data over the broadcast bus(es) to any
listening FunctionUnits.
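the port conflict, and how a second row relieves it, can be sketched with a small plain-python model.  the shift and free_ports helpers and the list-of-cells representation are hypothetical and only for illustration: a regfile port is treated as usable in a given cycle if at least one row has an empty Cell in that column.

```python
def shift(row):
    """cyclic shift by one: RD1 -> RD2 -> RD3 -> back to RD1."""
    return [row[-1]] + row[:-1]


def free_ports(rows):
    """column indices (0 = RD1) where some row has an empty Cell."""
    ncols = len(rows[0])
    return {c for c in range(ncols) if any(row[c] is None for row in rows)}


# cycle 0: "r3" was read into RD1; cycle 1: it shifts across to RD2
row_a = shift(["r3", None, None])   # -> [None, "r3", None]
row_b = [None, None, None]          # a second, currently empty row

single = free_ports([row_a])        # RD2 is occupied: port 2 unusable
double = free_ports([row_a, row_b]) # second row keeps all ports usable
```

with one row the RD2 column is blocked by the shifted data, while with two rows every regfile port can still issue a read in that cycle.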

this does however mean that the columns need to collaborate.  there will be
*two* RD1 Cells vying for access to the Reg-Read1 Broadcast Bus.  there will
be *two* RD2 Cells vying for access to the Reg-Read2 Broadcast Bus.
fortunately, this just needs a 2-in 1-out MUX, which reduces to OR-gates
provided the Cell that is not driving the bus outputs zero.
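a minimal sketch of that sharing, in plain python on integers rather than in nmigen: each Cell's data is gated by its grant and the results are ORed onto the bus.  the broadcast function, the grant_a flag and the 32-bit width are assumptions made for this example, and how the grant is decided is not covered here.

```python
WIDTH_MASK = 0xFFFFFFFF  # assumed 32-bit broadcast bus


def broadcast(cell_a, cell_b, grant_a):
    """combine two Cells onto one bus: AND with grant, then OR.

    exactly one Cell is granted, so the other contributes all zeros
    and the OR behaves as a 2-in 1-out MUX.
    """
    mask_a = WIDTH_MASK if grant_a else 0
    mask_b = 0 if grant_a else WIDTH_MASK
    return (cell_a & mask_a) | (cell_b & mask_b)


bus_when_a = broadcast(0x1234, 0xABCD, grant_a=True)
bus_when_b = broadcast(0x1234, 0xABCD, grant_a=False)
```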
