[libre-riscv-dev] [Bug 296] idea: cyclic buffer between FUs and register file

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Fri May 1 21:23:37 BST 2020


--- Comment #8 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #7)

> That's true for correctness, but not for performance. Delaying 20 cycles on
> every forwarding operation would seriously reduce the performance, probably
> by at least 10x, so, obviously, that should be avoided.

sorry, i meant, "delayed not deliberately but because the total bandwidth
(paths) between FU-result latches and FU-input latches is limited".

whether that path is via the regfile or via a forwarding bus doesn't matter:
if the forwarding bus isn't available for 5-10 cycles, there's always the
write-port on the regfile.

Mitch's analysis was, i believe, that we should expect a ratio of... what
was it... 1.3 reg reads to writes?  i.e. each result produced by one FU is
"consumed" by an average of 1.3 other FUs.

he noted that, therefore, even if the forwarding bus is used just once,
forwarding to just one other FU before the value is dropped on the floor,
in a cascade it actually saves a lot of regfile activity.

unfortunately, reducing this regfile pressure assumes that you can detect
write-after-write and also "discards".  that *is* possible, however the
discussion was over a year ago now, and we simply haven't got time in
this round to make the necessary DM (Dependency Matrix) modifications.
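
as a back-of-envelope illustration of the counting above (a sketch only: the
function name, its parameters and the "one regfile write per result" baseline
are assumptions made for this example, they are not taken from the DM design),
in python:

```python
def regfile_accesses_per_result(reads_per_result=1.3,
                                forwarded_reads=0.0,
                                write_elided=False):
    """Average regfile port accesses caused by one FU result.

    reads_per_result: average number of consumers per result (the ~1.3
                      figure recalled above; treat it as approximate).
    forwarded_reads:  average number of those consumers served by the
                      forwarding bus instead of a regfile read port
                      (assumption: one forwarded read replaces one
                      regfile read).
    write_elided:     True only if the write can be skipped, which needs
                      write-after-write / "discard" detection.
    """
    # Reads that still have to go through the regfile (never negative).
    reads = max(reads_per_result - forwarded_reads, 0.0)
    # Without discard detection the result is always written back.
    writes = 0.0 if write_elided else 1.0
    return reads + writes


baseline = regfile_accesses_per_result()
fwd_once = regfile_accesses_per_result(forwarded_reads=1.0)
fwd_and_discard = regfile_accesses_per_result(forwarded_reads=1.0,
                                              write_elided=True)
print(baseline, fwd_once, fwd_and_discard)
```

under these assumptions the count goes from about 2.3 accesses per result, to
about 1.3 with a single forwarded read, and only drops to about 0.3 when the
write can also be elided, which is the part that needs the discard detection.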

so what we will gain with a forwarding bus is: decreased latency.  this is
because, with the register file being a bottleneck, even one extra bus (fwd)
means that some of the operations that would otherwise have had to wait for the
write (and followup read) through the regfile may instead start that much earlier.

yes, ultimately: if access to that forwarding bus (which _will_ itself be
resource-contended) can be done early (same clock cycle, even), obviously that
latency is decreased much further.
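
a toy model of that latency argument, in python.  everything here is a
simplifying assumption for illustration (the function name, the 2-cycle
write-then-read cost through the regfile, a single contended fwd bus): it is
not a description of the actual hardware.

```python
def earliest_operand_cycle(result_ready, regfile_write_free,
                           fwd_free=None, fwd_latency=1,
                           regfile_latency=2):
    """Cycle at which a dependent FU can latch the operand.

    result_ready:       cycle the producing FU has its result latched.
    regfile_write_free: first cycle a regfile write port is available.
    fwd_free:           first cycle the forwarding bus is available,
                        or None if there is no forwarding bus at all.
    fwd_latency:        0 for same-cycle forwarding, 1 for next-cycle.
    regfile_latency:    assumed cost of the write plus followup read.
    """
    # Path 1: write the result to the regfile, then read it back out.
    via_regfile = max(result_ready, regfile_write_free) + regfile_latency
    if fwd_free is None:
        return via_regfile
    # Path 2: wait for the (contended) forwarding bus.
    via_fwd = max(result_ready, fwd_free) + fwd_latency
    # The consumer gets the operand by whichever path delivers it first.
    return min(via_regfile, via_fwd)


# Regfile only: write at cycle 10, operand usable at cycle 12.
print(earliest_operand_cycle(10, 10))
# Same-cycle forwarding available immediately.
print(earliest_operand_cycle(10, 10, fwd_free=10, fwd_latency=0))
# Forwarding bus busy for 8 cycles: the regfile write port wins instead.
print(earliest_operand_cycle(10, 10, fwd_free=18))
```

the last call is the "there's always the write-port on the regfile" case: a
busy fwd bus costs nothing extra compared to having no fwd bus at all, whilst
a free one can only bring the start cycle forward.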

summary: for a 6600-based DM system it's strongly desirable for the fwd
bus to operate in the same clock cycle (or within 1 cycle): it's not, however,
a make-or-break mandatory requirement.  for an in-order system, on the other
hand, it's not only far more difficult to *add* a forwarding bus, it is, if i
recall correctly, timing-critical.  i.e. the consequence of missing the slot is
that the entire pipeline (right the way back to issue) is forced to stall.

