[libre-riscv-dev] [Bug 101] IEEE754 pipeline "go_die" (Computation Unit Cancellation) needed

Fri Jun 28 07:54:26 BST 2019

http://bugs.libre-riscv.org/show_bug.cgi?id=101

--- Comment #2 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #1)
> all that's needed is a flip-flop chain that marks if the matching pipeline
> stage has valid data or not. No need to modify the data in the pipeline,
> since that takes up more gates and that pipeline slot can't be used again
> anyway.

the data's not important: the muxid is.

what you suggest has the unfortunate unintended side-effect of making
the muxid impossible to use (immediately, either in the same or a subsequent
cycle).

to work around that, it becomes necessary to add extra fields to muxid
to say "hey i am a valid muxid where the previous one wasn't", errrr...

or, you simply extend the number of Reservation Stations (and FUs)
to *DOUBLE* the length of the pipeline, such that the "cancelled"
items don't block things up.

that in turn creates far more gates than adding "cancellation of muxid"
ever would.

for FPDIV we're already going to need a *massive* number of Function Units,
otherwise we get a processing backlog at the FU / Reservation Station stage.

in a Concurrent Computation Unit design (pipelines with fan-in and
fan-out, basically) if the number of FUs / Reservation Stations is
less than the pipeline length, there are not enough input/result
"storage latches" to match "data in the pipeline".

* 3-wide fan-in, fan-out means 3 FUs
* 4-long pipeline can process 4 sets of operands and produce only
  one result per cycle
* 5 pieces of data need processing
* the first 3 sets of operands go into the 3 FU "Reservation Stations",
  no problem.
* the 4th *STALLS* the *ENTIRE* engine - and i do mean the *ENTIRE*
  instruction issue stage, freezing *ALL* further processing - for
  one clock cycle, waiting for one of the 3 FU "RSs" to become free.
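the example above can be sketched as a tiny cycle-count model. this is
only an illustration under assumed timing (an operand set issued in cycle
t holds its Reservation Station until cycle t + pipeline length, and at
most one set is issued per cycle). simulate_issue and its parameter names
are hypothetical and are not taken from the soc or ieee754fpu code.

```python
def simulate_issue(num_rs, pipe_len, num_ops):
    """Return (issue_cycles, stall_cycles) for a toy issue model.

    Assumption (hypothetical timing): an operand set issued in cycle t
    occupies its Reservation Station until cycle t + pipe_len.
    """
    free_at = [0] * num_rs      # cycle at which each RS becomes free again
    issue_cycles = []
    stalls = 0
    cycle = 0
    for _ in range(num_ops):
        while True:
            # look for any RS that is free in this cycle
            rs = next((i for i, f in enumerate(free_at) if f <= cycle), None)
            if rs is not None:
                break
            stalls += 1         # no RS free: the whole issue stage stalls
            cycle += 1
        free_at[rs] = cycle + pipe_len
        issue_cycles.append(cycle)
        cycle += 1              # at most one issue per cycle
    return issue_cycles, stalls


if __name__ == "__main__":
    # 3 RSs, 4-long pipeline, 5 sets of operands
    print(simulate_issue(3, 4, 5))
    # one RS per pipeline stage
    print(simulate_issue(4, 4, 5))
```

with 3 RSs the 4th set of operands is held back for one cycle; with as
many RSs as pipeline stages the model shows no stall.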

so if a muxid is not cancelled, you have to *wait* for that muxid to
pop out the result end, at which point you go, "oh!  this result
was actually cancelled a few clocks ago, but thank you for letting
us know: *now* we can drop the busy_o signal on this Computation Unit":

https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/experiment/compalu.py;h=7da6b5cf9fa06b9bfa766f769ac19f4f49caf90b;hb=734d6ca4e4a4f6ea3d6038a54e50de5d76d9618b#l64

and *only* when busy_o is dropped can the engine continue.
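to make the cost concrete, here is a rough behavioural sketch (plain
python, not nmigen, and not the compalu.py code linked above). the class
CompUnitModel, its methods and the immediate_cancel flag are all
hypothetical names for illustration; the only thing taken from the
discussion is the idea that busy stays raised until either the result
pops out or the muxid is cancelled.

```python
class CompUnitModel:
    """Toy model of one Computation Unit's busy signal (hypothetical)."""

    def __init__(self, pipe_len, immediate_cancel):
        self.pipe_len = pipe_len
        self.immediate_cancel = immediate_cancel
        self.busy = False
        self.remaining = 0      # cycles until the result pops out

    def issue(self):
        assert not self.busy
        self.busy = True
        self.remaining = self.pipe_len

    def go_die(self):
        # with muxid cancellation, busy can be dropped straight away;
        # without it, we must wait for the (dead) result to emerge.
        if self.busy and self.immediate_cancel:
            self.busy = False
            self.remaining = 0

    def tick(self):
        if self.remaining > 0:
            self.remaining -= 1
            if self.remaining == 0:
                self.busy = False


def busy_cycles_after_cancel(pipe_len, cancel_after, immediate_cancel):
    """Count cycles the unit stays busy after go_die is raised."""
    cu = CompUnitModel(pipe_len, immediate_cancel)
    cu.issue()
    for _ in range(cancel_after):
        cu.tick()
    cu.go_die()
    wasted = 0
    while cu.busy:
        cu.tick()
        wasted += 1
    return wasted
```

for a 4-long pipeline cancelled one cycle after issue, this model keeps
the unit busy for 3 further cycles without muxid cancellation and 0 with
it.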

it's just one of those things about 6600-style scoreboards: the Tomasulo
algorithm works around this by making the Reservation Stations multi-entry,
however the penalty for doing so is that the entire scoreboard (now a ROB
in Tomasulo) has to use a CAM, in order to differentiate these multiple
entries.

bottom line is: being able to do global (immediate) cancellation of
the muxid is really quite important.  the *data* is actually not
as important, because without the muxid the data isn't routed at
the Fan-Out stage in the ReservationStation class.

https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/fpdiv/pipeline.py;h=7c130fae0539a48c235b331f70b8e3f451da0eb6;hb=e47839fad581624f248e13b05b2ca9d9975d2f95#l36
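a minimal sketch of why the muxid matters more than the data at the
fan-out, assuming (hypothetically) that each pipeline output is a
(muxid, data) pair and that cancelled muxids are tracked in a set. the
fan_out function below is illustrative only and is not the
ReservationStations implementation linked above.

```python
def fan_out(pipeline_outputs, num_rs, cancelled):
    """Route each (muxid, data) result to its RS output latch.

    Results whose muxid has been cancelled are never routed, so
    whatever data they carry is irrelevant.
    """
    latches = [None] * num_rs
    for muxid, data in pipeline_outputs:
        if muxid in cancelled:
            continue            # no valid muxid: data goes nowhere
        latches[muxid] = data
    return latches


if __name__ == "__main__":
    outputs = [(0, "a"), (1, "b"), (2, "c")]
    print(fan_out(outputs, 3, cancelled={1}))
```

the data for muxid 1 is left untouched in the pipeline here; it simply
has nowhere to go once its muxid is cancelled.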
