[libre-riscv-dev] [hw-dev] Re: 6600-style out-of-order scoreboard designs (ariane)

Luke Kenneth Casson Leighton lkcl at lkcl.net
Thu May 16 08:25:17 BST 2019


hi mitch,

i believe i finally have a (genuine) case (as opposed to "a coding
bug") for why the FU-FU Matrix is needed. however, this could just be
a misunderstanding, given that i remember you mentioning earlier a need
for staggered (clock-delayed) propagation of the global vectors
(i can't remember precisely which ones).

consider the following two instructions, each issued to its own
Function Unit, and assume a completion time of 4 (or more)
cycles:

ADD r7, r3, r4
SUB r4, r7, r5

note that r7 is used as a src in both, and that r4 is used first as
the dest (of the ADD) and then as a src (of the SUB).
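for reference, the textbook hazard classification between an older and
a younger instruction can be sketched as below. this is a minimal python
sketch, not the scoreboard logic itself: `Instr` and `hazards` are
hypothetical names, and the register roles are an assumption read off
the pending vectors described in this post (the ADD reads r7 and r3 and
writes r4; the SUB reads r4 and r7 and writes r5).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    # hypothetical record: operand roles are given explicitly rather than
    # parsed from assembly syntax, to avoid any operand-order ambiguity
    name: str
    srcs: frozenset
    dest: str


def hazards(older: Instr, younger: Instr) -> set:
    """classify the hazards the younger instruction has on the older one."""
    found = set()
    if older.dest in younger.srcs:
        found.add("RaW")  # younger reads what older writes
    if younger.dest in older.srcs:
        found.add("WaR")  # younger writes what older still has to read
    if younger.dest == older.dest:
        found.add("WaW")  # both write the same register
    return found


# assumed register roles: ADD writes r4, SUB writes r5
add = Instr("ADD", frozenset({"r7", "r3"}), "r4")
sub = Instr("SUB", frozenset({"r4", "r7"}), "r5")

print(sorted(hazards(add, sub)))  # ['RaW']
```

with those roles, r7 being read by both instructions is a read-after-read,
which is not a hazard at all, so program order alone gives the SUB a RaW
on r4 and nothing on r7.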

* the ADD is issued to FU 1.  all global read/write vectors are zero at
the time of issue.

* however, in the next cycle the global write-pending bit for r4 is set,
and the global read-pending bits for r7 (and r3) are set.

* *AT THE SAME TIME*, the SUB is issued to FU 2.  it correctly detects
a RaW hazard on r4; however, it also *INCORRECTLY* detects a WaR hazard
on r7.

this results in a total lockup, because the SUB now believes it holds a
WaR lock on the ADD, while the ADD (correctly) holds a RaW lock on
the SUB.
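the lockup is easiest to see as a wait-for graph between Function Units:
each lock is an edge from the FU that is waiting to the FU it waits on,
and a cycle means neither can ever proceed. a minimal python sketch,
assuming a plain dict-of-sets graph (`has_cycle` is a hypothetical
helper, not part of any scoreboard design):

```python
def has_cycle(waits_on: dict) -> bool:
    """depth-first search for a cycle in a wait-for graph (fu -> set of fus)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    colour = {fu: WHITE for fu in waits_on}

    def visit(fu) -> bool:
        colour[fu] = GREY
        for other in waits_on.get(fu, ()):
            state = colour.get(other, WHITE)
            if state == GREY:
                return True  # back-edge: fu transitively waits on itself
            if state == WHITE and visit(other):
                return True
        colour[fu] = BLACK
        return False

    return any(colour[fu] == WHITE and visit(fu) for fu in list(waits_on))


# FU2 (SUB) waits on FU1 (ADD): the correct RaW on r4.
# a total lockup implies the erroneous lock adds an edge the other way.
print(has_cycle({"FU2": {"FU1"}, "FU1": {"FU2"}}))  # True
# with only the correct RaW dependency there is no cycle
print(has_cycle({"FU2": {"FU1"}, "FU1": set()}))    # False
```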

if, however, the issue of instruction 2 is delayed by even a single
clock cycle, instruction 1 has enough time to read r7 (and r3) and
then drop its read-pending bits, so that the SUB is issued *without*
the erroneous WaR hazard.
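the problem described above hinges on *when* the global vectors are
sampled. for comparison, here is a minimal python sketch of the
directional idea behind an FU-FU dependency matrix: dependencies are
recorded once, at issue, only in the newly-issued FU's row and only
against FUs that are already busy, so every edge points from a younger
instruction to an older one. this is a simplified, hypothetical model
(`FuFuMatrix`, `issue`, `can_go` and `retire` are made-up names), not
the 6600 circuit: it leaves out the global pending vectors entirely and
collapses read/write gating into a single "can go" flag. it uses the
same assumed register roles (the ADD reads r7, r3 and writes r4; the SUB
reads r4, r7 and writes r5).

```python
class FuFuMatrix:
    """hypothetical model: deps[i][j] is True when FU i must wait for FU j."""

    def __init__(self, n_fus: int):
        self.n = n_fus
        self.deps = [[False] * n_fus for _ in range(n_fus)]
        self.busy = {}  # FU index -> (srcs, dest) of the instruction it holds

    def issue(self, fu: int, srcs: set, dest: str) -> None:
        # only the new FU's row is written, and only against FUs that are
        # already busy (older in program order), so every edge points from
        # a younger instruction to an older one and no cycle can form
        row = [False] * self.n
        for other, (o_srcs, o_dest) in self.busy.items():
            raw = o_dest in srcs   # we read what the older FU will write
            war = dest in o_srcs   # we write what the older FU may still read
            waw = dest == o_dest   # both write the same register
            row[other] = raw or war or waw
        self.deps[fu] = row
        self.busy[fu] = (set(srcs), dest)

    def can_go(self, fu: int) -> bool:
        # simplification: one "go" flag instead of separate read/write gating
        return not any(self.deps[fu])

    def retire(self, fu: int) -> None:
        # clearing this FU's column releases everything that waited on it
        del self.busy[fu]
        self.deps[fu] = [False] * self.n
        for row in self.deps:
            row[fu] = False


m = FuFuMatrix(2)                 # FU 1 -> index 0, FU 2 -> index 1
m.issue(0, {"r7", "r3"}, "r4")    # ADD
m.issue(1, {"r4", "r7"}, "r5")    # SUB, issued after the ADD
print(m.can_go(0), m.can_go(1))   # True False
m.retire(0)
print(m.can_go(1))                # True
```

in this model the SUB's only dependency is the RaW on the ADD, and
nothing ever makes the ADD wait on the SUB, whichever cycle the SUB
happens to be issued in.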

what the heck am i missing here?

l.


