[libre-riscv-dev] [hw-dev] Re: 6600-style out-of-order scoreboard designs (ariane)
Mitchalsup
mitchalsup at aol.com
Thu May 23 00:33:12 BST 2019
Mitch Alsup <MitchAlsup at aol.com>
-----Original Message-----
From: Luke Kenneth Casson Leighton <lkcl at lkcl.net>
To: Libre-RISCV General Development <libre-riscv-dev at lists.libre-riscv.org>
Cc: MitchAlsup <mitchalsup at aol.com>
Sent: Wed, May 22, 2019 4:55 pm
Subject: Re: [hw-dev] Re: 6600-style out-of-order scoreboard designs (ariane)
On Thursday, May 23, 2019, Samuel Falvo II <sam.falvo at gmail.com> wrote:
I'm terribly sorry for digging far back into the history;
Not at all. It's quite recent.
I've been trying
to bring myself up to speed on scoreboards and such, and I'm lagging far
behind, in part because I never understood them to begin with and am new to
the technique, but also because I've just returned from a week of vacation.
(Which I already need another one, but I digress.)
A trick I heard: pack an overnight bag for each person and hand-carry it. If the main luggage is lost, it's not the end of the world. Plus, LEAVE the big bag to unpack the day AFTER arriving, once rested.
On Thu, May 16, 2019 at 10:58 PM Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:
> the FU-FU dependency matrix effectively has a *per FU* variant of the
> [global] read-pend and write-pend vectors which, due to them having
> their own latches, take a snapshot of the read/write-pending state *at
> that specific time*, for *that specific FU*.
>
This is an interesting interpretation which I'll need to meditate on to
fully understand; however, this is the first time I've come to have any
insight on why the FU-FU matrix is necessary.
The latches are triggered by ISSUE, just like the FU-Regs ones. That's how they snapshot the state.
This has been a huge
stumbling block for me, as up until the example with ADD r7, r7, ... was
given,
(Mitch, it was the same ADD R4 R4 R7, ADD R7 R7 R4 example we found the dependency loop on, for the Function Unit only design.)
I'd not understood why it was needed in the first place. Thank you
for framing it in these terms.
No problem
Let me see if I understand correctly:
* the value proposition of the FU-FU matrix is *not* for the benefit of the
issue logic, but for the individual FUs themselves.
Yes. The FU-FU Matrix doesn't care about what the regs are; it cares only about RESULTs.
Samuel: consider the case where more than one FU is trying to write to the same register. The reg-reg DM can only record that there is at least one dependency on that register. The FU-FU DM records that FU[k] and FU[j] are both trying to write to R7 (for example).
Now, when the consuming instruction is issued, it takes a snapshot of the current DM and records that there are writes in front of its reads. Then it sits back and waits for the writes to quit pending on its reads. In the meantime the FU-FU DM is going to enable FU[k] to write BEFORE it enables FU[j] to write (so writes to the same register occur in program order), AND furthermore, reads of the result of FU[k] will happen before the write of FU[j].
So, the FU-FU DM provides the means to keep track of "transitive" dependencies, by renaming the registers to the unique FU which will produce the desired operand (as a result). Once so renamed, they are easy to track.
As in, it expresses that this FU is creating a RESULT that another FU needs (or the other way round).
The Q Table says where those results go or come from.
Each has a very different purpose and you can't do without either of them.
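
As a rough illustration of the write ordering described above, here is a minimal Python sketch. It is a toy model, not the actual design: the class `FuFuMatrix`, the `producers` bitmask and the indices `K` and `J` are all hypothetical names, and the real matrix also tracks read dependencies, which are left out here. Each FU latches, at issue time, the set of already-busy FUs whose writes must happen first.

```python
class FuFuMatrix:
    """Toy FU-FU dependency matrix: one write-wait latch row per FU."""

    def __init__(self, n_fus):
        self.n_fus = n_fus
        self.busy = 0                  # bitmask of FUs with a result outstanding
        self.wr_wait = [0] * n_fus     # wr_wait[i]: FUs that must write before FU i

    def issue(self, fu, producers):
        # 'producers' is an assumed input: the FUs this instruction orders
        # itself behind. Only FUs busy *at issue time* are latched.
        self.wr_wait[fu] = producers & self.busy & ~(1 << fu)
        self.busy |= 1 << fu

    def writable(self, fu):
        is_busy = (self.busy >> fu) & 1
        return bool(is_busy) and self.wr_wait[fu] == 0

    def go_write(self, fu):
        if not self.writable(fu):
            raise RuntimeError("FU %d is not yet allowed to write" % fu)
        self.busy &= ~(1 << fu)
        # Clearing this FU's column releases everyone who was waiting on it.
        for row in range(self.n_fus):
            self.wr_wait[row] &= ~(1 << fu)


K, J = 0, 1                      # two FUs that both target R7
m = FuFuMatrix(n_fus=2)
m.issue(K, producers=0)          # FU[k] issued first, nothing ahead of it
m.issue(J, producers=1 << K)     # FU[j] issued second, ordered behind FU[k]
```

In this sketch FU[j] only becomes writable once `go_write(K)` has cleared FU[k]'s column, so the two writes to R7 land in program order.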
If the global vectors
all show R7 to be clear, then ADD R7,R7,R4 will happily issue; once issued,
however, now there's a self-inflicted deadlock on R7, preventing the state
machine from issuing GO_READ since it's waiting for something (anything!)
to write to R7. But since there's a write reservation on R7, the issue
logic will never permit a writer to R7 to be issued.
Yup.
With the FU-FU
matrices, the FU's GO_READ logic only concerns itself with its own cached
write reservation vector, which in this case will *not* have R7 set (since
it's a copy of what existed *before* issue).
I believe so. The clue here was *at issue time*.
Btw, the GORD (GO_READ) signal comes into the FU-FU matrix from the LEFT (connected horizontally) and ISSUE comes in from the TOP (connected down vertically).
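
The ADD R7,R7,R4 case can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: registers are modelled as bits in an integer, and the helper names `unary`, `issue` and `can_read` are made up for the example. It contrasts checking the live global write-pending vector with checking the copy latched at issue time.

```python
def unary(reg):
    """One-hot (unary) encoding of a register number."""
    return 1 << reg


def issue(global_wr_pend, dest, srcs):
    """Issue one instruction; returns (new global vector, snapshot, source mask)."""
    snapshot = global_wr_pend          # latched BEFORE our own reservation is added
    src_mask = 0
    for s in srcs:
        src_mask |= unary(s)
    return global_wr_pend | unary(dest), snapshot, src_mask


def can_read(wr_pend, src_mask):
    """Reads may go ahead when no pending write overlaps the source registers."""
    return (wr_pend & src_mask) == 0


# ADD R7, R7, R4 issued with nothing else outstanding.
new_global, snapshot, srcs = issue(0, dest=7, srcs=[7, 4])
blocked_on_global = not can_read(new_global, srcs)   # waits on its own write to R7
free_on_snapshot = can_read(snapshot, srcs)          # snapshot predates that write
```

Against the live global vector the instruction waits on its own reservation; against the issue-time snapshot it does not.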
* the global reservation vectors are *column-wise summations* of the
individual read/write FU-FU matrices,
Confusion
The global reservation vectors, called read pending and write pending, are per REGISTER, and are the column wise summations of the FU REGs matrix.
Not FUFU.
These global reg-bit pending vectors are, yes, only useful in the FU REGs Matrix.
However as we found earlier, with the loop example, the global pending reg bitvector can fire in BOTH rd deps AND wr deps, it has no concept of "time" basically.
The *FUFU* matrix is what stores the concept of time.
Or..
You may be referring to READABLE and WRITABLE which are, yes, column wise sums of the FU FU Matrix rd and write latches respectively.
These sums basically say, for each FU, that it is safe to READ or safe to WRITE the regfile.
However given that there could be several of those, you have to use a priority picker to pick only one to read and one to write.
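
A small Python sketch of the two ideas above, with hypothetical function names and integers used as bit-vectors: the global pending vector as a column-wise OR over the FU-Regs rows, and a priority picker that reduces several ready FUs to exactly one. The lowest-index-wins policy is an assumption; the text does not say which priority order is used.

```python
def column_or(rows):
    """Column-wise 'sum' of a matrix stored as one bitmask per FU row."""
    result = 0
    for row in rows:
        result |= row
    return result


def priority_pick(candidates):
    """Keep only the lowest set bit (assumed policy: lowest index wins)."""
    return candidates & -candidates


# FU-Regs write rows: FU0 will write R7, FU1 will write R4, FU2 is idle.
fu_regs_wr = [1 << 7, 1 << 4, 0]
global_write_pending = column_or(fu_regs_wr)

# Suppose FU1 and FU2 are both flagged WRITABLE this cycle: only one may go.
writable = 0b110
go_write = priority_pick(writable)
```

Here `global_write_pending` has bits 7 and 4 set, and `go_write` selects FU1 alone.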
giving the issue logic the summary of
what's happening as of instruction issue time. Because of this, they are
good enough to determine when to and not to issue to another FU. As it
happens, this is about the only thing they're really good for. All other
timing generation really depends on the FU-FU matrix.
* the individual FUs are only concerned with *their own row of the FU-FU
matrix* and are oblivious to all other rows,
Yes except that the issue signal comes in from the top.
This is really important to appreciate.
controlling timing and
arbitration to the common register file bus(es)
Yes... except it has no idea what the regnums are (and doesn't need to, because those are in the other Matrix). Hence why they work together.
based on that stored value
(a copy of the global vectors *as of when the instruction was issued*).
Yes.
* Issue logic preventing issue when a (global!) write-after-write situation
is detected ensures the FU-FU matrix effectively represents a DAG,
Right, ok, as-is, the 6600 matrices can't cope with WaW, only RaW and WaR.
So, what you do is, you recall that the FU REGS matrix generates a global per reg bit vector called Global Write Pending?
What you have to do is, BEFORE issuing an instruction, AND that vector against the unary version of the Dest reg, and if it matches, it is a WaW hazard and you have to stall that instruction.
I do have a plan for adding WaW to the design I am working on, it needs the precise exceptions system first, plus QTable history, as WaW hooks into that and allows rollback.
It is complicated and honestly best left for now. Note that WaW on a ST can be IGNORED. See the Issue Unit diagrams in Mitch's book: chap 11 is clearer; the chap 10 version is based on the 6600.
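
The WaW stall check described above amounts to a single AND. A minimal Python sketch follows, where the function name `waw_hazard` and the `is_store` flag are assumptions made for the example, and the Global Write Pending vector is modelled as an integer with one bit per register:

```python
def waw_hazard(global_write_pending, dest_reg, is_store=False):
    """True if issuing now would create a write-after-write hazard.

    global_write_pending: one bit per register, set while a write is outstanding.
    dest_reg: destination register number of the instruction about to issue.
    is_store: assumed flag; the check is skipped for a ST, per the note above.
    """
    if is_store:
        return False
    dest_unary = 1 << dest_reg         # unary (one-hot) version of the dest reg
    return (global_write_pending & dest_unary) != 0


pending = 1 << 7                       # some FU already has a write to R7 outstanding
stall_r7 = waw_hazard(pending, 7)      # second writer to R7 has to stall
stall_r4 = waw_hazard(pending, 4)      # a writer to R4 may issue
```

The issue logic would hold the instruction back for as long as this check returns true.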
preventing deadlock of the whole mechanism.
Yes.
Put another way, it ensures
that for any bit in the FU-FU write reservation bits, one and only one row
with that bit set exists.
Is this a correct understanding?
Pretty much yeah
Thanks for your patience.
--
Samuel A. Falvo II
--
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68