[libre-riscv-dev] store computation unit
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Mon Jun 3 16:04:14 BST 2019
On Monday, June 3, 2019, Mitchalsup <mitchalsup at aol.com> wrote:
> Mitch Alsup
> MitchAlsup at aol.com
> -----Original Message-----
> From: Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> To: Libre-RISCV General Development <libre-riscv-dev at lists.libre-riscv.org>;
> MitchAlsup <mitchalsup at aol.com>
> Sent: Sat, Jun 1, 2019 9:08 pm
> Subject: Re: store computation unit
> hiya mitch,
> thinking through the LD/ST Dependency Matrix, the general idea being
> that LDs hold up STs, and STs hold up LDs, i am puzzled by the
> (original) drawing that can only activate the WaR or RaW latches if
> and only if the exact same unit raises *both* LD *and* ST,
Got it. This is renarkably similar to the problem I ran into on the FU-FU
Matrix, where ADD r1 r1 r1 would cause the FUFU Matrix to create both a
spurious RaW *and* WaR hazard... on itself.
I dealt with that by blanking out all FUFU entries down the unity diagonal.
Given that the exact same logic applies, the triple input AND gate
effectively makes the setting of those SR Latches flat-out impossible.
consequently I believe that all the gates (not the turn points) along the
diagonal can be completely removed.
> so i looked at the FU-FU read/write dependency matrix cell, and it
> occurs to me that the LD/ST Dep Cell is missing a corresponding "read
> pending" and "write pending", which in this case would be called "Any
> Load Pending" and "Any Store Pending".
> if however those are global signals, we get the same "deadlock"
> problem that the FU-FU matrix is designed to prevent.
> so i have a sneaking suspicion that a full NxN matrix is needed,
> basically a duplicate of the FU-FU matrix, each row of WaR/RaW latches
> preserving the dependency (instruction) *order* (based on the Issue
> however.... even here, we run into difficulties. imagine this
> sequence, assume that all of these remain "pending":
> * LD
> * LD
> * LD (no STs so far, all good - "Any Load Pending" remains HI)
> * ST (ok! LD-Pend HI, so the hazard goes HI, and ST-Pend also now goes HI)
> * ST (still ok, this ST will still block, ST-Pend also now still HI)
> * LD.... whoops.
> LDs are dependent on preceding STs,
> STs are dependent on preceding LDs and STs.
> Address mismatch relaxes dependencies.
> Initializing the DM
> * 1: LD // not dependent on anyone
> * 2: LD // not dependent on anyone
> * 3: LD // not dependent on anyone :: RAR is not a hazard
> * 4: ST // this ST is dependent on the previous 3 LDs
> * 5: ST // this ST is dependent on the previous 3 LDs and the previous
> * 6: LD // this LD is dependent on the preceding 2 STs.
> Now when the addresses are AGENed, the LOB(address) are compared to the
> LOBs(other addresses) and a mismatch relaxes the dependency, a match
> continues with the dependency. Actual progress on an operation removed its
> dependence on all younger MemRefs.
Sigh I spent the entire morning (ok afternoon) on the leather sofa, staring
for hours at a hand drawn version of the LDST Matrix.
I realised, firstly, that it *is* a matrix. In other diagrams I misread
them and believed that it was a sparse matrix, with cells only present down
As a FULL 2x2 Matrix the LDST cell makes more sense. Its job is, in a
given row, to capture a snapshot at LD/ST issue time (hence the inclusion
of the Issue signal into the AND gate), of this LD/ST against all past and
outstanding other LD/STs.
With the snapshot, then detection of LD/ST dependence is just a matter of
seeing in any cycle which one has no LDholdsSTs or STholdsLDs.
Hm, there may be a bit more to this, the Matrix might get stuck, how to
detect which is the very first LD (or ST) that is allowed to proceed, if no
LD/ST has yet activated any "hit" signals, it cannot generate the HOLD
signals. Have to think that one through.
Mitch when you say "STs are dependent on preceding LDs and STs."
I am slightly confused because in the diagram the cells are symmetrical,
detecting WAR and RaW...
Ah wait, ST depends on ST through WaW hazards, and that is solved not
through the LDST Matrix but through the instruction order "shadowing" that
I added, which creates full WaW dependence. Does that sound about right?
> that last LD is where it goes haywire, because if it is even allowed
> to issue, with LD-Pend being HI *and* ST-Pend being HI, deadlock is
> A MemRef is only dependent on older MemRefs, and only dependent
> until they AGEN and mismatch or AGEN match and make forward progress.
I misunderstood the Matrix, I thought it was sparse. whoops.
Now this makes sense. AGEN match I had an idea to use a 16 bit bitmap to
match the lower address bits and use the truncated idea (bits 4 to 15) for
the rest, ignore above bits 15.
Does that sound reasonable, or excessive? Straight compare on bits 4 thru
10 instead perhaps?
> the simplest thing to do under these circumstances would appear to be
> to stall the entire issue process. it's a little tricky because the
> sequence could be LD-LD-LD-ST-ST-LD *or* it could be
> ST-ST-ST-ST-LD-LD-LD-ST, both transitions would result in both LD-Pend
> *and* ST-Pend going HI.
> what's going oooon? is it really this tricky?
> It is not tricky at all, you just have to have a clear head about which
> dependencies point (younger can be dependent on older but not the other
> around), and which orders can be dependent (RAR is not a hazard)
In my case an afternoon alternating between staring at graph paper and
staring at the ceiling was a close substitute for a clear head...
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
More information about the libre-riscv-dev