[libre-riscv-dev] store computation unit

Thu Jun 20 17:22:45 BST 2019

On Thursday, June 20, 2019, Mitchalsup <mitchalsup at aol.com> wrote:

> thinking out loud: the two key differences i perceive (there may be more)
> between pick-and-go-rd/write for register file read/write and forwarding
> are:
>
> (1) you need to identify the locations (and corresponding matching
> registers) when one (or more!) FUs have read-pending, and *another* FU has
> request_release pending.  if i am reading it correctly, i think this is the
> subject of the diagram that you designed (side-note: this is an opportunity
> for *broadcast* of the FU result to *multiple* recipient FUs)
>
> At issue, one determines (captures) all of the writes pending on any of
> their operands not just by register number, but by function unit number and
> result number on a per operand basis.
>
> (2) even if the FU result has been forwarded, it is critically important
> not to drop that result (as would be done with a regfile write), because it
> could still be necessary to write it to the regfile.
>
> The logic I am aiming at works as follows::
> 1) if an operand can be read out from FU.result[i], this is forwarding
> 2) if the FU.result[i] gets a GoWrite.FU.result[i] then:
> 2.a) on this cycle one can still access the result by forwarding
> 2.b) on all subsequent cycles one reverts to accessing the operand from RF.
> 2.c) FU.result[i] becomes EMPTY when a new instruction is issued into that
> FU.
> 2.d) we cannot forward from an EMPTY location and must revert to RF.
>
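A minimal Python sketch of the two ideas above, under stated assumptions: at issue, each operand captures the (FU, result) pair that will produce it rather than just a register number; afterwards the producer's phase decides whether the operand is forwarded, read from the regfile, or must wait. The names `Tag`, `Phase`, `capture_tags` and `operand_source` are hypothetical, and the `EXECUTING` phase (result not yet produced) is an assumption that the rules above do not spell out. `RETIRED` lumps rules 2.b to 2.d together, since a consumer behaves the same whether the slot has merely been written back or is already EMPTY.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Tag:
    """Identifies a producer: which FU, and which of its results (hypothetical)."""
    fu: int
    result: int


def capture_tags(src_regs: List[int], pending_writer: Dict[int, Tag]) -> List[Optional[Tag]]:
    """At issue, record the producing FU.result[i] for each source register.

    pending_writer maps a register number to the Tag of the FU result that
    will write it; None means no write is pending, so the value is in the RF.
    """
    return [pending_writer.get(reg) for reg in src_regs]


class Phase(Enum):
    EXECUTING = auto()  # assumption: result not yet produced
    READY = auto()      # result held in FU.result[i], not yet written
    GO_WRITE = auto()   # the cycle on which GoWrite.FU.result[i] is asserted
    RETIRED = auto()    # any later cycle, or slot EMPTY after a new issue


def operand_source(tag: Optional[Tag], phase: Phase) -> str:
    """Decide where one operand comes from: 'forward', 'regfile' or 'wait'."""
    if tag is None:
        return "regfile"          # no pending writer at issue time
    if phase is Phase.EXECUTING:
        return "wait"
    if phase in (Phase.READY, Phase.GO_WRITE):
        return "forward"          # rules 1 and 2.a
    return "regfile"              # rules 2.b, 2.c and 2.d
```

For example, `capture_tags([1, 2], {1: Tag(fu=3, result=0)})` yields `[Tag(fu=3, result=0), None]`: the first operand waits on FU 3's result 0, the second can be read straight from the regfile.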

Ah. Minor concern: if there are, say, 25 FUs, all of which are permitted to
forward simultaneously to all 24 other FUs, the forwarding could result in
a massive 64-bit 25-way multiplexer.

This is why I was thinking of treating forwarding as identical to a single
regfile port, even to the extent of giving each forwarding port its own
Priority Picker.
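To make "one Priority Picker per port" concrete, here is a minimal Python model of a single priority picker, assuming index 0 is the highest priority: a request is granted only if no higher-priority request is active, which in hardware is a simple ripple of AND/OR gates. Treating forwarding as "just another port" then means instantiating one such picker for the forwarding path alongside the one for each regfile port. The function name and the request vectors are illustrative only.

```python
from typing import List


def priority_pick(requests: List[bool]) -> List[bool]:
    """Return a one-hot (or all-False) grant vector; index 0 wins ties."""
    grants = []
    blocked = False                     # becomes True once a higher-priority request is seen
    for req in requests:
        grants.append(req and not blocked)
        blocked = blocked or req
    return grants


# One picker per protected resource (hypothetical request vectors).
regfile_port_requests = [False, True, True, False, True]
forward_port_requests = [True, False, False, True, False]
regfile_grant = priority_pick(regfile_port_requests)
forward_grant = priority_pick(forward_port_requests)
```

Here `regfile_grant` selects FU 1 and `forward_grant` selects FU 0; each picker protects its own resource independently.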

> identifying the circumstances under which it is possible to drop the result
> was the subject of that "nameless" discussion we had back in
> november-december, and it involved detection of shadowing (any outstanding
> potential cancellation opportunities, whether they are branch shadows,
> exceptions, or WaWs) as well as detection of remaining read hazards.
>
> for any FU result that is in the "forwardable" condition (that has not
> yet been given an opportunity to write to the regfile) *only* when both the
> last shadow *and* the last read hazard on that FU have been dropped, is it
> then safe to simply drop the FU result on the floor.
>
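The drop condition above reduces to a single combinational expression. In this sketch the outstanding shadows (branch, exception, WaW) and the FUs still waiting to read the result are each assumed to be held as one bit vector per FU; the result may be dropped only while it is still forwardable and both vectors are all zero. The function and parameter names are hypothetical, not taken from the actual design.

```python
def can_drop_result(forwardable: bool, shadow_vector: int, read_hazard_vector: int) -> bool:
    """True when a forwardable, not-yet-written FU result may be discarded.

    shadow_vector: one bit per outstanding shadow (branch, exception or WaW).
    read_hazard_vector: one bit per FU still waiting to read this result.
    """
    return forwardable and shadow_vector == 0 and read_hazard_vector == 0
```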
> Grasshopper:: One can forward when the result is still under the shadow of
> a pending conditional branch!
> This is because all consumers are farther down the instruction stream and
> if this instruction dies because of a shadow, so will all consumers!
> Forward because the result is ready; do not read, because the RF does not
> have the value yet and, indeed, is not allowed to.
>
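The argument rests on one invariant: an instruction issued later sits under at least every shadow that an earlier, still-shadowed instruction sits under, so cancelling a shadow that kills a producer also kills every consumer that received its result by forwarding. A minimal Python model, assuming shadows are simply the set of branches unresolved at issue time (exception and WaW shadows are left out for brevity); `ShadowTracker` and its methods are hypothetical.

```python
from typing import Dict, Set


class ShadowTracker:
    def __init__(self) -> None:
        self.unresolved: Set[str] = set()        # branches not yet resolved
        self.shadows: Dict[str, Set[str]] = {}   # instruction -> branches it is under

    def issue_branch(self, branch: str) -> None:
        self.unresolved.add(branch)

    def issue(self, instr: str) -> None:
        # A newly issued instruction is under every currently unresolved branch.
        self.shadows[instr] = set(self.unresolved)

    def resolve(self, branch: str) -> None:
        # Branch went the predicted way: lift its shadow everywhere.
        self.unresolved.discard(branch)
        for under in self.shadows.values():
            under.discard(branch)

    def cancel(self, branch: str) -> Set[str]:
        # Mispredicted: every instruction under this shadow dies.
        self.unresolved.discard(branch)
        killed = {i for i, under in self.shadows.items() if branch in under}
        for i in killed:
            del self.shadows[i]
        return killed
```

If `producer` and then `consumer` are issued while branch `b0` is unresolved, the consumer's shadow set is a superset of the producer's, so `cancel("b0")` returns both of them: a value forwarded under the shadow can never outlive its producer.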

Ah, good point; however, it is a potential misunderstanding of the purpose
of what I wrote.

The scenario that illustrates it best is this: say that an FU managed to forward
its result to all other FUs waiting for that result. Those FUs complete and
commit, and let us suppose that one of them is a WaW that OVERWRITES the
very same register that the original FU was waiting to write to, but never
did, because it used Forwarding, instead.

That FU now is in the really weird position of *not even needing* to write
its result to the regfile.  Doing so just wastes CPU cycles, as no FU needs
that result, because they *already had it*.

Detecting these situations, as an optimisation, and making the earlier WaW
candidates "nameless" so that they never even hit the regfile, was the
topic of the discussion back in January.

It needs forwarding, in order to work.
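A software sketch of which results are candidates for becoming "nameless": walk a window of instructions in program order and find each result whose destination register is overwritten later in the window, collecting the instructions that read it in between. If all of those readers obtain the value by forwarding (and no shadow remains), the regfile write of that result is redundant. This is a hypothetical program-order model for illustration only; the detection discussed in the thread would be done by the hardware at run time.

```python
from typing import Dict, List, Sequence, Tuple

Instr = Tuple[int, Sequence[int]]   # (destination register, source registers)


def nameless_candidates(window: List[Instr]) -> Dict[int, List[int]]:
    """Map the index of each overwritten result to the indices of its readers."""
    candidates = {}
    for i, (dest, _) in enumerate(window):
        readers = []
        for j in range(i + 1, len(window)):
            later_dest, later_srcs = window[j]
            if dest in later_srcs:      # sources are read before the write
                readers.append(j)
            if later_dest == dest:      # WaW: the register gets a newer value
                candidates[i] = readers
                break
    return candidates


program = [
    (1, (2, 3)),   # 0: r1 = r2 op r3
    (4, (1, 1)),   # 1: r4 = r1 op r1
    (1, (5, 6)),   # 2: r1 = r5 op r6  (overwrites r1)
    (7, (1, 4)),   # 3: r7 = r1 op r4
]
```

Here `nameless_candidates(program)` returns `{0: [1]}`: instruction 0's value is needed only by instruction 1, after which r1 is overwritten by instruction 2, so if instruction 1 gets it by forwarding, instruction 0 never needs to write r1.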

>
> if forwarding is never added, the above situation (2) does not even enter
> into the equation.  everything goes via the regfile as the arbitrator.
>
> That is right, and that is why the forward stuff has to be done
> independently, but it comes with more degrees of <scheduling> freedom.
>
> creating the multi-level priority picker, whilst i appreciate it is a
> perfect candidate task as a 1st level engineering exam question, i cannot
> think of a way to create one without it being recursive in nature.
>
> The new picker considers instructions with forwarding operands in front of
> those which read registers.
>
>      nextExecution<0:k> = PICK( { forwardable<0:k>, readable<0:k> } );
>

OK, so forwarding is a higher priority than reading.

Hmmm, that just makes a single selection from a large group; it doesn't
allow for one forwarder *and* one readable to be picked in the same cycle
(as long as the readable is not the same FU as the forwarder).
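The objection is easy to see in a Python model of the concatenated pick, assuming (as in Verilog) that the left-hand part of `{forwardable, readable}` has the higher priority: one picker over the joined vector yields exactly one winner, so a forwarder and an unrelated reader can never both be granted in the same cycle. The function name is hypothetical.

```python
from typing import List, Optional, Tuple


def pick_forward_first(forwardable: List[bool], readable: List[bool]) -> Optional[Tuple[str, int]]:
    """Single priority pick over {forwardable, readable}; returns (kind, FU index)."""
    combined = list(forwardable) + list(readable)   # forwarding requests come first
    for pos, req in enumerate(combined):
        if req:
            if pos < len(forwardable):
                return ("forward", pos)
            return ("read", pos - len(forwardable))
    return None                                      # nothing requested
```

With `forwardable = [False, True, False]` and `readable = [True, False, False]`, the result is `("forward", 1)`: FU 0's pending read is not granted, even though it does not conflict with FU 1's forward.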

>
> a single priority-picker is needed to protect a single resource (regfile
> port, whether read or write)
>
> A single picker that is now twice as many bits wide (and 1 gate longer)::
> see the book chapters I sent.
>

Oh? Ah HA! :)  OK, will take a look.

>
> the multi-level priority picker is needed to be able to allow *multiple*
> ports, whether they be forwarding or actual-regfile.
>
> * the "Highest Priority Level Picker", clearly, if activated, must *stop*
> the 2nd level picker from selecting the exact same readable (or writable)
> signal.
> * the 2nd level priority picker, clearly, if activated, must stop the
> *3rd* level priority picker from picking either the 1st *or* 2nd level
> picker readable (or writable) signal
> * and so on.
> See the book chapter.
>

Will do.
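On the concern that a multi-level priority picker seems to need recursion: it can also be expressed as a plain cascade, where level n is an ordinary priority picker whose input is the request vector with every grant from levels 0 to n-1 masked out, as the bullet list above describes. A minimal Python sketch, assuming index 0 is the highest priority and one level per port; the names are hypothetical, and this is not claimed to be the circuit from the book chapter.

```python
from typing import List


def priority_pick(requests: List[bool]) -> List[bool]:
    """Single-level picker: grant the lowest-indexed active request."""
    grants, blocked = [], False
    for req in requests:
        grants.append(req and not blocked)
        blocked = blocked or req
    return grants


def multi_level_pick(requests: List[bool], levels: int) -> List[List[bool]]:
    """One grant vector per level (port); no request is granted twice."""
    taken = [False] * len(requests)
    all_grants = []
    for _ in range(levels):
        # Mask out everything already granted by a higher level.
        masked = [req and not t for req, t in zip(requests, taken)]
        grants = priority_pick(masked)
        all_grants.append(grants)
        taken = [t or g for t, g in zip(taken, grants)]
    return all_grants
```

With requests `[False, True, True, False, True]` and four levels, the levels grant FU 1, FU 2 and FU 4 respectively, and the fourth level grants nothing. In this straightforward cascade each level's mask depends on every earlier level, so the gate depth grows with the number of levels.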

> this to give multi-port reading and writing... *and operand forwarding
> opportunities*.
>
> the only thing is:
>
> (3) whilst operand forwarding is a great opportunity for broadcasting of
> FU result to pending CU input latches, a design that makes it necessary to
> have *TWO* (or more) register reads ready before the forwarding (or regfile
> reading) can proceed... this is a *lost opportunity*.
>
> Which is why we need to back up the point where we can begin execution to
> RequestRelease and not wait for GoWrite.
>
> it would i feel be far better to have Go_Read_Operand_1, Go_Read_Operand_2
> and to have corresponding completely separate Priority Pickers (more to the
> point: multiple multi-level priority pickers) for each separate and
> distinct operand.
>
> That is, in effect, what I am working on.
>

Ah excellent.
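The "lost opportunity" can be shown with a small Python comparison, under simplifying assumptions: two operands per FU, one read port per operand, index 0 highest priority. With a single GoRead that requires both operands to be ready, nothing is granted in this example; with separate Go_Read_Operand_1 / Go_Read_Operand_2 pickers, FU 1 reads its first operand and FU 2 its second in the same cycle. The function names are hypothetical.

```python
from typing import List, Tuple


def priority_pick(requests: List[bool]) -> List[bool]:
    """Grant the lowest-indexed active request (index 0 = highest priority)."""
    grants, blocked = [], False
    for req in requests:
        grants.append(req and not blocked)
        blocked = blocked or req
    return grants


def go_read_combined(op1_ready: List[bool], op2_ready: List[bool]) -> List[bool]:
    """Single GoRead: an FU is picked only once *both* operands are readable."""
    return priority_pick([a and b for a, b in zip(op1_ready, op2_ready)])


def go_read_split(op1_ready: List[bool], op2_ready: List[bool]) -> Tuple[List[bool], List[bool]]:
    """Independent pickers: Go_Read_Operand_1 and Go_Read_Operand_2."""
    return priority_pick(op1_ready), priority_pick(op2_ready)


op1_ready = [False, True, False]
op2_ready = [False, False, True]
```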

> then, two things happen which differ from the 6600:
>
> (a) the regfile read bus (each read port) can become a *broadcast* bus,
> with each operand (each Computation Unit latch) being independently ready
> to receive operands if their read is pending.
>
> (b) errr... i forgot :)  too focussed on writing (a).  something to do
> with forwarding, which i probably covered above already.
>
> The question becomes::
>
> By which mechanism will I read out results and capture them as operands?
>
> I think you will want to organize the FU.result vectors as a single result
> file and have a certain number of busses back to the operand latches. It is
> these busses that the forward picker will schedule.
>

My understanding is that there have to be forward pickers, plural.  If the
intention is to make the forwarding bus capable of transporting, say, up to 5
operands, there need to be 5 pickers, arranged so that no two of them can
pick the same operand to forward.
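One possible shape for this, sketched in Python under assumptions not taken from the thread: the FU results form a single result file, K forwarding buses are allocated by K cascaded pickers (so no result is placed on two buses), and each bus is a broadcast bus, so every operand latch whose captured tag names the FU on a bus latches the value in the same cycle. `schedule_buses`, `broadcast_capture` and `OperandLatch` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class OperandLatch:
    src_fu: int                 # producer tag captured at issue
    waiting: bool = True
    value: Optional[int] = None


def schedule_buses(ready: List[bool], num_buses: int) -> List[Optional[int]]:
    """Cascaded pickers: bus k carries the k-th highest-priority ready FU result."""
    taken = [False] * len(ready)
    buses: List[Optional[int]] = []
    for _ in range(num_buses):
        chosen = None
        for fu, req in enumerate(ready):
            if req and not taken[fu]:   # masked by every earlier picker
                chosen = fu
                taken[fu] = True
                break
        buses.append(chosen)
    return buses


def broadcast_capture(latches: List[OperandLatch], buses: List[Optional[int]],
                      results: List[int]) -> None:
    """Every waiting latch whose tag matches an FU on a bus captures that value."""
    on_bus = {fu for fu in buses if fu is not None}
    for latch in latches:
        if latch.waiting and latch.src_fu in on_bus:
            latch.value = results[latch.src_fu]
            latch.waiting = False
```

With FUs 0, 2 and 3 ready and two buses, FUs 0 and 2 are scheduled; two latches waiting on FU 2 both capture its value from the one bus, while a latch waiting on FU 3 keeps waiting for a later cycle.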

> all that having been said (as context), i believe you are onto something
> with this design.  i had envisaged that, perhaps, it might be necessary to
> identify the precise match of FUs that are in the "request_release" state,
> and match those exactly with those that have a read pending.
>
> Still working on this part.
>

I believe it is a non-issue (trivial), as the pending read vector is global
in nature.

It is not as if a full N-to-N crossbar Forwarding system is necessary.

Hence I was picturing Forwarding as a kind of regfile port suite without
the actual regfile, and thus having cascading priority pickers à la
multi-issue.
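One way to read the "pending read vector is global" remark, as a Python sketch with hypothetical names: keep a matrix `read_pending[c][p]` (consumer FU c is waiting on producer FU p). OR-ing each column tells every producer whether any FU wants its result; AND-ing that with the producer's request_release gives the request vector for the forwarding pickers, in the same form a regfile port's pickers would see, without first working out a full N-to-N match of producers to consumers.

```python
from typing import List


def forward_requests(read_pending: List[List[bool]], request_release: List[bool]) -> List[bool]:
    """Producer p requests a forwarding slot if it has a result and any FU awaits it.

    read_pending[c][p] is True when consumer FU c is waiting on producer FU p.
    """
    num_fus = len(request_release)
    wanted = [any(row[p] for row in read_pending) for p in range(num_fus)]
    return [rel and w for rel, w in zip(request_release, wanted)]
```

The resulting vector can be fed straight into cascaded priority pickers, one per forwarding bus, exactly as for multi-issue regfile ports.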

L.

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68