[libre-riscv-dev] [Bug 313] Create Branch Pipeline for POWER9

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Fri May 15 21:31:18 BST 2020


https://bugs.libre-soc.org/show_bug.cgi?id=313

--- Comment #11 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Michael Nolan from comment #10)

> I removed the TAR input, because the branch instruction would need to decide
> which of the 3 inputs to branch to. Instead, it should be fed that operand
> in the spr input (though that does mean it will be fed ctr twice for
> bcctr...). Does that seem reasonable?

nggghhh.  *thinks*, *thinks*....

ok, the way that the Function Units work is: well, let me find a diagram..
https://libre-soc.org/3d_gpu/comp_unit_req_rel.jpg

although that's an ALU FU, in the case of a *branch* FU, those inputs would
be:

* CTR
* CIA
* CR
* TAR

basically, those latches at the top "accumulate" all of the inputs needed
(some of which as you can see actually come from the operand - this is
CompBROpSubset)

anything that needs to be read from a regfile - whether it be the SPR one,
the Condition Regfile or the INT Regfile - needs its own pair of
REQUEST/GO_READ signals.

* the outgoing one *requests* the register (actually the regfile port)
* the incoming one (GO_READ) demands that the FU latch the incoming register
  data, right now.  it will not get another chance.

so the idea of doing time-synchronous feeding of the input data (sharing one
of the register inputs for two registers, one after the other) is massively
more complicated than simply storing the data in (multiple) Function Unit
latches.

now, *at the regfile* there may only be one available Reg Port Bus.

therefore *over the (one) Reg Port Bus*, it may take two clock cycles:
* one to read CTR
* one to read TAR

and this can be done in any order: CTR first, TAR first, or if 2 Reg Port
Buses happen to be available, *both* simultaneously.

however that task is handled at the CompUnit (like in the diagram above),
precisely so that the pipelines *do not* have to have such complexity.
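
to make the REQUEST/GO_READ idea concrete, here is a rough plain-python
model (not nmigen, and not the actual CompUnit code: OperandLatch,
gather_operands and ports_per_cycle are names made up for illustration).
each operand has its own latch, and the latch keeps its request raised until
a GO_READ arrives with the data. the number of Reg Port Buses only changes
how many cycles the gathering takes, not what the pipeline eventually sees.

```python
import random


class OperandLatch:
    """Hypothetical stand-in for one Function Unit input latch."""

    def __init__(self, name):
        self.name = name
        self.request = True   # REQUEST: asks for a regfile port
        self.value = None

    def go_read(self, data):
        # GO_READ: the data is on the bus right now, so latch it.
        self.value = data
        self.request = False  # this operand is satisfied


def gather_operands(regfile, names, ports_per_cycle=1, seed=0):
    """Grant up to ports_per_cycle pending requests per cycle, in any order."""
    rng = random.Random(seed)
    latches = {n: OperandLatch(n) for n in names}
    cycles = 0
    while any(latch.request for latch in latches.values()):
        pending = [latch for latch in latches.values() if latch.request]
        rng.shuffle(pending)  # order is arbitrary: CTR first or TAR first
        for latch in pending[:ports_per_cycle]:
            latch.go_read(regfile[latch.name])
        cycles += 1
    return {n: latch.value for n, latch in latches.items()}, cycles


regfile = {"ctr": 5, "tar": 0x1000}
one_bus, cycles_one = gather_operands(regfile, ["ctr", "tar"], ports_per_cycle=1)
two_bus, cycles_two = gather_operands(regfile, ["ctr", "tar"], ports_per_cycle=2)
```

with one bus the two reads take two cycles, with two buses they take one, and
in both cases the pipeline gets the same complete set of operands.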

therefore, really, if the branch op needs 2 regs (one CTR, one TAR), then the
BranchCompUnit will need 2 latches to cover them, and consequently
BranchInputData needs to have both ctr and tar.
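
as a sketch of what that buys us (plain python again; BranchInputs and
branch_target are hypothetical stand-ins, not the real BranchInputData): with
separate ctr and tar fields the pipeline just picks the one the instruction
calls for, and nothing has to be fed twice or in a particular order.

```python
from dataclasses import dataclass


@dataclass
class BranchInputs:
    """Hypothetical stand-in for the branch pipeline's input record."""
    ctr: int
    tar: int
    cia: int
    cr: int


def branch_target(op, inp):
    """Pick the target register by instruction; low 2 bits are forced to 0."""
    if op == "bcctr":
        return inp.ctr & ~0b11
    if op == "bctar":
        return inp.tar & ~0b11
    raise ValueError(f"not handled in this sketch: {op}")


inputs = BranchInputs(ctr=0x2002, tar=0x1003, cia=0x100, cr=0)
ctr_target = branch_target("bcctr", inputs)
tar_target = branch_target("bctar", inputs)
```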

let's have a look at what was generated by the parser.  all the others use
subsets (fewer regs) of this one:

    @inject()
    def op_bctarl(self, CTR, CR, TAR, LR):
        global NIA
        if mode_is_64bit:
            M = 0
        else:
            M = 32
        if ~BO[2]:
            CTR = CTR - 1
        ctr_ok = BO[2] | ne(CTR[M:64], 0) ^ BO[3]
        cond_ok = BO[0] | ~(CR[BI + 32] ^ BO[1])
        if ctr_ok & cond_ok:
            NIA = concat(TAR[0:62], SelectableInt(value=0x0, bits=2))
        if LK:
            LR = CIA + 4
        return (CTR, CR, TAR, LR,)

that's interesting.  TAR is returned but i do not see it being modified.
that's a bug.

NIA however is calculated but is not returned *from* the... oh wait, yes
i see, it's a global.  ok.  got it.
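
for reference, here is my reading of the same logic written with plain ints
instead of SelectableInt. assumptions: 64-bit mode only (M = 0), BO and CR
bits are numbered MSB0 as in the generated code, CR is passed as a 32-bit
value, and the helper names (bo_bit, cr_bit, bctar) are made up. it returns
only what actually changes (NIA, CTR, LR), which is the point about TAR
above.

```python
MASK64 = (1 << 64) - 1


def bo_bit(bo, i):
    """Bit i of the 5-bit BO field, MSB0 numbering (bit 0 is the MSB)."""
    return (bo >> (4 - i)) & 1


def cr_bit(cr, bi):
    """Bit bi of a 32-bit CR value, MSB0 numbering."""
    return (cr >> (31 - bi)) & 1


def bctar(bo, bi, lk, ctr, cr, tar, lr, cia):
    """Plain-int model of bctar[l] in 64-bit mode; returns (nia, ctr, lr)."""
    if not bo_bit(bo, 2):
        ctr = (ctr - 1) & MASK64
    ctr_ok = bo_bit(bo, 2) | (int(ctr != 0) ^ bo_bit(bo, 3))
    # 1 ^ a ^ b is 1 exactly when a == b (the ~(a ^ b) of the original)
    cond_ok = bo_bit(bo, 0) | (1 ^ cr_bit(cr, bi) ^ bo_bit(bo, 1))
    nia = (cia + 4) & MASK64          # fall through by default
    if ctr_ok & cond_ok:
        nia = tar & ~0b11 & MASK64    # TAR[0:62] with two zero bits appended
    if lk:
        lr = (cia + 4) & MASK64
    return nia, ctr, lr


# BO = 0b10100: ignore both CTR and CR, i.e. branch always.
always = bctar(bo=0b10100, bi=0, lk=1, ctr=7, cr=0, tar=0x1003, lr=0, cia=0x100)
# BO = 0b00000: decrement CTR, branch if CTR != 0 and the CR bit is 0.
ctr_hits_zero = bctar(bo=0b00000, bi=0, lk=0, ctr=1, cr=0, tar=0x2000, lr=0x55, cia=0x100)
ctr_nonzero = bctar(bo=0b00000, bi=0, lk=0, ctr=2, cr=0, tar=0x2000, lr=0x55, cia=0x100)
```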

so, yes, apologies, we can't use spr for dual-purpose, we need 2 regs,
and they might as well be named ctr and tar, rather than spr.
