[libre-riscv-dev] [Bug 305] Create Pipelined ALU similar to alu_hier.py

bugzilla-daemon at libre-soc.org
Mon May 11 19:15:26 BST 2020


https://bugs.libre-soc.org/show_bug.cgi?id=305

--- Comment #43 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Michael Nolan from comment #42)

> I went through the tables and double checked, the *only* ALU instructions
> that have 3 inputs are 2 registers and an immediate. It does not *need* 3
> register inputs. 

yehyeh.  ok.

really... normally, what would happen is that anything that's significantly
different like this would get put into its own Function Unit (into its own
ALU Pipeline).

then, there's no "switching about at the ALU side"



> However, some piece of hardware needs to order the inputs, I guess the trade
> off is whether to do it in the alu or somewhere else.

i think one of the reasons why processors go with a convention "B is
reg or immediate" is to save wires.  you've got 64 bits worth of wires for
B, B is not used all the time, so when it's not, use it to farm in
the "immediate" data.

so i think we'll find... 1 sec let's look at the original decode2.vhdl...

ah yes:
https://github.com/antonblanchard/microwatt/blob/master/decode2.vhdl#L89

ok so that's definitely putting the *actual* immediate data into reg b.
decode_input_reg_b() is where the decision is made to either put the
regfile data into the return result or put the immediate in it.
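as a rough plain-python model of that "B is reg or immediate" convention
(this is *not* a translation of microwatt's VHDL: DecodedOp and
select_operand_b are made-up names, purely for illustration):

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1  # B is a 64-bit bus


@dataclass
class DecodedOp:
    """Hypothetical stand-in for the decoder's view of operand B."""
    imm_ok: bool = False    # instruction carries an immediate
    imm: int = 0            # the immediate value itself
    reg_b_ok: bool = False  # instruction reads a register for B
    reg_b: int = 0          # register *number* for B


def select_operand_b(op, regfile):
    """Return the 64-bit value that ends up on the B wires."""
    if op.imm_ok:
        # immediate takes over the B wires
        return op.imm & MASK64
    if op.reg_b_ok:
        # otherwise B carries the regfile contents
        return regfile[op.reg_b] & MASK64
    # B unused by this instruction
    return 0
```

the point being that there is one set of 64 wires for B, and the decision
about what goes onto them is made once, before the ALU.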

so given that we're using microwatt's "decoder" we really need to
follow the same conventions.  and here's where the execution has
*three* inputs:

https://github.com/antonblanchard/microwatt/blob/master/execute1.vhdl#L61

a_in, b_in, c_in.

so my feeling is we:

* have the ALU with the same 3 inputs (even though we know only 2 of them
  are used), and keep the same order

* where the immediate is available, assume it is always dropped into the b_in
  64-bit wires

thus:

* remove the MUX from ALUInputStage (which means the unit tests have to
  do the reg-or-immediate selection themselves)

        with m.If(self.i.ctx.op.imm_data.imm_ok):
            comb += self.o.b.eq(self.i.ctx.op.imm_data.imm)
        with m.Else():
            comb += self.o.b.eq(self.i.b)

so, in set_alu_inputs it would be:

    inputs = []
    # C (or A?)

    reg3_ok = yield dec2.e.read_reg3.ok
    reg3_sel = 0
    if reg3_ok:
        reg3_sel = yield dec2.e.read_reg3.data
        reg3_sel = sim.gpr(reg3_sel).value
    inputs.append((reg3_sel, reg3_ok))

    # B (or imm): this is the simulation side, so a plain python "if"
    reg2_ok = yield dec2.e.read_reg2.ok
    reg2_sel = 0
    imm_ok = yield self.i.ctx.op.imm_data.imm_ok
    if imm_ok:
        # the immediate goes onto B instead of the regfile data
        reg2_sel = yield self.i.ctx.op.imm_data.imm
    elif reg2_ok:
        reg2_sel = yield dec2.e.read_reg2.data
        reg2_sel = sim.gpr(reg2_sel).value
    inputs.append((reg2_sel, reg2_ok))

    # A (or C?)

and then still have 3 inputs, and in this way we save 64 bits
worth of wires because B is used to input the immediate.
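to illustrate why the ALU itself does not need to care, here is a minimal
plain-python sketch (no nmigen, no simulator). prepare_inputs and alu_add
are hypothetical names, the regfile is just a list, and "add" is only an
example operation:

```python
MASK64 = (1 << 64) - 1


def prepare_inputs(gpr, ra, rb=None, rc=None, imm=None):
    """Hypothetical test-side helper: build (a_in, b_in, c_in).

    The reg-or-immediate choice for B is made here, outside the ALU.
    """
    a_in = gpr[ra] & MASK64
    if imm is not None:
        b_in = imm & MASK64          # immediate rides on the B wires
    elif rb is not None:
        b_in = gpr[rb] & MASK64      # B comes from the regfile
    else:
        b_in = 0                     # B not used
    c_in = gpr[rc] & MASK64 if rc is not None else 0
    return a_in, b_in, c_in


def alu_add(a_in, b_in, c_in):
    """Three inputs, same order; add ignores c_in and never sees imm_ok."""
    return (a_in + b_in) & MASK64


gpr = list(range(32))  # register n holds the value n
reg_reg = alu_add(*prepare_inputs(gpr, ra=3, rb=4))
reg_imm = alu_add(*prepare_inputs(gpr, ra=3, imm=10))
```

the ALU sees exactly three 64-bit inputs in both cases, and there is no
fourth "immediate" input to route.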

one other key reason for doing it this way is because the FunctionUnit
front-end is where the decisions are made about reading from the
regfile (and acknowledging them).  so it has to decode the Operand, there,
anyway, because it must not even send a *request* for Register B if
it does not actually need it.
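a sketch of that front-end decision, again in plain python and under
assumptions: regfile_read_requests and its flag names are invented for
illustration and are not the actual FunctionUnit API:

```python
def regfile_read_requests(reg_a_ok, reg_b_ok, reg_c_ok, imm_ok):
    """Return the set of operands the front-end should request.

    Hypothetical model: when an immediate is present it occupies B,
    so no read request for Register B is ever raised.
    """
    requests = set()
    if reg_a_ok:
        requests.add("a")
    if reg_b_ok and not imm_ok:
        requests.add("b")
    if reg_c_ok:
        requests.add("c")
    return requests
```

so for a reg-immediate operation only "a" is requested, and the regfile
port for B stays free for another Function Unit.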


