[libre-riscv-dev] KCP53000B micro-architecture thoughts

Luke Kenneth Casson Leighton lkcl at lkcl.net
Sun Jun 2 13:26:22 BST 2019


---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

On Sun, Jun 2, 2019 at 6:52 AM Samuel Falvo II <sam.falvo at gmail.com> wrote:
>
> On Sat, Jun 1, 2019 at 12:31 PM Luke Kenneth Casson Leighton
> <lkcl at lkcl.net> wrote:
> > samuel:
> > https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/experiment/compldst.py;hb=HEAD
>
> http://chiselapp.com/user/kc5tja/repository/kestrel-3/artifact/87d2fd007e016942
>
> I'm working on an implementation for an add/store unit as well (loads
> not supported since I don't need it for my minimal test case program).
> I'm focusing purely on the computation unit at this point.  It only
> supports GO_SUM and GO_WRITE inputs, since it's assumed that the FU
> itself manages all the other GO_* and *ABLE signals internally.

 you do need an "Issue" signal as a way to "capture" the opcode, and i
found that a single clock delay (sync) was needed on all three
"captures":
 https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/experiment/compalu.py;h=41b73d1156cbfeba300cd4d3b43069f14e3ecc98;hb=HEAD#l80

* issue captures the opcode (command and immediate)
* go_sum (aka go_read) captures operand1 and operand2 (if present)
* go_write triggers the data bus output (which will be a great big OR gate)

if you wanted to do it as DFFs, you could probably do something like this:

# register versions of incoming signals
issue_r = Signal()
go_sum_r = Signal()
go_write_r = Signal()

with m.If(self.issue_i):
    m.d.sync += issue_r.eq(1) # issue mode (on next cycle)
    m.d.sync += go_sum_r.eq(0)
    m.d.sync += go_write_r.eq(0)
with m.If(self.go_sum_i):
    m.d.sync += issue_r.eq(0)
    m.d.sync += go_sum_r.eq(1) # sum mode (on next cycle)
    m.d.sync += go_write_r.eq(0)
with m.If(self.go_write_i):
    m.d.sync += issue_r.eq(0)
    m.d.sync += go_sum_r.eq(0)
    m.d.sync += go_write_r.eq(1) # write mode (on next cycle)
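
as a rough illustration of the one-clock delay, here is a plain python
behavioural model (not nmigen, and not taken from either codebase).  the
class name StrobeRegs is made up for this sketch, and it assumes that
when several strobes are asserted together the last m.If block wins:

```python
class StrobeRegs:
    """cycle-level model of issue_r / go_sum_r / go_write_r (hypothetical name)."""

    def __init__(self):
        self.issue_r = 0
        self.go_sum_r = 0
        self.go_write_r = 0

    def clock(self, issue_i=0, go_sum_i=0, go_write_i=0):
        """advance one clock edge: strobes sampled now show up in the registers afterwards."""
        nxt = (self.issue_r, self.go_sum_r, self.go_write_r)  # hold by default
        # assumption: later blocks take priority, like sequential m.If blocks
        if issue_i:
            nxt = (1, 0, 0)
        if go_sum_i:
            nxt = (0, 1, 0)
        if go_write_i:
            nxt = (0, 0, 1)
        self.issue_r, self.go_sum_r, self.go_write_r = nxt


regs = StrobeRegs()
trace = []
# issue, then go_sum, then an idle cycle, then go_write
for strobes in [dict(issue_i=1), dict(go_sum_i=1), dict(), dict(go_write_i=1)]:
    regs.clock(**strobes)
    trace.append((regs.issue_r, regs.go_sum_r, regs.go_write_r))
print(trace)
```

each tuple is the register state seen on the cycle *after* the strobe
was asserted, and the idle cycle simply holds the previous mode.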

according to what i've done (and so know it "works"), the opcode is
captured on incoming self.issue_i (line 126):
     latchregister(m, self.oper_i, self.alu.op, self.issue_i)

however the operands are *definitely* captured on the issue->go_sum
transition.  something like:
    latchregister(m, self.src1_i, self.alu.a, ~issue_r & go_sum_r)

note that latchregister will trap (latch) and store its input on a
*combinatorial* examination of its trigger, outputting its input if
the trigger is HI and simultaneously sync'ing that input into an
internal register, and outputting the *PREVIOUS* input (sync'd on the
prior clock) if the trigger is LO.
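
a minimal plain python sketch of that behaviour, for illustration only:
it is not the latchregister source, the class name LatchRegisterModel is
hypothetical, and the reset value of 0 is an assumption.

```python
class LatchRegisterModel:
    """behavioural model: pass-through while triggered, else the stored value."""

    def __init__(self):
        self.stored = 0  # assumed reset value of the internal register

    def output(self, incoming, trigger):
        # combinatorial path: the input appears immediately while trigger is HI,
        # otherwise the value sync'd on an earlier clock is shown
        return incoming if trigger else self.stored

    def clock(self, incoming, trigger):
        # sync path: the input is stored only if trigger is HI at the clock edge
        if trigger:
            self.stored = incoming


latch = LatchRegisterModel()
seen = []
for incoming, trigger in [(5, 0), (7, 1), (9, 0), (3, 0)]:
    seen.append(latch.output(incoming, trigger))  # what the ALU would see
    latch.clock(incoming, trigger)
print(seen)  # [0, 7, 7, 7]
```

the 7 is visible in the same cycle that the trigger is HI and then keeps
being output while the input moves on to 9 and 3.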

you could probably do away with the use of latchregister by actually
doing all of the operation and operand capturing using sync.

with m.If(self.issue_i): # capture opcode on issue with SYNC
    sync += [
        inst_src2_r.eq(self.i_inst_src2),
        inst_imm12_r.eq(self.i_inst_imm12),
        inst_cin_r.eq(self.i_inst_cin),
    ]

and likewise capture the operands on go_sum_i:

with m.If(self.go_sum_i):
    sync += [
        trunk_data1_r.eq(self.i_trunk_data1),
        trunk_data2_r.eq(self.i_trunk_data2),
    ]

*now* you can use the inst_src2_r NOT the self.i_inst_src2 in the
decision-making (m.If(inst_src2_r == SRC2_IMM12)), likewise the sum is
done from the *registers*:
     comb += sum.eq((trunk_data1_r + addend2 + inst_cin_r)[0:xlen])
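
the arithmetic itself can be sketched in plain python (not nmigen).  the
64-bit XLEN, the function names, and the use of carry-in to get SUB as
a + ~b + 1 are assumptions for illustration, not something taken from
the CU source:

```python
XLEN = 64  # assumed register width


def alu_sum(data1, addend2, cin, xlen=XLEN):
    """add the two captured operands plus carry-in, keeping only bits [0:xlen]."""
    mask = (1 << xlen) - 1
    return (data1 + addend2 + cin) & mask


def invert(value, xlen=XLEN):
    """bitwise NOT restricted to xlen bits."""
    return value ^ ((1 << xlen) - 1)


add_result = alu_sum(10, 3, 0)            # plain ADD
sub_result = alu_sum(10, invert(3), 1)    # SUB as a + ~b + 1 (assumed scheme)
wrapped = alu_sum((1 << XLEN) - 1, 1, 0)  # overflow wraps to zero
print(add_result, sub_result, wrapped)
```

the mask plays the role of the [0:xlen] slice: anything carried out of
the top bit is dropped.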

likewise for the output, except here you can just use the go_write_i
as a direct decision to sync the sum onto the output bus.  *yes* sync,
however note, because it's going to be a big OR bus, if go_write_i's
not set, output zero.

with m.If(self.go_write_i):
     sync += self.o_trunk_data.eq(sum)
with m.Else():
     sync += self.o_trunk_data.eq(0)

or just:
   sync += self.o_trunk_data.eq(Mux(self.go_write_i, sum, 0))
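
to see why the "output zero" rule matters on an OR bus, here is a small
plain python sketch.  it ignores the one-clock sync delay, and the
function names and example values are made up for illustration:

```python
from functools import reduce
from operator import or_


def unit_output(go_write, result):
    """what one unit puts on the bus: its result if selected, else zero."""
    return result if go_write else 0


def trunk_bus(units):
    """the shared bus is just the OR of every unit's output."""
    return reduce(or_, (unit_output(go, res) for go, res in units), 0)


# three units hold results, but only the second has go_write asserted
units = [(0, 0x1234), (1, 0x00FF), (0, 0xDEAD)]
bus = trunk_bus(units)
print(hex(bus))  # 0xff
```

if an unselected unit left its old result on the output instead of zero,
its bits would be OR'd into the bus and corrupt the selected value.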

remember though that you *need* that "busy" signal, which would be done as:

with m.If(self.issue_i):
    sync += self.busy_o.eq(1)
with m.If(self.go_write_i):
    sync += self.busy_o.eq(0)

busy is set 1 clock after issue, busy is CLEARED 1 clock after write.
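
the timing can be checked with a few lines of plain python (a
behavioural sketch only; busy_trace is a hypothetical helper and the
strobe sequence is invented for the example):

```python
def busy_trace(strobes):
    """return the busy value visible in each cycle for (issue_i, go_write_i) pairs."""
    busy = 0
    trace = []
    for issue_i, go_write_i in strobes:
        trace.append(busy)  # value seen *during* this cycle
        nxt = busy          # hold unless a strobe changes it
        if issue_i:
            nxt = 1
        if go_write_i:
            nxt = 0         # later block wins if both are asserted
        busy = nxt          # takes effect on the next clock
    return trace


# cycle 0: issue, cycles 1-2: idle, cycle 3: go_write, cycle 4: idle
trace = busy_trace([(1, 0), (0, 0), (0, 0), (0, 1), (0, 0)])
print(trace)  # [0, 1, 1, 1, 0]
```

busy is still 0 in the cycle where issue is asserted, and still 1 in the
cycle where go_write is asserted.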

the triple-way signalling is essential, samuel, and it's essential
that it be done on a clock's delay.  otherwise you get into horrible
combinatorial loops that are a bitch to track down.

> For summation type instructions, the sequence of states and strobes is
> expected to be ISSUE, READ_PENDING (terminated by GO_READ), GO_SUM,
> WRITE_PENDING (terminated by GO_WRITE), in that order.

 remember that ISSUE sends back a BUSY signal, and i found that all
strobes (BUSY, RD_PEND, WR_PEND) *must* change on the clock cycle
*after* the signal that caused the state transition is asserted.

 also, i recommend using the names "REQUEST_READ" and "REQUEST_WRITE",
because READ_PENDING and WRITE_PENDING are signal name terminology
used in the Dependency Matrices.

 if you use the same names in the Computation Unit, i guarantee that
things will get horribly confused, not least because when you start to
describe use of "READ_PENDING" i will be forced to ask, "which
READ_PENDING did you mean, did you mean the one from the Dependency
Matrix or did you mean the one from the Computation Unit?"

 so it would be ISSUE (BUSY raised on next clock), REQUEST_READ
(terminated by GO_READ), GO_SUM, REQUEST_WRITE (terminated by
GO_WRITE)
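
one possible reading of that sequence, written as a plain python
transition table.  this is only a sketch: the IDLE and SUMMING state
names are invented here, and it takes the list literally, treating
GO_READ and GO_SUM as separate events:

```python
# (current state, event) -> next state; IDLE and SUMMING are hypothetical names
TRANSITIONS = {
    ("IDLE", "ISSUE"): "REQUEST_READ",
    ("REQUEST_READ", "GO_READ"): "SUMMING",
    ("SUMMING", "GO_SUM"): "REQUEST_WRITE",
    ("REQUEST_WRITE", "GO_WRITE"): "IDLE",
}


def run(events, state="IDLE"):
    """step through events, returning every state visited, or raise if out of order."""
    visited = [state]
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event} not expected in state {state}")
        state = TRANSITIONS[key]
        visited.append(state)
    return visited


visited = run(["ISSUE", "GO_READ", "GO_SUM", "GO_WRITE"])
print(visited)
```

anything arriving out of order (GO_WRITE before GO_READ, say) raises
straight away, which is handy when checking a testbench trace.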

> For store
> instructions, it'll be ISSUE, READ_PENDING (terminated via GO_READ),
> GO_SUM, CMD_PENDING (terminated by bus), ACK_PENDING (terminated by
> bus).
>
> Although incomplete, this CU is capable of supporting ADD, ADDI,
> AUIPC, SUB, ADDW, ADDIW, and SUBW instructions (the latter 3 requiring
> assistance from the register file for 32-bit sign extension).  It just
> needs an FU smart enough to know how to drive its various inputs and
> when.

 remember that if you have the sync in the CU, the FU can drive the CU
*combinatorially*.  and that the Dependency Matrices can also be
combinatorial, *and* the issue unit can check the Busy and Issue
signals combinatorially.... *and the register trunk can be driven
combinatorially*....

 *all because the Computation Unit's latches are driven by sync*.

 to drive that home, take a look here:
https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/experiment/score6600.py;h=4c55e0bddbdb08dfa56b6747c67f31bfa0b1bd75;hb=HEAD#l1

except in one area where i am still experimenting (branch), there's
not a *single explicit use of sync* ANYWHERE in the top level.

> Despite its simplicity and incomplete status, it takes my computer
> about 3 minutes to fully formally verify the design as it currently
> stands.

 holy cow.  well, do investigate pypy3.  i haven't been able to use it
yet as the beta 7.0 gave a spurious exception.

> My next steps with my unit is to implement a basic FU for supporting
> the summation instructions.
>
> Once *that* is done, then I want to complete the FU logic for driving
> a TileLink port, which in turn will inform and complete the CU logic
> as well.

 awesome.

l.


