[libre-riscv-dev] KCP53000B micro-architecture thoughts

Sun Jun 2 17:18:55 BST 2019

On Sun, Jun 2, 2019 at 5:27 AM Luke Kenneth Casson Leighton
<lkcl at lkcl.net> wrote:
>  you do need an "Issue" signal as a way to "capture" the opcode, and i

In my microarchitecture, that's a function of the FU; the i_inst_xxx
inputs on the CU will be tapped off the registered instruction fields
in the FU.

> * go_sum (aka go_read) captures operand1 and operand2 (if present)

go_sum is the clock-delayed version of go_read in my design.  I
created this timing diagram ( https://tinyurl.com/yyxeem4z ) to help
illustrate how an ADD or SUB instruction would flow through the unit,
and includes both the CU and FU signals as I understand them.  I also
incorporated your recommended name changes.  (Scoreboard signals
aren't present because I haven't gotten that far yet.)
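
As a rough illustration of "clock-delayed", here is a plain-Python
cycle model, assuming go_sum is nothing more than go_read passed
through one rising-edge register.  The function name and the sample
waveform are made up for this sketch; they are not taken from the
timing diagram.

```python
def delay_one_cycle(samples, reset_value=0):
    """Model one D flip-flop: the output in cycle n is the input from cycle n-1."""
    q = reset_value          # register contents after reset
    out = []
    for d in samples:
        out.append(q)        # value visible during this cycle
        q = d                # captured at the rising edge that ends this cycle
    return out


# Hypothetical waveform: go_read pulses high for one cycle (cycle 2).
go_read = [0, 0, 1, 0, 0, 0]
go_sum = delay_one_cycle(go_read)   # same pulse, one cycle later
```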

I'm wondering about naming consistency though; right now, GO_SUM is a
stage in the state machine, while GO_READ and GO_WRITE are basically
acknowledgements.  I'm thinking of renaming GO_SUM to SUM_REQ.

In other words, signals in the form of GO_XXX are always
acknowledgements from an arbiter, XXX_REQ are always requests from the
FU, and XXX_ACK are always acknowledgements from the CU (but only if
required; in the timing diagram above, I didn't need any since we're
just producing a sum.  For a memory interface, we'd need two ACKs, one
each for the address and data phases of the interconnect).
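
The convention can be written down as a small lookup, sketched here in
plain Python.  The helper name signal_role and the example signal
ADDR_ACK are hypothetical, chosen only to exercise the three patterns;
they are not names from the actual design.

```python
def signal_role(name):
    """Classify a signal name according to the proposed naming convention."""
    upper = name.upper()
    if upper.startswith("GO_"):
        return "acknowledgement from an arbiter"
    if upper.endswith("_REQ"):
        return "request from the FU"
    if upper.endswith("_ACK"):
        return "acknowledgement from the CU"
    return "unclassified"      # e.g. BUSY, which follows none of the patterns


# ADDR_ACK is an invented name for a memory-interface address-phase ACK.
roles = {n: signal_role(n) for n in ("GO_READ", "GO_WRITE", "SUM_REQ", "ADDR_ACK")}
```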

>  with m.If(self.issue_i): # capture opcode on issue with SYNC
>      sync += [
>          inst_src2_r.eq(self.i_inst_src2),
>          inst_imm12_r.eq(self.i_inst_imm12),
>          inst_cin_r.eq(self.i_inst_cin)
>      ]
>
> and likewise capture src1 on go_sum_i:
> with m.If(self.go_sum_i):
>      sync += [
>           trunk_data1_r.eq(self.i_trunk_data1),
>           trunk_data2_r.eq(self.i_trunk_data2),

Bingo; I'm building my design so as to not use SRFFs.  It makes things
somewhat more challenging timing-wise, but it's friendlier to the
smaller, simpler logic-cells that the iCE40HX family uses.

> remember though that you *need* that "busy" signal, which would be done as:

Because BUSY is part of the state machine that drives the whole thing,
that will also reside in the FU logic.  As I conceived things, the CU
is pretty dumb; largely ignorant of the state machine details in the
FU.  This gives me the freedom to re-use the CU with a different FU
implementation.  Looking at the timing diagram linked above, you'll
notice that I can (should I desire) release BUSY concurrently with the
assertion of GO_WRITE, which would allow another instruction to begin
processing concurrently with the register write-back, provided no
hazards were encountered.

> except in one area where i am still experimenting (branch), there's
> not a *single explicit use of sync* ANYWHERE in the top level.

As a consequence of my design using fully synchronous logic (and
always on the rising edge of the clock at that), both the instruction
dispatcher and the FU will rely on state machines to some extent, so
they'll need some synchronous elements.  From what I've seen, though,
it won't be that much.  The gated ring counter used to drive the state
machine in the FU is a tiny bit of logic; the vast majority of it will
be combinatorial in nature.  Nonetheless, the goal is to pack as much
semantic load into the FU as possible, since the FU is the only place
that encapsulates knowledge of how to operate the CU.  My vision for
the scoreboard logic, then, will encapsulate only the /globally/
relevant information (e.g., the stuff that involves multiple FUs).
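
For readers unfamiliar with the term, a gated ring counter can be
modelled in a few lines of plain Python.  This is only a behavioural
sketch under assumptions: a one-hot state word that rotates by one
position per clock while an enable is high and holds otherwise.  The
four-state width and the enable pattern are invented for illustration
and do not reflect the actual FU.

```python
def ring_counter_step(state, enable):
    """Advance a one-hot ring counter by one clock edge.

    state  -- tuple of bits with exactly one bit set
    enable -- gate input; when false the counter holds its state
    """
    if not enable:
        return state
    return (state[-1],) + state[:-1]   # rotate the hot bit one position


def run(enables, n_states=4):
    """Return the state after reset and after each clock edge."""
    state = (1,) + (0,) * (n_states - 1)   # reset: first state is hot
    trace = [state]
    for en in enables:
        state = ring_counter_step(state, en)
        trace.append(state)
    return trace


# Hypothetical enable pattern: the counter stalls for one cycle (enable low).
trace = run([1, 0, 1, 1, 1])
```

The only registered bits are the state word itself; everything that
decodes "which state am I in" is a single-bit test, which is why this
style of sequencer stays small.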

>  holy cow.  well, do investigate pypy3.  i haven't been able to use it
> yet as the beta 7.0 gave a spurious exception.

I don't mind the time taken so far; I think the FU logic will be much
easier for it to verify since it'll be more stateful.

-- 
Samuel A. Falvo II