[libre-riscv-dev] KCP53000B micro-architecture thoughts

Sat Jun 1 04:36:20 BST 2019

On Sat, Jun 1, 2019 at 3:23 AM Samuel Falvo II <sam.falvo at gmail.com> wrote:

> Due to the size of the register file, it looks like I will not be
> using unary representation except where absolutely necessary.  Plus,
> the block RAM that will implement the register file only accepts a
> binary representation.

 iknowiknowiknow, it's the one thing i don't like, and there are a
couple of reasons why the big soc has to stick with unary:

 (1) for multi-issue you have to OR the *unary* dependencies together
in a chained (transitive) fashion: the 1st instruction in the group
has just 1 bit set, the 2nd will have (up to) 2 bits set, the 3rd
(up to) 3 bits set and so on.

 (2) there's some SIMD stuff we'll be doing where it is easier if
stuff is in unary.
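
 to make (1) a bit more concrete, here is a rough sketch in plain
python (not HDL).  it assumes the thing being accumulated is each
instruction's destination register, which is my guess for
illustration; the function names and the example group are made up.
the point is only that one-hot vectors can be ORed together and stay
meaningful, whereas ORing binary register numbers does not work
(3 | 4 gives 7, which is a different register entirely).

```python
# illustrative sketch only: names and the example group are hypothetical

def unary(regnum, nregs=32):
    """one-hot ("unary") encoding of a register number, as an int bitmask"""
    assert 0 <= regnum < nregs
    return 1 << regnum

def chained_deps(dest_regs, nregs=32):
    """for each instruction in an issue group, OR together the unary
    vectors of itself and every earlier instruction in the group"""
    acc = 0
    out = []
    for rd in dest_regs:
        acc |= unary(rd, nregs)   # transitive: carries all earlier bits along
        out.append(acc)
    return out

group = [3, 7, 3, 12]             # assumed dest regs of a 4-issue group
deps = chained_deps(group)
popcounts = [bin(d).count("1") for d in deps]
```

 the 3rd instruction here reuses r3, so its vector still only has 2
bits set: that is the "(up to)" part.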

> > Pretty much, yeah!  Btw LD/ST needs a bit more attention, there is a
> > separate Matrix for it (in addition to the FU which does the actual reg
> > adding to calc the address).
>
> I was planning on building the LD/ST unit so that it did its own
> effective address calculation.  So, instead of taking 3 or 4 cycles to
> complete, maybe it'll take 5 (assuming no external wait-states).  So,
> the sequence would be issue, go-read, add, drive-bus, then go-write.

 i literally (as you can see from my previous post just now) just
designed a "ST Computation Unit" that does exactly that.  rather
unnecessarily, it happens to be able to select (based on the opcode)
between an immediate or a 2nd src register.

 for RISC-V there *are* no 2-reg LD ops, however if you are planning
to do a hybrid "can be used as an ADD unit as well" thing, the
capability of that diagram will be pretty much what you'll need.

 you *do* need corresponding "i want to... X" lines for every one of
those go-read, drive-bus and go-write signals.  it is necessary for
the CU to make the *request* for the action (based on the status of
the *previous* phase going HI), then when the action comes in, lock
the corresponding register-latch, which also happens to activate the
next "phase".

you can see that happening in the diagram:

* Issue will close the opcode latch and OPEN the operand latch AND
trigger "Request_Read" (and set "Busy").
* Go_Read will close the operand latch and OPEN the address latch AND
trigger "Request_Address".
* Go_Address will close the address latch and OPEN the result latch
AND trigger "Request_Write".
* Go_Write will close the result latch and OPEN the opcode latch, and
reset "Busy" back to OFF, ready for a new cycle.

so it's basically a ripple-loop pipeline of READY / VALID signals.
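
 a minimal behavioural sketch of that ripple-loop, in plain python
rather than HDL.  the class name, the table layout and the
underscored request names are assumptions for illustration, not
taken from the actual diagram or source; it only models "exactly one
latch is open at a time, and each Go closes it, opens the next one
and raises the next request".

```python
# illustrative model only, not the real Computation Unit

# event -> (latch it closes, latch it opens, request it raises)
STEPS = {
    "Issue":      ("opcode",  "operand", "Request_Read"),
    "Go_Read":    ("operand", "address", "Request_Address"),
    "Go_Address": ("address", "result",  "Request_Write"),
    "Go_Write":   ("result",  "opcode",  None),
}

class CompUnitSketch:
    def __init__(self):
        self.open_latch = "opcode"   # idle: waiting for an opcode
        self.request = None          # the "i want to... X" line, if any
        self.busy = False

    def step(self, event):
        closes, opens, request = STEPS[event]
        if self.open_latch != closes:
            return False             # out-of-order event: ignored
        self.open_latch = opens      # close one latch, open the next
        self.request = request       # ask for the next phase
        self.busy = opens != "opcode"
        return True

cu = CompUnitSketch()
trace = []
for ev in ["Issue", "Go_Read", "Go_Address", "Go_Write"]:
    cu.step(ev)
    trace.append((cu.open_latch, cu.request, cu.busy))
```

 after the four events the unit is back where it started (opcode
latch open, not busy), which is the "loop" part.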

> Since it has an adder as part of its logic, it could short-sequence
> that whole sequence for adds and subtractions: issue, go-read, add,
> then go-write.

 yes.  cut out the address/bus part.  should kiiinda be easy enough to
do, he said...
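
 as a rough sketch of what "cut out the bus part" means for the
sequencing (plain python, phase names taken from the quoted text
above; the op names "ld", "st", "add", "sub" and the function name
are assumptions for illustration, and it assumes one cycle per phase
with no external wait-states):

```python
# illustrative sketch only: op names and function name are hypothetical

FULL_SEQUENCE = ["issue", "go-read", "add", "drive-bus", "go-write"]

def phases(op):
    """return the list of phases an op passes through in the combined unit"""
    if op in ("add", "sub"):
        # plain arithmetic never touches the bus, so skip that phase
        return [p for p in FULL_SEQUENCE if p != "drive-bus"]
    return list(FULL_SEQUENCE)

ld_cycles = len(phases("ld"))     # full LD/ST sequence
add_cycles = len(phases("add"))   # short-sequenced add
```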

> > P.s you know I am going to borrow that when it comes to the engine I am
> > doing :)
>
> It's open source and under MPLv2 license.  Please enjoy.  :)  It's not
> required, but if you can give a brief reference back to the Kestrel
> Computer Project as your source of inspiration for that somewhere in
> your literature, that'd be appreciated.  :)

 sure, more than happy to.

> One question that might come up over time -- why is TX so much easier
> to use than RX?

thx for heads-up.

l.