[libre-riscv-dev] KCP53000B micro-architecture thoughts

Sat Jun 1 00:15:46 BST 2019

On Saturday, June 1, 2019, Samuel Falvo II <sam.falvo at gmail.com> wrote:

> >  such that, once set/altered, *copies* (read-only copies) of that
> > state (CSR) may safely be made in, for example, latches, that are used
> > in various other places.
>
> Yep, I came to that conclusion too.  Additionally, especially when
> day-dreaming about the 6502 refit, I also came to the conclusion that
> nothing forbade the use of a dependency matrix (or its equivalent) to
> protect individual sub-fields within a single register either.

>
*puzzled... lightbulb....* oohhh neat ideaaaa.

Ooo so, for the CSRs, you actually split them up by purpose. It's just
that they're stored in what APPEARS to be the same "register", but that's
not actually true: the real register is only a few bits.

> E.g.,
> on the 6502, nearly all instructions have a write dependency on the
> flags register, but very few have read dependencies.  Those that DO,
> however, often only care about specific flags.  For example, ADC and
> SBC care only about the carry flag, not zero, overflow, or negative.
> So, it makes sense to me to reflect that in the design of the
> dependency matrix where it makes sense to.

Niiice.
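
A minimal sketch of the per-flag idea, in Python, just to make the
difference concrete. The table of read/write sets is illustrative only (it
covers N, V, Z and C and ignores everything else), and the function names
are made up for this example; none of it comes from an actual dependency
matrix implementation.

```python
# Hypothetical sketch: per-flag hazard tracking for 6502-style status flags.
# Read/write sets are illustrative and limited to the N, V, Z and C flags.
INSTR_FLAGS = {
    "ADC": {"reads": {"C"}, "writes": {"N", "V", "Z", "C"}},
    "LDA": {"reads": set(), "writes": {"N", "Z"}},
    "CLC": {"reads": set(), "writes": {"C"}},
    "BEQ": {"reads": {"Z"}, "writes": set()},
}


def raw_hazard_whole_register(earlier, later):
    """Flags treated as one register: any earlier write blocks any later read."""
    return bool(INSTR_FLAGS[earlier]["writes"]) and bool(INSTR_FLAGS[later]["reads"])


def raw_hazard_per_flag(earlier, later):
    """Flags tracked individually: hazard only if a written flag is also read."""
    return bool(INSTR_FLAGS[earlier]["writes"] & INSTR_FLAGS[later]["reads"])
```

With whole-register tracking, LDA followed by ADC looks like a hazard; with
per-flag tracking it does not, because LDA never touches carry.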

>
> >  so for example if the LD/ST Unit has some sort of "global state"
> > which is affected by a CSR (the virtual memory section, involving
> > ASIDs for example), which it could well have if it has been designed
> > in a particular way, and you fail to take that into account in the L2
> > cache or something, then yes, things will go horribly wrong.
>
> Perhaps this is an application of shadowing.  I've been putting more
> thought into the use of shadow flags to impede instruction issue,

Impede instruction issue... there are two interpretations of this.

1. Potentially a misunderstanding. Shadow does not DIRECTLY impede issue.
Shadow impedes COMMIT, such that Comp Units cannot raise req_release, such
that the picker cannot raise go_write, such that the Function Unit remains
"busy". If there then happen not to be enough UNbusy FUs to deal
immediately with the current instruction, issue has no choice but to block
until an FU is free.

So it is very indirect but technically correct.  The other meaning is:

2. There is a genuine dependency: instruction ISSUE relies on information
from e.g. a CSR in order to proceed, and that CSR is about to be written
to.  This would GENUINELY hold up instruction issue until the write had
occurred, and that write could itself be shadowed by already-issued
instructions.
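
The indirect chain in interpretation 1 can be sketched as a toy Python
model. Only the signal names req_release and go_write come from the
discussion above; the FunctionUnit class, its fields and the two helper
functions are assumptions made up for illustration, not the real design.

```python
from dataclasses import dataclass


@dataclass
class FunctionUnit:
    busy: bool = False      # unit holds an instruction
    done: bool = False      # computation has finished
    shadowed: bool = False  # result is speculative, commit not yet allowed

    def req_release(self):
        # A shadowed unit may not ask to commit its result.
        return self.busy and self.done and not self.shadowed


def pick_go_write(fus):
    """Grant go_write to at most one requesting unit, which then becomes free."""
    for fu in fus:
        if fu.req_release():
            fu.busy = fu.done = False
            return fu
    return None


def try_issue(fus):
    """Issue to the first unbusy unit; None means issue has to block."""
    for fu in fus:
        if not fu.busy:
            fu.busy = True
            return fu
    return None
```

With a single unit that is busy, done and shadowed, try_issue returns None
until the shadow is dropped and pick_go_write has freed the unit.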

In SV we may actually have to do this (cannot say I am looking forward to
it!) with respect to Vector Length.

VL is a CSR that completely changes the instruction behaviour at the issue
phase.

As in, it actually causes MORE instructions to be issued in a hardware
for-loop.

So, setting it causes some intricate critical dependencies that will have
to be thought through really carefully.
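
A rough sketch of what "hardware for-loop at issue" means, assuming all
three registers of the instruction are marked as vectors so that each
register number simply increments per element. The function name and the
vl_write_pending flag are hypothetical, and this says nothing about how
the real dependency tracking will end up being done.

```python
def issue(op, rd, rs1, rs2, vl, vl_write_pending=False):
    """Expand one instruction into vl element operations.

    Returns None when a write to VL is still outstanding: issue cannot
    know how many element operations to generate, so it has to stall.
    """
    if vl_write_pending:
        return None
    # One element operation per iteration, on consecutive registers.
    return [(op, rd + i, rs1 + i, rs2 + i) for i in range(vl)]
```

For example, issue("add", 8, 16, 24, 3) yields three "add" operations, on
registers (8, 16, 24), (9, 17, 25) and (10, 18, 26).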

> and
> I've found even in my simplistic approach there are still times when
> shadowing is preferable to discarding (e.g., when loading or storing
> to side-effecting I/O devices).

Funny, eh? I just hope it does actually save gates. Or, at least, makes
for a design that has a really good bang-per-buck ratio.

> I think I have a good enough understanding now to attempt to build a
> very simple and crude model of the 53000B.  Basically, I'm thinking of
> implementing only enough logic to implement the following program:
>
> ADDI x1,x0,'H'
> ADDI x2,x0,HIGHBYTE_IO_SIATERM  ; SIA address for the user's serial terminal
> SLLI x2,x2,56
> SB   x1, SIA_TX(x2)
> ADDI x1,x0,'e'
> SB   x1, SIA_TX(x2)
> ...
> JAL  x0,*   ; deadlock
>
> and so on to print out "Hello world".  I figure if I can get four CPU
> instructions (ADDI, SB, SLLI, and JAL) working, this is a good indicator
> that the bulk of the work in the processor implementation is basically
> done, and the rest of the behavior is filling in the gaps.

Pretty much, yeah!  Btw, LD/ST needs a bit more attention: there is a
separate Matrix for it (in addition to the FU which does the actual
register add to calculate the address).

I am tackling that next so will have a better understanding and will be
able to explain it better, too.
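
In the meantime, a purely behavioural Python sketch of the four
instructions in the quoted program might look like the following. It models
no pipeline, matrices or shadowing at all. The values of
HIGHBYTE_IO_SIATERM and SIA_TX are invented placeholders (the quoted
program does not give them), the program counter is a list index rather
than a byte address, and a jump-to-self simply halts the model.

```python
MASK64 = (1 << 64) - 1

# Hypothetical placeholder values: not given in the quoted program.
HIGHBYTE_IO_SIATERM = 0x0E
SIA_TX = 0


def run(program, max_steps=100):
    """Execute (op, a, b, c) tuples; return the (address, byte) pairs stored by SB."""
    regs = [0] * 32
    stores = []
    pc = 0  # index into program, not a byte address
    for _ in range(max_steps):
        op, a, b, c = program[pc]
        next_pc = pc + 1
        result = None
        if op == "ADDI":      # a=rd, b=rs1, c=immediate
            result = (a, regs[b] + c)
        elif op == "SLLI":    # a=rd, b=rs1, c=shift amount
            result = (a, regs[b] << c)
        elif op == "SB":      # a=rs2 (data), b=offset, c=rs1 (base)
            stores.append(((regs[c] + b) & MASK64, regs[a] & 0xFF))
        elif op == "JAL":     # a=rd, b=target index, c unused
            if b == pc:
                break         # "JAL x0,*": jump-to-self, stop the model
            result = (a, pc + 1)  # link is next index (pc+4 on real hardware)
            next_pc = b
        else:
            raise ValueError(f"unsupported instruction: {op}")
        if result is not None and result[0] != 0:  # x0 stays hardwired to zero
            regs[result[0]] = result[1] & MASK64
        pc = next_pc
    return stores


program = [
    ("ADDI", 1, 0, ord("H")),
    ("ADDI", 2, 0, HIGHBYTE_IO_SIATERM),
    ("SLLI", 2, 2, 56),
    ("SB", 1, SIA_TX, 2),
    ("ADDI", 1, 0, ord("e")),
    ("SB", 1, SIA_TX, 2),
    ("JAL", 0, 6, 0),
]
```

Calling run(program) returns two stores to the same address, carrying the
bytes for "H" and "e".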

>
> The SIA TX buffer is smart enough to defer acknowledgement of the bus
> until it's ready to accept data, so no need to poll the transmitter to
> see if it's ready.  Thus, the above code will work at 110 baud and
> 115200 baud without any changes.

Nice.
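
A toy Python model of that deferred acknowledge, to show why the software
needs no polling loop. The SiaTx class, its cycles_per_byte parameter and
store_all are all invented for illustration; the real SIA interface and
bus protocol are not described in this thread.

```python
class SiaTx:
    """Transmitter that withholds the bus acknowledge while it is busy."""

    def __init__(self, cycles_per_byte):
        self.cycles_per_byte = cycles_per_byte  # stands in for the baud rate
        self.busy_for = 0
        self.sent = []

    def tick(self):
        """Advance one clock cycle."""
        if self.busy_for:
            self.busy_for -= 1

    def write(self, byte):
        """Return True (acknowledge) if accepted, False if the bus must wait."""
        if self.busy_for:
            return False
        self.sent.append(byte)
        self.busy_for = self.cycles_per_byte
        return True


def store_all(sia, data):
    """Store each byte, stalling until acknowledged; return cycles spent."""
    cycles = 0
    for byte in data:
        while not sia.write(byte):  # the store simply stalls, no status polling
            sia.tick()
            cycles += 1
        sia.tick()
        cycles += 1
    return cycles
```

The same store_all call works unchanged for a slow and a fast transmitter;
only the number of stalled cycles differs.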

P.S. You know I am going to borrow that when it comes to the engine I am
doing :)

> --
> Samuel A. Falvo II

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68