[libre-riscv-dev] KCP53000B micro-architecture thoughts

Fri May 31 23:34:20 BST 2019

On Fri, May 31, 2019 at 5:38 AM Luke Kenneth Casson Leighton
<lkcl at lkcl.net> wrote:
>  so the insight that's confusing - and worth emphasising - is that by
> using a FunctionUnit to perform the "alteration" of the flags (even if
> those flags are major, serious "state" from CSRs), the Dependency
> Matrix that "tracks" that FunctionUnit looks after the
> setting/alteration
>
>  such that, once set/altered, *copies* (read-only copies) of that
> state (CSR) may safely be made in, for example, latches, that are used
> in various other places.

Yep, I came to that conclusion too.  Additionally, especially when
day-dreaming about the 6502 refit, I also came to the conclusion that
nothing forbade the use of a dependency matrix (or its equivalent) to
protect individual sub-fields within a single register either.  E.g.,
on the 6502, nearly all instructions have a write dependency on the
flags register, but very few have read dependencies.  Those that DO,
however, often only care about specific flags.  For example, ADC and
SBC read the carry flag (plus the decimal-mode flag), but not zero,
overflow, or negative.
So it makes sense to me to reflect that in the design of the
dependency matrix, wherever the finer granularity pays for itself.
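
To make the per-flag idea concrete, here is a rough sketch in Python of
how a hazard check might differ between whole-register and per-flag
tracking.  The table and the names (FLAG_DEPS, raw_hazard) are made up
for illustration and cover only a handful of 6502 opcodes; this is not
taken from any real dependency-matrix design.

```python
# Hypothetical sketch: per-flag read/write sets for a few 6502 opcodes.
# opcode -> (flags read, flags written)
FLAG_DEPS = {
    "ADC": ({"C", "D"}, {"N", "V", "Z", "C"}),  # also reads decimal mode
    "LDA": (set(), {"N", "Z"}),
    "CLC": (set(), {"C"}),
    "BEQ": ({"Z"}, set()),
    "BCS": ({"C"}, set()),
}


def raw_hazard(earlier, later, per_flag=True):
    """Return True if `later` must wait for `earlier`'s flag write."""
    _, written = FLAG_DEPS[earlier]
    read, _ = FLAG_DEPS[later]
    if per_flag:
        # Only block when the two instructions share at least one flag.
        return bool(written & read)
    # Whole-register tracking: any flag write blocks any flag read.
    return bool(written) and bool(read)


# LDA writes N and Z only, so a following BCS (reads C) need not wait
# when flags are tracked individually.
print(raw_hazard("LDA", "BCS"))                  # per-flag tracking
print(raw_hazard("LDA", "BCS", per_flag=False))  # whole-register tracking
```

With per-flag tracking the LDA/BCS pair is independent; with
whole-register tracking it is serialised even though no flag is shared.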

>  so for example if the LD/ST Unit has some sort of "global state"
> which is affected by a CSR (the virtual memory section, involving
> ASIDs for example), which it could well have if it has been designed
> in a particular way, and you fail to take that into account in the L2
> cache or something, then yes, things will go horribly wrong.

Perhaps this is an application of shadowing.  I've been putting more
thought into the use of shadow flags to impede instruction issue, and
I've found even in my simplistic approach there are still times when
shadowing is preferable to discarding (e.g., when loading or storing
to side-effecting I/O devices).

I think I have a good enough understanding now to attempt to build a
very simple and crude model of the 53000B.  Basically, I'm thinking of
implementing only enough logic to run the following program:

ADDI x1,x0,'H'
ADDI x2,x0,HIGHBYTE_IO_SIATERM  ; SIA address for the user's serial terminal
SLLI x2,x2,56
SB   x1, SIA_TX(x2)
ADDI x1,x0,'e'
SB   x1, SIA_TX(x2)
...
JAL  x0,*   ; hang here (jump to self)

and so on to print out "Hello world".  I figure if I can get these
four CPU instructions working (ADDI, SB, SLLI, and JAL), that is a good
indicator that the bulk of the work in the processor implementation is
done, and the rest is a matter of filling in the gaps.
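
As a sanity check on how little is needed, here is a rough behavioural
sketch in Python of just those four instructions.  It is not a model of
the 53000B pipeline.  The values chosen for HIGHBYTE_IO_SIATERM and
SIA_TX are placeholders (the real ones are not given here), the program
counter counts instructions rather than bytes, and run() is a made-up
name.

```python
MASK64 = (1 << 64) - 1

# Placeholder values, assumed for illustration only.
HIGHBYTE_IO_SIATERM = 0x0E
SIA_TX = 0


def run(program, max_steps=100):
    """Execute (op, ...) tuples; return the bytes stored to the SIA."""
    regs = [0] * 32
    out = bytearray()
    sia_tx_addr = ((HIGHBYTE_IO_SIATERM << 56) + SIA_TX) & MASK64
    pc = 0
    for _ in range(max_steps):
        op, *args = program[pc]
        next_pc = pc + 1
        rd, value = 0, 0
        if op == "ADDI":
            rd, rs1, imm = args
            value = (regs[rs1] + imm) & MASK64
        elif op == "SLLI":
            rd, rs1, shamt = args
            value = (regs[rs1] << shamt) & MASK64
        elif op == "SB":
            rs2, imm, rs1 = args
            if (regs[rs1] + imm) & MASK64 == sia_tx_addr:
                out.append(regs[rs2] & 0xFF)
        elif op == "JAL":
            rd, offset = args
            value = pc + 1          # link value, in instruction units
            next_pc = pc + offset
        else:
            raise ValueError("unsupported instruction: " + op)
        if rd != 0:                 # x0 is hard-wired to zero
            regs[rd] = value
        if next_pc == pc:           # jump-to-self: stop the simulation
            break
        pc = next_pc
    return bytes(out)


PROGRAM = [
    ("ADDI", 1, 0, ord("H")),
    ("ADDI", 2, 0, HIGHBYTE_IO_SIATERM),
    ("SLLI", 2, 2, 56),
    ("SB", 1, SIA_TX, 2),           # SB x1, SIA_TX(x2)
    ("ADDI", 1, 0, ord("e")),
    ("SB", 1, SIA_TX, 2),
    ("JAL", 0, 0),                  # JAL x0,*
]

print(run(PROGRAM))
```

The sketch treats a JAL whose target is its own address as the end of
the run, which is how the final instruction of the listing behaves.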

The SIA TX buffer is smart enough to defer acknowledging the bus cycle
until it's ready to accept data, so there is no need to poll the
transmitter to see if it's ready.  Thus, the above code will work at
both 110 baud and 115200 baud without any changes.
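
The effect of that deferred acknowledgement can be sketched as follows.
This is a toy Python model with made-up names (SiaTx, send) and an
arbitrary cycles-per-byte figure standing in for the baud rate; it is
not the real SIA interface.

```python
class SiaTx:
    """Toy transmitter: busy for `cycles_per_byte` cycles after a write."""

    def __init__(self, cycles_per_byte):
        self.cycles_per_byte = cycles_per_byte
        self.busy = 0
        self.sent = bytearray()

    def tick(self):
        """Advance one bus clock."""
        if self.busy:
            self.busy -= 1

    def try_write(self, byte):
        """Return True (ACK) if accepted, False to hold the cycle open."""
        if self.busy:
            return False
        self.sent.append(byte)
        self.busy = self.cycles_per_byte
        return True


def send(device, data):
    """Issue back-to-back stores with no status polling; return cycles."""
    cycles = 0
    for byte in data:
        # The store simply stalls until the device acknowledges it.
        while not device.try_write(byte):
            device.tick()
            cycles += 1
        device.tick()
        cycles += 1
    return cycles


slow = SiaTx(cycles_per_byte=1000)  # stands in for a low baud rate
fast = SiaTx(cycles_per_byte=1)     # stands in for a high baud rate
slow_cycles = send(slow, b"Hello")
fast_cycles = send(fast, b"Hello")
print(bytes(slow.sent) == bytes(fast.sent), slow_cycles > fast_cycles)
```

The same sequence of stores delivers the same bytes in both cases; only
the number of cycles spent stalled on the bus differs.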

-- 
Samuel A. Falvo II