[libre-riscv-dev] KCP53000B micro-architecture thoughts

Thu May 30 03:40:48 BST 2019

On Thu, May 30, 2019 at 2:07 AM Samuel Falvo II <sam.falvo at gmail.com> wrote:
>
> On Wed, May 29, 2019 at 3:48 PM Luke Kenneth Casson Leighton
> <lkcl at lkcl.net> wrote:
> >  sure.  so, i read it all, and i believe you may have missed something
> > quite subtle about the augmented 6600 scoreboard.
>
> To be fair, I wasn't considering the 6600 design.  I was designing
> from first principles, as that's the only way I can keep things
> straight in my head.

 good idea.

> Regrettably, I still cannot figure out Mitch's
> or Thornton/Cray's designs.

 took me about 5-6 months of study, 2 of which were near-full-time
communication with mitch.

> I've pretty much given up trying at this
> point, content in the knowledge that they're just too complex for my
> mind to grasp.

 from what you've written you demonstrated a pretty clear understanding.

> >  exceptions are handled very very simply: by hooking (and preventing)
> > commits, called "shadowing".  the diagram is on p55 of section
> > 11.5.1.1 and i have a modified version here:
> > https://libre-riscv.org/3d_gpu/6600scoreboard/
>
> This is the job of ABORT in my design.  Most instructions execute
> under the speculative assumption that there'll be no exceptions
> generated (which is the normal use-case) by any previous instructions
> issued.

 unless they are also prevented from writing results, this is a
dangerous design.  if however they are "shadowed" (prevented from
committing until the "danger" - the exception - has passed) by the
instruction that *could* raise the exception, then that is a safe
(correct) design.
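
 to make the shadowing idea concrete, here is a minimal behavioural
sketch in plain python (not nmigen, and not the actual scoreboard
logic).  the names FunctionUnit, resolve and commit_ready are
hypothetical, made up purely for illustration: a result is held in the
FU and only written to the regfile once every instruction that could
still raise an exception has declared itself safe.

```python
class FunctionUnit:
    """hypothetical FU model: holds its result until no shadow is over it."""

    def __init__(self, name):
        self.name = name
        self.result = None          # (dest_reg, value) once computed
        self.shadowed_by = set()    # names of FUs that could still raise
        self.cancelled = False

    def can_commit(self):
        return (self.result is not None
                and not self.shadowed_by
                and not self.cancelled)


def resolve(shadower, fus, raised):
    """the shadowing FU now knows whether it raised an exception."""
    for fu in fus:
        if shadower.name in fu.shadowed_by:
            if raised:
                # "Go_Die": discard the result, nothing was ever written
                fu.cancelled = True
                fu.result = None
            fu.shadowed_by.discard(shadower.name)


def commit_ready(fus, regfile):
    """write back every result that is no longer under any shadow."""
    for fu in fus:
        if fu.can_commit():
            dest, value = fu.result
            regfile[dest] = value
            fu.result = None


# usage: an ADD issued after a LD that might fault
ld, add = FunctionUnit("ld"), FunctionUnit("add")
add.shadowed_by = {"ld"}
add.result = ("x5", 42)
regs = {}
commit_ready([ld, add], regs)      # still shadowed: nothing is written
resolve(ld, [ld, add], raised=False)
commit_ready([ld, add], regs)      # shadow lifted: x5 is written
```

 if the LD had raised instead, the ADD result would simply be dropped
and the regfile would never have seen it, which is why nothing needs
to be "restored".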

> Only when an exceptional condition is detected will ABORT
> trigger, causing all other FUs to reset to their quiet states.
>
> >  so the whole concept of stalling, or of having "history" and having
> > to "restore" things, or inhibit things, is all kinda turned around.
>
> My view when thinking about this architecture was to just discard
> everything in the FUs,

 that's exactly the same thing, said in alternative words.

 discard == abort == cancel == Go_Die.

 all the exact same thing.

> since the exception handling code will result
> in it all being re-run again anyway.
>
> >  the normal way to do interrupts is to simply stop issuing
> > instructions, wait for all FUs to write their results, and once
> > everything is quietened down (all FUs no longer active), you begin the
> > exception/interrupt handler.
>
> Interrupts still require transitioning the machine state to an
> elevated privilege level though.

 that's just a change of some flag that changes the behaviour of
instructions.  as long as that flag is carried along with the
instructions, to the FUs that need to respect that flag, what is the
problem?
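
 a rough sketch of what "carrying the flag along" might look like, in
plain python.  IssuedOp, PRIV_LEVEL and CSR_MIN_PRIV are hypothetical
names, and the table is only an illustrative subset: the point is that
the FU consults the privilege level captured at issue time, not some
global state that may have changed since.

```python
from dataclasses import dataclass

# numeric ordering of the privilege levels (higher = more privileged)
PRIV_LEVEL = {"U": 0, "S": 1, "M": 3}

# illustrative subset only: minimum privilege needed to access each CSR
CSR_MIN_PRIV = {"mstatus": "M", "sstatus": "S"}


@dataclass(frozen=True)
class IssuedOp:
    """hypothetical record handed to an FU at issue time."""
    opcode: str
    csr: str
    priv: str   # privilege level in force when the op was issued


def csr_access_allowed(op):
    """the FU decides using the flag that travelled with the op."""
    return PRIV_LEVEL[op.priv] >= PRIV_LEVEL[CSR_MIN_PRIV[op.csr]]


user_op = IssuedOp("csrrw", "mstatus", priv="U")
trap_op = IssuedOp("csrrw", "mstatus", priv="M")
```

 here csr_access_allowed(user_op) is False and
csr_access_allowed(trap_op) is True, regardless of what the machine's
current privilege level happens to be when the FU gets round to
executing them.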

>  I figured having a dedicated
> interrupt unit would be over-kill; but it is at least an interesting
> exercise to be able to represent interrupts as something that can be
> treated in a FU in this architecture.

 remember that it is absolutely critical to conceptually and mentally
separate the analysis of the *dependencies* between data and the data
itself.

 in this case, "data" equals "flags" equals "Control Status Registers"
equals "part of the instruction" equals "mstatus" equals "interrupt".

> The CDC had the advantage that the I/O processors dealt with system
> calls and the like (IIRC, the CDC 6600 operating system did not run on
> the main processor, but on I/O processor 0), which took a lot of the
> state transition and protection burdens off the main processor.
>
> > the Priority Picker is, in nmigen terminology, basically a
> > back-to-back PriorityEncoder and Decoder.  unary in, unary out.
> > multiple bits of the unary vector can be set, however ONLY ONE OUTPUT
> > BIT WILL BE SET.
>
> Yes; it's the same basic logic that drives any other bus arbiter.  I
> was also planning on exploring arbiters with guaranteed fairness as
> well, to see how that affected performance.  (Just a curiosity of
> mine.)

 i wondered about that, too, then i realised that if the FUs are
identical (and are passed the operation as well as the operands) it
really doesn't matter.

 however, yes, i did wonder what the effect would be if the FUs were
non-uniform.
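
 for reference, a behavioural sketch of the picker in plain python
integers (not nmigen): unary in, at most one bit out.  priority_pick
is the fixed-priority version; round_robin_pick is one possible "fair"
variant, included only as an assumption about what such an arbiter
might look like.  both function names are made up for illustration.

```python
def priority_pick(requests):
    """unary in, one-hot out: keep only the lowest set bit (0 if none)."""
    return requests & -requests


def round_robin_pick(requests, last_grant, width):
    """fair variant: prefer the first request above the previous grant.

    last_grant is the one-hot value returned last time (0 if none yet).
    """
    mask = (1 << width) - 1
    requests &= mask
    if requests == 0:
        return 0
    # bit_length() of a one-hot value is (bit index + 1), so this
    # clears the previously granted bit and everything below it
    start = last_grant.bit_length()
    upper = requests & ~((1 << start) - 1)
    candidates = upper if upper else requests   # wrap around
    return candidates & -candidates
```

 with requests 0b0111 held constant, priority_pick always returns
0b0001, whereas round_robin_pick walks 0b0001, 0b0010, 0b0100 and then
wraps back to 0b0001.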

> > mstatus and other CSRs, i am giving serious consideration to having a
> > special Dependency Matrix dedicated just to them.  i.e. to treat CSRs
> > *as another register file*.
>
> I'm planning on representing the entire CSR space as a single
> dependency.

 that sounds like a good strategy.  one FU would look after it.  just
bear in mind that if there is only one FU to deal with any type of
instruction, then that FU becomes a stall-point.

 so if there are two CSR writes in quick succession, because an FU
must only deal with one instruction at a time, the entire processor
must "stall" until the first CSR write is committed.  then and *only*
then may the instruction issue proceed to put the next CSR write into
the CSR-FU.
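
 a toy model of that stall, in plain python.  it assumes in-order
issue of one instruction per cycle and exactly one FU per instruction
type; issue_cycles and the latency numbers are hypothetical, chosen
only to show the effect.

```python
def issue_cycles(program, latency):
    """return (cycles_to_issue, stall_cycles) for an in-order issue unit.

    program: list of instruction-type names, in issue order
    latency: dict mapping type name to cycles its (single) FU stays busy
    """
    busy_until = {}   # FU name -> first cycle at which it is free again
    cycle = 0
    stalls = 0
    for kind in program:
        free_at = busy_until.get(kind, 0)
        if free_at > cycle:
            # the only FU able to take this instruction is still busy:
            # issue (and everything behind it) waits
            stalls += free_at - cycle
            cycle = free_at
        busy_until[kind] = cycle + latency[kind]
        cycle += 1
    return cycle, stalls


lat = {"csr": 3, "alu": 1}
back_to_back = issue_cycles(["csr", "csr"], lat)
spread_out = issue_cycles(["csr", "alu", "alu", "csr"], lat)
```

 with these made-up latencies, two back-to-back CSR writes cost 2
stall cycles, whilst putting two unrelated ALU ops between them hides
the stall completely.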

 hypothetically, though, several CSR writes could be grouped together
(into a single instruction).  however... mmm...  i suspect that going
down that route would hugely complicate the design, particularly the
transfer of instruction data.

> Since an instruction can touch only one CSR at a time,
> this seems reasonable.  Other operations on CSRs (e.g., as when traps
> cause mstatus, mepc, mcause, et al. all to be updated at once) seem
> to happen outside the normal instruction execution flow (e.g., either
> as an exceptional condition in an instruction, or in between
> instructions as an interrupt), and so I was going to represent those
> as 1-hot event triggers, which would cause bulk state transitions in
> the subsequent clock cycle.  Any time a bulk update like that happens,
> it is always accompanied by a privilege escalation, so I just ABORT
> all pending FUs (hence why ABORT is just the logical OR of all the
> event triggers from all the FUs).

 unfortunately, i've not thought through the CSRs enough to be able to
comment.  off the top of my head, though, if a CSR may be affected by
any given instruction, then to my mind it makes sense to simply treat
it as... just another destination "register" as part of the Dependency
Matrices (this time it would be the "CSR Register Dep Matrix", being
the 3rd Reg-Matrix, next to FP Reg Matrix and INT Reg Matrix).

then, if the instruction doesn't actually need to write to the CSR, it
simply drops the Write-Dependency on that CSR.

in this hypothetical design, whilst CSRRW (and friends) would be far
and away the biggest "creator" of CSR Write Dependencies, some
instructions such as ECALL and MRET would be less obvious "writers".
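
as a first rough sketch of the bookkeeping (plain python, and very much
an assumption rather than a worked-out design): each FU registers the
CSRs it *may* write at issue time, and drops a Write-Dependency as soon
as it knows it will not need it.  CSRDepMatrix and its method names are
hypothetical.

```python
class CSRDepMatrix:
    """hypothetical model of CSR write dependencies, one row per FU."""

    def __init__(self):
        self.write_deps = {}   # FU name -> set of CSR names it may write

    def issue(self, fu, may_write):
        """record every CSR the instruction in `fu` might write."""
        self.write_deps[fu] = set(may_write)

    def drop(self, fu, csr):
        """the instruction turned out not to need to write `csr`."""
        self.write_deps.get(fu, set()).discard(csr)

    def complete(self, fu):
        """the instruction committed (or was cancelled): clear its row."""
        self.write_deps.pop(fu, None)

    def read_blocked(self, csr):
        """a reader of `csr` must wait while any FU may still write it."""
        return any(csr in deps for deps in self.write_deps.values())


m = CSRDepMatrix()
# a less obvious "writer": registered against several CSRs at once
m.issue("fu0", ["mstatus", "mepc", "mcause"])
blocked_before = m.read_blocked("mepc")
m.drop("fu0", "mepc")
blocked_after = m.read_blocked("mepc")
```

here blocked_before is True and blocked_after is False, while a read of
mstatus stays blocked until fu0 completes.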

i still have to think the idea through properly.

l.