[libre-riscv-dev] KCP53000B micro-architecture thoughts

Wed May 29 23:47:55 BST 2019

On Wed, May 29, 2019 at 11:07 PM Samuel Falvo II <sam.falvo at gmail.com> wrote:
>
> So, I had what I believe to be an epiphany today at work, and I wanted
> to get them down in a retrievable, reviewable form before I left the
> office for the day.  Luke, if you have the time to review and see if
> I'm on the right track here, it'd be very helpful.

 sure.  so, i read it all, and i believe you may have missed something
quite subtle about the augmented 6600 scoreboard.

 exceptions are handled very very simply: by hooking (and preventing)
commits, called "shadowing".  the diagram is on p55 of section
11.5.1.1 and i have a modified version here:
https://libre-riscv.org/3d_gpu/6600scoreboard/

 basically, anything that *might* need to be cancelled - whether it be
a branch, or instructions that MIGHT throw exceptions (LD/ST page
faults in particular), interrupts in general, or... anything basically
- what you do is arrange for that instruction to shadow *all future
instructions*.

 once it is *absolutely* known that the exception (or branch, or
interrupt, or anything basically) can never occur, then and only then
do you drop the shadow.
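
 to make the shadow idea concrete, here is a minimal plain-python sketch (not hardware, and not the actual scoreboard code).  the names Op, shadowed_by, may_write and resolve are all made up for illustration.  it only models the rule above: a result may be written only once every shadow over the instruction has been dropped, and a failed shadow cancels the instruction instead.

```python
class Op:
    """Hypothetical model of one in-flight instruction."""

    def __init__(self, name, shadowed_by=()):
        self.name = name
        # names of earlier instructions still casting a shadow over this one
        self.shadowed_by = set(shadowed_by)
        self.cancelled = False
        self.result_ready = False

    def may_write(self):
        # the result may only be committed once it exists, the op has not
        # been cancelled, and no shadow is still held over it
        return self.result_ready and not self.cancelled and not self.shadowed_by


def resolve(ops, caster, ok):
    """The shadow-casting instruction 'caster' now knows its outcome.

    ok=True:  it can no longer fault or mispredict, so drop its shadow.
    ok=False: it did fault or mispredict, so cancel everything under it.
    """
    for op in ops:
        if caster in op.shadowed_by:
            if ok:
                op.shadowed_by.discard(caster)
            else:
                op.cancelled = True


ld = Op("ld")                          # might page-fault
add = Op("add", shadowed_by={"ld"})    # issued after the LD
add.result_ready = True                # the FU went ahead and computed anyway
blocked = add.may_write()              # still under the LD's shadow
resolve([ld, add], "ld", ok=True)      # the LD is now known not to fault
allowed = add.may_write()
```

 note that the add was allowed to compute its result the whole time: only the write was held back.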

 letting an FU go ahead and calculate something that might not have
permission to write its result?  well, there's no harm in that.  if
efficiency is important, you should have predicted better whether
letting the FU go ahead was a good idea :)

 so the whole concept of stalling, or of having "history" and having
to "restore" things, or inhibit things, is all kinda turned around.

 yes an instruction can be ABORTed... however even if it had reached
the "write" stage, it was prevented from proceeding with the write, so
there's no harm done.

 the normal way to do interrupts is to simply stop issuing
instructions, wait for all FUs to write their results, and once
everything is quietened down (all FUs no longer active), you begin the
exception/interrupt handler.

 real simple, but it may also have unacceptable latency.
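
 a toy model of that "stop issuing and drain" approach, purely to show where the latency comes from.  the function name drain and the idea of giving each busy FU a count of remaining cycles are assumptions for illustration, not anything from the 6600 design itself.

```python
def drain(busy_cycles):
    """Count the cycles until every FU has gone quiet.

    busy_cycles: for each FU, how many more cycles it needs before it has
    written its result (0 means the FU is already idle).  Issue is assumed
    to be stopped, so no new work arrives while draining.
    """
    cycles = 0
    pending = [c for c in busy_cycles if c > 0]
    while pending:
        # one clock tick: every busy FU makes one cycle of progress,
        # and FUs that just finished drop out of the list
        pending = [c - 1 for c in pending if c > 1]
        cycles += 1
    return cycles
```

 the handler cannot start until the slowest outstanding FU has finished, so one long-running operation (a divide, or a LD that misses) sets the interrupt latency for everything.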

 the next more complex thing is: anything that has shadows
outstanding, tell them to fail, immediately.  this would cause all
outstanding LD/STs, all speculated branches, all
anything-with-a-shadow, to be purged IMMEDIATELY from the system.

 as for waiting for the other units to quieten down from that point
onwards: you may not even need to wait, because they are "zero-damage"
instructions, i.e. because they are "non-exception-capable", letting
them continue should *not* cause any damage to the register file (or
anything else).

 if you *absolutely* must have the interrupt done with the *absolute*
highest top priority, then all instructions should have a special "NMI
shadow", which would allow them to be cancelled IMMEDIATELY.  only
once an instruction actually reached the "commit" phase would it be
CONSIDERED whether allowing the instruction to proceed to writing
would affect the execution of the low-latency-NMI-interrupt-thing.

 arbitration for the CDB (i forget what it's called, too... Trunk i
think) which in modern terminology we call a "Register File Port" is
completely and utterly handled by the Priority Pickers.

 * a Read Priority Picker handles each Reg-Read-File-Port
 * a Write Priority Picker handles each Reg-Write-File-Port

therefore, superscalar designs simply have multiple read/write ports,
each with their own separate and completely distinct priority
pickers.... and there's no problem whatsoever.

the Priority Picker is, in nmigen terminology, basically a
back-to-back PriorityEncoder and Decoder.  unary in, unary out.
multiple bits of the unary vector can be set, however ONLY ONE OUTPUT
BIT WILL BE SET.

thus, it is ABSOLUTELY the case that the Priority Picker *WILL* only
allow one AND ONLY one FU to access the Port (Read/Write) that it
manages.

very simple.  the implementation is here, and its elaborate() function
is like... 12 lines long.

https://git.libre-riscv.org/?p=soc.git;a=blob;f=src/scoreboard/group_picker.py;h=133b37203a473bfe3cfea73559c589987ccfedf3;hb=d82aea2ddc957f5135227bb2439e702a961f4d4c#l49
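
as a plain-python illustration of the "unary in, unary out" behaviour (this is NOT the nmigen code linked above, just a sketch): the function names are made up, and giving priority to the lowest index is an assumption, the real picker may order things differently.

```python
def priority_pick(requests):
    """requests: list of bools, one per FU asking for the port.

    Returns a list of the same length with at most one True entry:
    the lowest-indexed requester wins (assumed priority order).
    """
    granted = [False] * len(requests)
    for i, req in enumerate(requests):
        if req:
            granted[i] = True
            break          # stop at the first requester: only one grant
    return granted


def priority_pick_bits(requests):
    """Same idea on an integer bit-vector: isolate the lowest set bit."""
    return requests & -requests
```

for example, priority_pick([False, True, True, False]) grants only index 1, and if nobody is requesting then nothing is granted at all.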

as for mstatus and other CSRs: i am giving serious consideration to
having a special Dependency Matrix dedicated just to them.  i.e. to
treat CSRs *as another register file*.

CSR writing (and reading) would then have their own dedicated Function
Units, which in turn would have Dependencies *in the FU-FU Matrix*.

mstatus for example would then have its own dedicated FU, which would
create "dependencies" on instructions that were in *different* mstatus
domains.

likewise for Vectorisation, VSETVL would create new values of the
Vector Length, creating a Dependency in the FU-FU Matrix that would
allow (INT and FP) FUs to have instructions issued to them with
*different* Vector Lengths, without problems normally associated with
changes in CSRs.
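
a rough sketch of what "treat VL like a register" could mean for dependency tracking.  this is speculative, like the idea itself: the function name, the instruction names and the register names are all hypothetical, and it only tracks read-after-write, not the other hazards a real Dependency Matrix handles.

```python
def raw_dependencies(program):
    """program: list of (name, reads, writes) tuples in issue order.

    Returns {name: set of earlier instruction names it must wait for},
    considering read-after-write hazards only.
    """
    last_writer = {}   # register name -> instruction that last wrote it
    deps = {}
    for name, reads, writes in program:
        deps[name] = {last_writer[r] for r in reads if r in last_writer}
        for w in writes:
            last_writer[w] = name
    return deps


# VL appears as just another register: VSETVL writes it, vector ops read it
program = [
    ("vsetvl_a", {"x1"}, {"VL"}),
    ("vadd_a", {"v1", "v2", "VL"}, {"v3"}),
    ("vsetvl_b", {"x2"}, {"VL"}),
    ("vadd_b", {"v4", "v5", "VL"}, {"v6"}),
]
deps = raw_dependencies(program)
```

each vector add ends up depending only on the VSETVL that set *its* Vector Length, which is the property being described: two adds with different VLs can both be in flight, each tied to its own VL value.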

i still have to think this through, because, effectively, yes: mstatus
and VL (and other similar status information) are, i believe, best
treated as "actually part of the instruction".

so yes i think the approach that you came up with, to include status
information (NMI, etc.) as being "part of the instruction" is
perfectly reasonable.

l.