[libre-riscv-dev] KCP53000B micro-architecture thoughts

Tue Jun 4 04:32:25 BST 2019

hiya sam, very weird, i didn't receive this message, despite it going
through to the archives
http://lists.libre-riscv.org/pipermail/libre-riscv-dev/2019-June/001610.html

> Although I'm not exploiting it, I should note that this design seems
> to be pipeline-ready.  Instead of a simple 1-hot ring counter, a
> multi-hot ring counter (or the logical OR of several ring counters,
> such as what you'd have if you allocated one FU per pipeline stage,
> which I believe you and Mitch Alsup discussed earlier, and is
> documented in Alsup's chapters) can be used to peristaltically drive
> data through the CU.

yesterday i moved the "fake delay" code out of compalu.py and into the
ALU code, to simulate a pipeline there, and added ready/valid
signalling around both.

it worked extremely well.

so i "Get" the "peristaltic driving" thing.  actually what has happened
is that in places i am now *ignoring* the SR Latches entirely, and
driving directly from the Go_Read and Go_Write signals.  this i am not
so sure about: if the ALU pipeline (a real pipeline) is not ready, it
will *fail*.  however as long as the pipelines are non-stalling it
will be fine.
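
as a rough illustration of that concern (a plain-python toy, not the
actual compalu.py / ALU code; `Pipeline`, `send`, `respect_ready` and
`stall_cycles` are made-up names), here is a sender that either honours
the pipeline's ready signal or just drives it regardless, which is
roughly what driving straight from Go_Read amounts to:

```python
from collections import deque


class Pipeline:
    """toy fixed-depth pipeline with a ready/valid style input port."""

    def __init__(self, depth):
        self.stages = deque([None] * depth, maxlen=depth)
        self.stalled = False

    @property
    def ready(self):
        return not self.stalled

    def tick(self, valid, data):
        """advance one clock and return whatever leaves the last stage."""
        if self.stalled:
            return None              # nothing moves, the input is NOT captured
        out = self.stages[-1]
        self.stages.appendleft(data if valid else None)
        return out


def send(pipe, items, respect_ready, stall_cycles):
    """push items into pipe, return the results in the order they emerge."""
    pending = list(items)
    results = []
    cycle = 0
    while pending or any(s is not None for s in pipe.stages):
        pipe.stalled = cycle in stall_cycles
        drive = bool(pending) and (pipe.ready or not respect_ready)
        out = pipe.tick(drive, pending[0] if drive else None)
        if drive:
            # a sender that ignores ready moves on even if nothing was captured
            pending.pop(0)
        if out is not None:
            results.append(out)
        cycle += 1
    return results


# with the handshake, a stall on cycle 1 only delays things
print(send(Pipeline(2), [1, 2, 3], respect_ready=True, stall_cycles={1}))   # [1, 2, 3]
# without it, the item driven during the stall is silently lost
print(send(Pipeline(2), [1, 2, 3], respect_ready=False, stall_cycles={1}))  # [1, 3]
```

with an empty `stall_cycles` both variants give the same answer, which
is the "non-stalling pipelines will be fine" case.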

now what i can do is move forward with the "concurrent" pipeline / CU
thing.  this is basically as described in Mitch's 2nd chapter, you put
*MULTIPLE* CUs funneling onto the SAME pipeline.  obviously in a
multi-issue scenario you do not - must not - throw more than one
instruction at the same CUs, with the "partly-broken" signalling i
described above, because the pipeline will *NOT* accept multiple
inputs from multiple CUs, it will ONLY accept ONE input.

once i have that properly fixed (somehow), it should work fine, and
the IEEE754 multi-fan-in multi-fan-out code i did a few months ago
should just drop in place with little to no modifications.
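
for the "pipeline will ONLY accept ONE input" restriction, one possible
approach (just a sketch, not necessarily what the fix will end up
being) is a priority picker in front of the pipeline, so that at most
one requesting CU is granted per cycle and the others hold their
operands and retry.  `pick_one` and `grants` are hypothetical names:

```python
def pick_one(requests):
    """priority-encode: return the index of the lowest-numbered
    requesting CU, or None if nobody is requesting."""
    for idx, req in enumerate(requests):
        if req:
            return idx
    return None


def grants(requests):
    """one-hot (or all-zero) grant vector, one entry per CU."""
    winner = pick_one(requests)
    return [idx == winner for idx in range(len(requests))]


# CUs 1 and 2 both want the pipeline this cycle: only CU 1 gets it
print(grants([False, True, True]))   # [False, True, False]
```

a fixed priority like this can starve the higher-numbered CUs, so a
real design may well want something fairer (round-robin, say).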

btw now that you're thinking in terms of pipelines behind CUs, Mitch's
"Concurrent Computation Units" should start to make sense.  note that
there is the quirk of "timing chains" which are basically a
multiplexer ID in the form of a chain of SR latches.

the "timing chains" basically map directly to the number of pipeline
stages, and assume fixed-length pipelines with no "early-out"
capability.  by the time the "ripple" meets the result selector
MUXes to direct the result back to the correct CU, the pipeline has
the correct result ready.
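
a small python model of that idea, assuming a 3-stage pipeline and
using a shift register as a stand-in for the chain of SR latches (the
class and attribute names are made up for illustration).  the CU index
ripples down `chain` in lock-step with the data, without ever being
part of the pipeline data itself:

```python
from collections import deque


class FixedPipeWithTimingChain:
    """fixed-length pipeline plus a separate 'timing chain' of CU indices."""

    def __init__(self, depth, fn):
        self.fn = fn
        self.data = deque([None] * depth, maxlen=depth)    # the real pipeline
        self.chain = deque([None] * depth, maxlen=depth)   # the timing chain

    def tick(self, cu_idx=None, operands=None):
        """advance one clock; returns (cu_idx, result) leaving this cycle."""
        out = (self.chain[-1], self.data[-1])
        # the computation is done up-front here for simplicity; in hardware
        # it would be spread across the stages
        self.data.appendleft(None if operands is None else self.fn(*operands))
        self.chain.appendleft(cu_idx)
        return out


pipe = FixedPipeWithTimingChain(3, lambda a, b: a + b)
pipe.tick(0, (1, 2))      # CU 0 issues
pipe.tick(1, (10, 20))    # CU 1 issues one cycle later
pipe.tick()
print(pipe.tick())        # (0, 3)   routed back to CU 0
print(pipe.tick())        # (1, 30)  routed back to CU 1
```

this only lines up because every operation takes exactly `depth`
cycles.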

if you are planning early-out pipelines or variable-length ALUs, you
MUST NOT do this trick, you MUST instead pass the result selector MUX
identifier *WITH THE OPERANDS*, either down the pipeline or just
stored in whatever variable-completion-time FSM happens to be behind
the (multiple) CUs.

once the result is ready that MUXid, which is basically the index back
to the Computation Unit that passed the operands to us, is used to get
the result *back* to that same CU.
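
sketched in python, under the assumption of a toy multiplier with a
made-up "early-out" when either operand is zero (`VariableLatencyALU`
and the latencies of 1 and 3 cycles are purely illustrative), carrying
the MUXid with the operands looks something like this:

```python
class VariableLatencyALU:
    """toy ALU whose completion time depends on the operands."""

    def __init__(self):
        self.in_flight = []   # entries: [cycles_left, mux_id, result]

    def issue(self, mux_id, a, b):
        # hypothetical early-out: multiply-by-zero finishes in 1 cycle
        latency = 1 if (a == 0 or b == 0) else 3
        self.in_flight.append([latency, mux_id, a * b])

    def tick(self):
        """advance one clock; return [(mux_id, result), ...] finishing now."""
        for entry in self.in_flight:
            entry[0] -= 1
        done = [(m, r) for c, m, r in self.in_flight if c == 0]
        self.in_flight = [e for e in self.in_flight if e[0] > 0]
        return done


alu = VariableLatencyALU()
alu.issue(0, 6, 7)    # from CU 0, takes 3 cycles
alu.issue(1, 0, 5)    # from CU 1, early-out after 1 cycle
print(alu.tick())     # [(1, 0)]   CU 1's result overtakes CU 0's
print(alu.tick())     # []
print(alu.tick())     # [(0, 42)]
```

results come out in a different order to the one they went in, and the
MUXid is the only thing that says where each one belongs.  (a real
design would also have to cope with two results finishing on the same
cycle, which this toy just returns as a list.)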

then that same CU can latch the result, raise its "REQ_RELEASE", and
wait for the GO_WRITE.
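
the CU side of that, as a very rough python sketch.  `CompUnit` and its
method names are hypothetical, and dropping REQ_RELEASE once GO_WRITE
arrives is an assumption about the protocol, not something taken from
Mitch's diagrams:

```python
class CompUnit:
    """toy Computation Unit: latch result, request release, await GO_WRITE."""

    def __init__(self):
        self.result = None
        self.req_release = False

    def latch_result(self, value):
        # result has come back from the ALU via the MUXid
        self.result = value
        self.req_release = True

    def go_write(self):
        # GO_WRITE received: hand the latched result over for writing
        if not self.req_release:
            raise RuntimeError("GO_WRITE without a pending REQ_RELEASE")
        value, self.result = self.result, None
        self.req_release = False   # assumed: request drops once written
        return value


cu = CompUnit()
cu.latch_result(42)
print(cu.req_release)   # True
print(cu.go_write())    # 42
```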

this is why i followed Mitch's diagrams: the signalling /
handshaking from CU to ALU really is totally separate from the
handshaking between CU and FU (and Regfile).

l.