[libre-riscv-dev] microwatt decoder tables: M-Form and X-Form switched RS and RB

Tue Jun 2 13:10:37 BST 2020

ok.  interesting.  i start to see how POWER's design hangs together,
from a RISC / architectural perspective.  RA, RB, RC, RT, RS - these
would have been divided into "lanes" - effectively register file and
inter-Function-Unit "broadcast" buses: 5 of them.

RA, RB, RC and RS were probably also connected to 4 read ports (on a
4R1W regfile), and i suspect that RA and RS were multiplexed onto the
same regfile write port (with stall signals or careful scheduling).
those broadcast bus lanes would have been shared between:

* read regfile ports
* write regfile ports
* Function Unit incoming and outgoing ports.

just as for LD/ST, RS/RB would have been "laned" so that the
part-result could be "broadcast" onto those buses (without needing
MUXes to do so), such that the Shift Function Unit can pick it up as
if the information had come from the register file ports.  OP_EXTS was
also put onto "lane RC" because it can then likewise be used as a
micro-code block in exactly the same way.

likewise for the OR, AND and XOR operations: they would have been
laned to allow direct inter-FU communication without any kind of lane
cross-over.

it's really elegant, because it basically encodes "operand forwarding"
into the actual architecture.

however the _normal_ focus of RISC is on the instruction encoding.  in
the CDC 6600, there are 3 bits which are quite literally hard-wired to
enable a given Function Unit (of which there are, as might be obvious
from the 3 bits used, 8 types).  a few extra gates - again directly
from the opcode - says whether the instruction is 1-operand or
2-operands.

i believe in the CDC 6600 it's something ridiculous like under 15
gates for the entire instruction decoder.  now *that's* R(educed)
I(nstruction) S(et) C(oding)!

this therefore leaves me wondering if something similar was done for
POWER, where we've simply missed these patterns because they're just
not documented.

example: we know that ADD, NEG, SUB and even CMP can all use the exact
same underlying add hardware, with A-inversion, Out-inversion,
carry-in and carry-out detection engaged, optionally, to "augment" the
operation and turn what would otherwise be a plain add into a NEG, or
a SUB, and so on.
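
to make the "augmented add" idea concrete, here is a minimal python
sketch (not taken from microwatt or any real decoder).  the names
generic_add, invert_a, invert_out and carry_in are assumptions made up
for illustration: one adder, with optional control flags turning it
into subtract-from, negate, or an unsigned compare.

```python
# a sketch only: one shared adder, "augmented" by optional control flags.
# all names here are hypothetical, chosen for illustration.
WIDTH = 64
MASK = (1 << WIDTH) - 1


def generic_add(a, b, invert_a=False, carry_in=0, invert_out=False):
    """return (result, carry_out) from a single WIDTH-bit adder."""
    a &= MASK
    b &= MASK
    if invert_a:
        a = ~a & MASK               # ones-complement of operand A
    total = a + b + carry_in        # the one and only add
    carry_out = total >> WIDTH      # carry out of the top bit
    result = total & MASK
    if invert_out:
        result = ~result & MASK     # ones-complement of the output
    return result, carry_out


def add(a, b):
    return generic_add(a, b)[0]


def subf(a, b):
    # subtract-from: ~a + b + 1 == b - a (modulo 2**WIDTH)
    return generic_add(a, b, invert_a=True, carry_in=1)[0]


def neg(a):
    # ~a + 0 + 1 == -a (modulo 2**WIDTH)
    return generic_add(a, 0, invert_a=True, carry_in=1)[0]


def unsigned_ge(b, a):
    # the carry-out of (b - a) is 1 exactly when b >= a, unsigned
    return generic_add(a, b, invert_a=True, carry_in=1)[1] == 1
```

note that inverting both A and the output, with no carry-in, also
gives a subtract: ~(~a + b) == a - b, so the same adder covers both
operand orders.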

however... is this fact encoded *in the instructions*?  i.e. is the
instruction set encoded such that:

* "bit 8 indicates invert A"
* "bit 9 indicates invert output"
* "pattern of bits in positions X thru Y indicate an *underlying*
generic ADD hardware is to be used"

Rc and OE definitely fall into this category, however my question is:
does this pattern go much deeper?
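
as a rough illustration of the question, here is a python sketch of
what "control signals straight from instruction bits" looks like.
the Rc and OE positions are the XO-Form ones (bit 31 and bit 21 in the
spec's MSB0 numbering, i.e. bits 0 and 10 counting from the LSB).  the
invert_a and invert_out positions are purely hypothetical, copied from
the "bit 8" / "bit 9" examples above (read here as LSB0 bit numbers):
they are NOT real POWER encodings, and decode_flags is an invented
name.

```python
# a sketch only: each control flag is a single wire from the instruction
# word, with no table lookup and no nested switch/case.
RC_BIT = 0    # XO-Form Rc (MSB0 bit 31), counted from the LSB
OE_BIT = 10   # XO-Form OE (MSB0 bit 21), counted from the LSB

# HYPOTHETICAL positions, for illustration only: not real POWER encodings
HYP_INVERT_A_BIT = 8
HYP_INVERT_OUT_BIT = 9


def bit(insn, n):
    """extract bit n (LSB0 numbering) of a 32-bit instruction word."""
    return (insn >> n) & 1


def decode_flags(insn):
    """derive control flags directly from fixed bit positions."""
    return {
        "rc": bool(bit(insn, RC_BIT)),
        "oe": bool(bit(insn, OE_BIT)),
        "invert_a": bool(bit(insn, HYP_INVERT_A_BIT)),      # hypothetical
        "invert_out": bool(bit(insn, HYP_INVERT_OUT_BIT)),  # hypothetical
    }


# "add r3,r4,r5": opcode 31, RT=3, RA=4, RB=5, OE=0, XO=266, Rc=0
ADD = (31 << 26) | (3 << 21) | (4 << 16) | (5 << 11) | (266 << 1)
# "addo. r3,r4,r5": same encoding with OE and Rc set
ADDO_DOT = ADD | (1 << OE_BIT) | (1 << RC_BIT)
```

only the "rc" and "oe" entries mean anything when fed real
instructions such as ADD and ADDO_DOT; the other two just show the
shape the decoder would take if the pattern did go deeper.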

if this principle has been applied across the board, in the ISA
design, the POWER instruction decoder phase could be drastically
simplified.  right now, the decoder is a massive, 4-deep nested series
of Switch/Case statements, and consequently the number of gates
involved is enormous compared to other RISC architectures.

l.