[libre-riscv-dev] [OpenPOWER-HDL-Cores] microwatt decoder tables: M-Form and X-Form switched RS and RB

Wed Jun 3 01:40:52 BST 2020

On Tue, Jun 02, 2020 at 01:10:37PM +0100, Luke Kenneth Casson Leighton wrote:
> ok.  interesting.  i start to see how POWER's design hangs together,
> from a RISC / architectural perspective.  RA, RB, RC, RT, RS - these
> would have been divided into "lanes" - effectively register file and
> inter-Function-Unit "broadcast" buses: 5 of them.
> 
> RA, RB, RC and RS were probably also connected to 4 read ports (on a
> 4R1W regfile), and i suspect that RA and RS were multiplexed onto the
> same regfile write port (with stall signals or careful scheduling).
> those broadcast bus lanes would have been shared between:
> 
> * read regfile ports
> * write regfile ports
> * Function Unit incoming and outgoing ports.
> 
> just as for LD/ST, RS/RB would have been "laned" so that the
> part-result could be "broadcast" onto those buses (without needing
> MUXes to do so), such that the Shift Function Unit can pick it up as
> if the information had come from the register file ports.  OP_EXTS was
> also put onto "lane RC" because it can then be likewise used as a
> micro-code block in exactly the same way.
> 
> likewise, the OR, AND and XOR operations, to allow direct inter-FU
> communication without any kind of lane cross-over.
> 
> it's really elegant, because it basically encodes "operand forwarding"
> into the actual architecture.
> 
> 
> however the _normal_ focus of RISC is on the instruction encoding.  in
> the CDC 6600, there's 3 bits which are quite literally hard-wired to
> enable a given Function Unit (of which there are, as might be obvious
> from the 3-bits used: 8 types).  a few extra gates - again directly
> from the opcode - says whether the instruction is 1-operand or
> 2-operands.
> 
> i believe in the CDC 6600 it's something ridiculous like under 15
> gates for the entire instruction decoder.  now *that's* R(educed)
> I(nstruction) S(et) C(oding)!
> 
> this therefore leaves me wondering if something similar was done for
> POWER, where we've simply missed these patterns because they're just
> not documented.

As far as I can see, there are patterns like this to some extent, but
not uniformly, and not as much as one would like.

For example, with the condition register logical ops (crand etc.),
there are 4 bits of the instruction word that give the truth table for
the operation.  However, the fixed-point logical ops don't have that.
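
To make that concrete, here is a rough sketch of the lookup.  It assumes
bits are numbered from the least-significant end (bit 0 = LSB), in which
case the four truth-table bits appear to be instruction bits 6-9, i.e.
bits 5-8 of the 10-bit extended opcode.  The XO values are the ones
given in the ISA; the function names are made up for illustration.

```python
# Extended opcode (XO) values for the XL-form CR logical ops.
CR_OPS = {
    "crand": 257, "cror": 449, "crxor": 193, "crnand": 225,
    "crnor": 33, "creqv": 289, "crandc": 129, "crorc": 417,
}

def cr_truth_table(xo):
    """Return the 4-bit truth table embedded in the extended opcode."""
    # The XO field starts at instruction bit 1 (LSB-0 numbering, an
    # assumption), so instruction bits 6-9 are XO bits 5-8.
    return (xo >> 5) & 0xF

def cr_eval(xo, a, b):
    """Result for source CR bits a (from BA) and b (from BB)."""
    # Entry 2*a + b of the table is the result for that input pair.
    return (cr_truth_table(xo) >> (2 * a + b)) & 1

# Reference definitions, used only to check the lookup above.
REFERENCE = {
    "crand": lambda a, b: a & b,
    "cror": lambda a, b: a | b,
    "crxor": lambda a, b: a ^ b,
    "crnand": lambda a, b: 1 - (a & b),
    "crnor": lambda a, b: 1 - (a | b),
    "creqv": lambda a, b: 1 - (a ^ b),
    "crandc": lambda a, b: a & (1 - b),
    "crorc": lambda a, b: a | (1 - b),
}

def check_all():
    """True if every op's embedded table matches its definition."""
    return all(
        cr_eval(CR_OPS[name], a, b) == fn(a, b)
        for name, fn in REFERENCE.items()
        for a in (0, 1)
        for b in (0, 1)
    )
```

So a decoder needs no per-instruction case for these eight ops: it can
index the four bits directly with the two source CR bits.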

> example: we know that ADD, NEG, SUB and even CMP can all use the exact
> same underlying add hardware, with A-inversion, Out-inversion,
> carry-in and carry-out detection engaged, optionally, to "augment" the
> operation and turn what would otherwise be a plain add into a NEG, or
> a SUB, and so on.
> 
> however... is this fact encoded *in the instructions*?  i.e. is the
> instruction set encoded such that:
> 
> * "bit 8 indicates invert A"
> * "bit 9 indicates invert output"
> * "pattern of bits in positions X thru Y indicate an *underlying*
> generic ADD hardware is to be used"

Appendix D (Opcode Maps) of the ISA is useful for investigating this
sort of question.  The add and sub instructions are on page 1365, with
sub instructions in the first column (bits 1-5 of the instruction word,
counting from the least-significant bit as bit 0, being 01000) and add
instructions in the third column (01010).  The
ops that use CA for the carry input seem to be in rows 5-8 and 21-24
(instruction bits 6-10 containing x01xx).  The ops that have the carry
input set to one all have bit 6 of the instruction word = 1.

I can't see much more structure to the layout than that, though others
might.
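
The column and CA-input observations can be checked with a few lines of
Python.  This is only a sketch: it assumes LSB-0 bit numbering, takes OE
as 0 so that the map row is just the upper bits of the 9-bit XO, and
covers only the handful of XO-form ops listed.  The XO values are from
the ISA; the helper names are invented here.

```python
# 9-bit extended opcodes of some XO-form add/subtract ops (opcode 31).
ARITH_XO = {
    "subfc": 8, "subf": 40, "neg": 104,
    "subfe": 136, "subfze": 200, "subfme": 232,
    "addc": 10, "adde": 138, "addze": 202, "addme": 234, "add": 266,
}

def column(xo):
    """Instruction bits 1-5: selects the column of the opcode map."""
    return xo & 0x1F

def row(xo):
    """Instruction bits 6-10, with OE (bit 10) assumed to be 0."""
    return xo >> 5

def uses_ca_input(xo):
    """True if the row matches the pattern x01xx."""
    return (row(xo) & 0b01100) == 0b00100

SUB_OPS = {n for n, xo in ARITH_XO.items() if column(xo) == 0b01000}
ADD_OPS = {n for n, xo in ARITH_XO.items() if column(xo) == 0b01010}
CA_OPS = {n for n, xo in ARITH_XO.items() if uses_ca_input(xo)}
```

With these values, CA_OPS comes out as exactly the six "extended" ops
(adde, addze, addme, subfe, subfze, subfme).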

> Rc and OE definitely fall into this category, however my question is:
> does this pattern go much deeper?
> 
> if this principle has been applied across the board, in the ISA
> design, the POWER instruction decoder phase could be drastically
> simplified.  right now, the decoder is a massive 4 deep nested series
> of Switch/Case statements, and consequently the number of gates
> involved is enormous compared to other RISC architectures.

I don't know of such a principle being applied in recent times, though
there may have been such a principle in the very early days (early
90s).  I had hoped that the logic simplification algorithms that the
synthesis tools have would automatically detect the patterns, but
maybe that's asking a bit much.  It would probably help if our decode
tables indicated the don't-care entries rather than just putting 0 for
table entries that have no effect.
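
A toy illustration of why the don't-care entries matter (this is not
how a synthesis tool works internally, and the table is a made-up
fragment, not the real microwatt decode table): take a hypothetical
"invert A" control signal and ask how many opcode bits are needed to
determine it.  None marks a don't-care entry; min_support is a
hypothetical helper.

```python
from itertools import combinations

def min_support(table, nbits):
    """Smallest tuple of bit positions that determines the output.

    table maps an opcode to 0, 1 or None (None = don't care).
    Only rows with a definite value constrain the answer.
    """
    cared = [(code, val) for code, val in table.items() if val is not None]
    for k in range(nbits + 1):
        for bits in combinations(range(nbits), k):
            seen = {}
            consistent = True
            for code, val in cared:
                key = tuple((code >> b) & 1 for b in bits)
                # Two rows with the same key but different outputs
                # mean these bits alone cannot decide the signal.
                if seen.setdefault(key, val) != val:
                    consistent = False
                    break
            if consistent:
                return bits
    return tuple(range(nbits))

# Hypothetical "invert A" column: 1 for subtract-type ops, 0 for adds.
ADDER_ROWS = {8: 1, 40: 1, 104: 1, 10: 0, 266: 0}  # subfc subf neg addc add
LOGICAL_XO = (28, 444, 316)                        # and, or, xor

# Logical ops never use the adder, so the signal has no effect there.
zero_filled = {**ADDER_ROWS, **{xo: 0 for xo in LOGICAL_XO}}
dont_care = {**ADDER_ROWS, **{xo: None for xo in LOGICAL_XO}}
```

With the don't-care version, min_support(dont_care, 10) finds that a
single opcode bit is enough; with zeros filled in, the logical ops
force at least one more bit into the function.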

Paul.