[libre-riscv-dev] [OpenPOWER-HDL-Cores] microwatt decoder tables: M-Form and X-Form switched RS and RB

Wed Jun 3 02:28:59 BST 2020

On Wed, Jun 3, 2020 at 1:41 AM Paul Mackerras <paulus at ozlabs.org> wrote:
> On Tue, Jun 02, 2020 at 01:10:37PM +0100, Luke Kenneth Casson Leighton wrote:
> > this therefore leaves me wondering if something similar was done for
> > POWER, where we've simply missed these patterns because they're just
> > not documented.
>
> As far as I can see, there are patterns like this to some extent, but
> not uniformly, and not as much as one would like.
>
> For example, with the condition register logical ops (crand etc.),
> there are 4 bits of the instruction word that give the truth table for
> the operation.

yes i saw that trick: utterly cool :)  actually using the instruction
field as a 4-bit lookup.
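
a minimal sketch of that lookup, for illustration only.  the XO values
are the ones listed in the ISA for the XL-form CR logical ops; the
helper names (cr_lut, cr_logical, check_all) are made up here, and
treating XO bits 5-8 (counting from the LSB) as the truth table is my
reading of the encoding, checked against plain boolean definitions:

```python
# XO values (instruction bits 21-30) of the XL-form CR logical ops.
CR_XO = {
    "crand": 257, "cror": 449, "crxor": 193, "crnand": 225,
    "crnor": 33, "creqv": 289, "crandc": 129, "crorc": 417,
}

def cr_lut(xo):
    """Return the 4-bit truth table embedded in the XO field (assumed layout)."""
    return (xo >> 5) & 0xF

def cr_logical(xo, ba, bb):
    """Evaluate a CR logical op on two 1-bit inputs by indexing the table."""
    return (cr_lut(xo) >> ((ba << 1) | bb)) & 1

# reference definitions, used only to check the lookup
REFERENCE = {
    "crand":  lambda a, b: a & b,
    "cror":   lambda a, b: a | b,
    "crxor":  lambda a, b: a ^ b,
    "crnand": lambda a, b: 1 - (a & b),
    "crnor":  lambda a, b: 1 - (a | b),
    "creqv":  lambda a, b: 1 - (a ^ b),
    "crandc": lambda a, b: a & (1 - b),
    "crorc":  lambda a, b: a | (1 - b),
}

def check_all():
    """True if the table lookup agrees with every reference definition."""
    return all(
        cr_logical(CR_XO[name], a, b) == ref(a, b)
        for name, ref in REFERENCE.items()
        for a in (0, 1)
        for b in (0, 1)
    )
```

with this layout no per-instruction case statement is needed: the same
four instruction bits serve all eight ops.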

>  However, the fixed-point logical ops don't have that.

possibly because all permutations (particularly if you include immediates)
would simply have taken up too much of the major opcode space.

> > however... is this fact encoded *in the instructions*?  i.e. is the
> > instruction set encoded such that:
>
> Appendix D (Opcode Maps) of the ISA is useful for investigating this
> sort of question.

ah good point.

> The add and sub instructions are on page 1365, with
> sub instructions in the first column (bits 1-5 of the instruction word
> being 01000) and add instructions in the third column (01010).  The
> ops that use CA for the carry input seem to be in rows 5-8 and 21-24
> (instruction bits 6-10 containing x01xx).  The ops that have the carry
> input set to one all have bit 6 of the instruction word = 1.

interesting.
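
a quick sketch to check the column and x01xx patterns against the XO
values from the ISA (OE=0).  assumptions: the "column" is taken to be
the low 5 bits of the 10-bit XO field as laid out in the opcode map,
the "row" bits are the upper 5, and the helper names are hypothetical:

```python
# XO values (OE=0) of the fixed-point add/subtract family.
ARITH_XO = {
    "subfc": 8, "addc": 10, "subf": 40, "neg": 104,
    "subfe": 136, "adde": 138, "subfze": 200, "addze": 202,
    "subfme": 232, "addme": 234, "add": 266,
}

# ops whose carry input is CA, and ops that invert RA (the "sub" column)
USES_CA = {"subfe", "adde", "subfze", "addze", "subfme", "addme"}
INVERTS_RA = {"subfc", "subf", "neg", "subfe", "subfze", "subfme"}

def column(xo):
    """Low 5 bits of XO: the opcode-map column (assumed layout)."""
    return xo & 0x1F

def row_bits(xo):
    """Upper 5 bits of the 10-bit XO: the opcode-map row (assumed layout)."""
    return (xo >> 5) & 0x1F

def in_sub_column(xo):
    return column(xo) == 0b01000

def in_add_column(xo):
    return column(xo) == 0b01010

def carry_in_is_ca(xo):
    """Row bits match x01xx."""
    return (row_bits(xo) & 0b01100) == 0b00100

def patterns_hold():
    """True if both patterns classify every op in ARITH_XO correctly."""
    return all(
        in_sub_column(xo) == (name in INVERTS_RA)
        and in_add_column(xo) == (name not in INVERTS_RA)
        and carry_in_is_ca(xo) == (name in USES_CA)
        for name, xo in ARITH_XO.items()
    )
```

this only covers the eleven ops listed; it says nothing about whether
the same bits are reused consistently elsewhere in the opcode map.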

> > if this principle has been applied across the board, in the ISA
> > design, the POWER instruction decoder phase could be drastically
> > simplified.  right now, the decoder is a massive 4 deep nested series
> > of Switch/Case statements, and consequently the number of gates
> > involved is enormous compared to other RISC architectures.
>
> I don't know of such a principle being applied in recent times, though
> there may have been such a principle in the very early days (early
> 90s).

RISC-V was based on the principle of "learning from past mistakes",
and a thorough analysis shows that one hell of a lot of thought has
gone into the ISA design.  i can say that it does indeed make for
a much simpler decoder.  the 16-bit C encoding for example, on first
sight, looks like a mess (bits all over the place), however when
implemented in hardware, several bits fall straight through without
needing to be MUXed, regardless of which Function Unit actually
needs those registers / sub-operations / etc.

unfortunately they did not leave enough room for future extension.
for example, LD-with-update cannot be added without some inconvenience.

>  I had hoped that the logic simplification algorithms that the
> synthesis tools have would automatically detect the patterns, but
> maybe that's asking a bit much.

if you're paying USD 90m (or renting the tools @ USD 250,000 a week)
it's perfectly reasonable!  the rest of us - using yosys?  mmmm... :)

i've found that yosys can do a reasonable job of reducing gate delay
(reducing longest paths).  however that *increases* gate count rather
than reducing it.

i did wonder if there is a special mode for the yosys "synth" command
that asks it to optimise for reduced gate count instead.

>  It would probably help if our decode
> tables indicated the don't-care entries rather than just putting 0 for
> table entries that have no effect.

when converting to CSV format, i went over all of them: i didn't find
many opportunities.
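
to illustrate why marking don't-cares can matter, here is a toy sketch.
the table, op names and helper are entirely hypothetical (not taken
from the microwatt decode tables): None marks an entry that has no
effect, and the helper asks whether a control column could then be
driven by a constant instead of decode logic:

```python
def constant_if_possible(column):
    """Return 0 or 1 if all specified entries agree, else None.

    Entries equal to None are don't-cares and may take either value.
    """
    specified = {v for v in column if v is not None}
    if len(specified) == 1:
        return specified.pop()
    if not specified:
        return 0  # nothing constrains the column; any constant works
    return None

# hypothetical control column for three ops: op_c never uses the signal
with_zero_fill = {"op_a": 1, "op_b": 1, "op_c": 0}      # 0 written for "no effect"
with_dont_care = {"op_a": 1, "op_b": 1, "op_c": None}   # explicit don't-care

needs_logic = constant_if_possible(list(with_zero_fill.values()))
tied_high = constant_if_possible(list(with_dont_care.values()))
```

with the zero-filled column the signal has to be decoded from the
opcode; with the don't-care marked it can simply be tied to 1.  a real
minimiser does far more than this, but it can only exploit entries it
has been told are free.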

l.