[libre-riscv-dev] [OpenPOWER-HDL-Cores] microwatt decoder tables: M-Form and X-Form switched RS and RB
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Wed Jun 3 02:28:59 BST 2020
On Wed, Jun 3, 2020 at 1:41 AM Paul Mackerras <paulus at ozlabs.org> wrote:
> On Tue, Jun 02, 2020 at 01:10:37PM +0100, Luke Kenneth Casson Leighton wrote:
> > this therefore leaves me wondering if something similar was done for
> > POWER, where we've simply missed these patterns because they're just
> > not documented.
> As far as I can see, there are patterns like this to some extent, but
> not uniformly, and not as much as one would like.
> For example, with the condition register logical ops (crand etc.),
> there are 4 bits of the instruction word that give the truth table for
> the operation.
yes i saw that trick: utterly cool :) actually using the instruction
field as a 4-bit lookup.
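
a minimal sketch of that lookup, for illustration only. the extended opcode (XO) values are the XL-form ones from the ISA; reading the truth table from XO bits 8..5 (counted from the least-significant bit of the 10-bit XO field), indexed by (BA << 1) | BB, is an assumption drawn from comparing the encodings, not something stated in this thread. the function names are hypothetical.

```python
# 10-bit extended opcodes (XO) of the CR logical ops (XL-form, primary opcode 19).
CR_OPS = {
    "crand": 257, "cror": 449, "crxor": 193, "crnand": 225,
    "crnor": 33, "creqv": 289, "crandc": 129, "crorc": 417,
}


def cr_lut(xo):
    """Return the 4-bit truth table held in XO bits 8..5 (assumed position)."""
    return (xo >> 5) & 0xF


def cr_logical(xo, a, b):
    """Look up the result bit for CR input bits a (BA) and b (BB), each 0 or 1."""
    return (cr_lut(xo) >> ((a << 1) | b)) & 1


# Reference definitions, used only to cross-check the lookup.
REFERENCE = {
    "crand": lambda a, b: a & b,
    "cror": lambda a, b: a | b,
    "crxor": lambda a, b: a ^ b,
    "crnand": lambda a, b: 1 - (a & b),
    "crnor": lambda a, b: 1 - (a | b),
    "creqv": lambda a, b: 1 - (a ^ b),
    "crandc": lambda a, b: a & (1 - b),
    "crorc": lambda a, b: a | (1 - b),
}

if __name__ == "__main__":
    for name, xo in CR_OPS.items():
        print(f"{name:7s} XO={xo:3d} lut={cr_lut(xo):04b}")
```

with this reading, all eight ops collapse into one 4-entry lookup rather than eight separate cases.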
> However, the fixed-point logical ops don't have that.
possibly because all permutations (particularly if you include immediates)
would simply have taken up too much of the major opcode space.
> > however... is this fact encoded *in the instructions*? i.e. is the
> > instruction set encoded such that:
> Appendix D (Opcode Maps) of the ISA is useful for investigating this
> sort of question.
ah good point.
> The add and sub instructions are on page 1365, with
> sub instructions in the first column (bits 1-5 of the instruction word
> being 01000) and add instructions in the third column (01010). The
> ops that use CA for the carry input seem to be in rows 5-8 and 21-24
> (instruction bits 6-10 containing x01xx). The ops that have the carry
> input set to one all have bit 6 of the instruction word = 1.
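
a small sketch to check the first two observations against the XO-form extended opcodes. it assumes the opcode-map column is the low five bits of the extended opcode and the row is the next five bits up (with OE = 0 as the top bit); that mapping, and the helper names, are assumptions made for this example. it does not attempt to check the carry-input-set-to-one observation.

```python
# 9-bit XO values of the XO-form add/subtract ops (primary opcode 31), OE = 0.
XO = {
    "add": 266, "addc": 10, "adde": 138, "addme": 234, "addze": 202,
    "subf": 40, "subfc": 8, "subfe": 136, "subfme": 232, "subfze": 200,
    "neg": 104,
}

# Ops that take CA as their carry input.
USES_CA_IN = {"adde", "addme", "addze", "subfe", "subfme", "subfze"}


def column(xo):
    """Low five bits of the extended opcode (assumed opcode-map column)."""
    return xo & 0x1F


def row(xo):
    """Next five bits up, with OE = 0 on top (assumed opcode-map row)."""
    return (xo >> 5) & 0x1F


def matches_x01xx(xo):
    """True if the row bits fit the pattern x01xx."""
    return (row(xo) & 0b01100) == 0b00100


if __name__ == "__main__":
    for name, xo in sorted(XO.items()):
        print(f"{name:7s} row={row(xo):05b} col={column(xo):05b} "
              f"x01xx={matches_x01xx(xo)}")
```

for these eleven ops the x01xx test alone separates the CA-input ops from the rest, which is the kind of pattern a hand-written decoder could exploit directly.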
> > if this principle has been applied across the board, in the ISA
> > design, the POWER instruction decoder phase could be drastically
> > simplified. right now, the decoder is a massive 4 deep nested series
> > of Switch/Case statements, and consequently the number of gates
> > involved is enormous compared to other RISC architectures.
> I don't know of such a principle being applied in recent times, though
> there may have been such a principle in the very early days (early
RISC-V was based on the principle of "learning from past mistakes",
and a thorough analysis shows that one hell of a lot of thought has
gone into the ISA design. i can say that it does indeed make for
a much simpler decoder. the 16-bit C encoding for example, on first
sight, looks like a mess (bits all over the place), however when
implemented in hardware, several bits fall straight through without
needing to be MUXed, regardless of which Function Unit actually
needs those registers / sub-operations / etc.
unfortunately they did not leave enough room for future extension.
LD-with-update is not possible to add, without some inconvenience,
> I had hoped that the logic simplification algorithms that the
> synthesis tools have would automatically detect the patterns, but
> maybe that's asking a bit much.
if you're paying USD 90m (or renting the tools @ USD 250,000 a week)
it's perfectly reasonable! the rest of us - using yosys? mmmm... :)
i've found that yosys can do a reasonable job of reducing gate delay
(reducing longest paths). however that *increases* gate count, not
i did wonder if there is a special mode for the yosys "synth" command
to ask it to optimise for reduced gate count, instead.
> It would probably help if our decode
> tables indicated the don't-care entries rather than just putting 0 for
> table entries that have no effect.
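
a toy illustration of why that could matter, using a made-up two-input decode column (not taken from the microwatt tables). the sketch asks whether a single input bit, possibly inverted, reproduces the output on every row whose value is actually specified; None stands for don't-care. the table contents and the function name are hypothetical.

```python
def single_literal_covers(table, n_inputs):
    """Return (bit index, inverted) pairs that match every specified row.

    table maps input tuples to 0, 1 or None, where None means don't-care.
    """
    hits = []
    for bit in range(n_inputs):
        for inverted in (False, True):
            ok = all(
                out is None or (inp[bit] ^ inverted) == out
                for inp, out in table.items()
            )
            if ok:
                hits.append((bit, inverted))
    return hits


# Hypothetical column: the output only has an effect when input 0 is 1.
WITH_DONT_CARE = {(0, 0): None, (0, 1): None, (1, 0): 0, (1, 1): 1}

# Same column, with the no-effect rows filled in as 0.
WITH_ZEROS = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 1}

if __name__ == "__main__":
    print("don't-care:", single_literal_covers(WITH_DONT_CARE, 2))
    print("zeros:     ", single_literal_covers(WITH_ZEROS, 2))
```

with the don't-care rows marked, the column reduces to a plain wire from input 1; with zeros filled in, no single literal fits and at least an AND of both inputs is needed.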
when converting to CSV format, i went over all of them: i didn't find