[Libre-soc-bugs] [Bug 238] POWER Compressed Formal Standard writeup

Sun Nov 22 23:00:04 GMT 2020

https://bugs.libre-soc.org/show_bug.cgi?id=238

--- Comment #61 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(again, from aoliva, copying a message that was a direct reply to the bugzilla
daemon)

On Nov 22, 2020, bugzilla-daemon at libre-soc.org wrote:

>> I can't help feeling that we're wasting precious encoding bits to remain
>> in 16-bit mode.  We could easily double the number of accessible registers
>> using them, and have an encoding similar to that of nop to switch out.

> this was discussed in comment #8

It was, but from the perspective of "let's try to get some of the most
common opcodes duplicated in this compressed form", which reminds me of
Thumb v1.  What I have in mind is a slightly different perspective:
making the compressed form so extensive that you'd hardly ever need to
switch back to 32-bit mode, which is more like later versions of Thumb.

Bear in mind that Thumb nowadays is a very complete instruction set.
Why not aim for that?

>> I'm also thinking that, if we use 6 bits just to enter 16-bit mode, the
>> remaining 10 bits, though theoretically useful, will most often go to waste;
>> and since we were willing to spend 2 bits to remain in it, we might as well
>> use one-byte nops (suitably encoded) to switch between modes, and have 16-bit
>> instructions use the whole 16 bits.  We could then have 16-bit
>> instructions starting at odd addresses, and 32-bit ones at even addresses,
>> sort of like ARM/Thumb, SH-5 SHmedia/SHcompact, etc., but for real.

> this would require complex scheduling and sizing of instructions to get the
> perfect alignment.

Err, I think we're miscommunicating.

There's no need for any special alignment.  The odd and even addresses
follow trivially from using a 1-byte switch insn.
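To illustrate why no special alignment work is needed, here is a minimal sketch that places a mixed stream in memory, assuming execution starts in 32-bit mode at an even address and that a 1-byte switch insn toggles the mode.  The names (`lay_out`, `mode_at`, `SWITCH`, ...) are hypothetical.  Since 32-bit insns are 4 bytes and 16-bit ones are 2, each run keeps the parity it starts with, and only the 1-byte switch flips it.

```python
# Hypothetical sketch: place a mixed insn stream in memory, where a
# 1-byte switch insn toggles between 32-bit and 16-bit mode. Sizes are
# in bytes; the names and the starting mode are illustrative assumptions.

SWITCH, INSN32, INSN16 = "switch", "insn32", "insn16"
SIZES = {SWITCH: 1, INSN32: 4, INSN16: 2}


def lay_out(stream, start=0):
    """Return (address, kind) pairs for each insn in `stream`.

    Raises ValueError if an insn does not match the current mode, or if
    a 32-bit insn lands on an odd address or a 16-bit one on an even
    address (which cannot happen when `start` is even).
    """
    addr, mode = start, INSN32  # assume we start in 32-bit mode
    placed = []
    for kind in stream:
        if kind == SWITCH:
            mode = INSN16 if mode == INSN32 else INSN32
        elif kind != mode:
            raise ValueError(f"{kind} issued in {mode} mode at {addr}")
        elif (addr % 2 == 1) != (kind == INSN16):
            raise ValueError(f"{kind} at wrong-parity address {addr}")
        placed.append((addr, kind))
        addr += SIZES[kind]
    return placed


def mode_at(addr):
    """With this layout, the mode of a non-switch insn is address bit 0."""
    return INSN16 if addr & 1 else INSN32
```

For example, `lay_out([INSN32, INSN32, SWITCH, INSN16, INSN16, INSN16, SWITCH, INSN32])` places the insns at addresses 0, 4, 8, 9, 11, 13, 15 and 16, with every 16-bit insn on an odd address.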

> if we knew in advance that instruction streams were to remain in 16bit mode
> consistently for considerable durations i would be much more in favour of
> schemes that had to rely on nops to transition between modes.

That's what I'm aiming for with my suggestions.

>> We could have an even more extensive opcode selection by giving up
>> 3-register encodings,

> maybe.  bear in mind the idea is to encode the most common instructions.  this
> is where we really need that statistical analysis.

>> and follow the practice of other compact encodings of
>> using a single register as input and output operand.

> this was discussed in comment #18

Yeah, but dismissed without data.

Consider that about 22% (673 + 630 of 5731) of the instructions that
take 3 registers as operands in the ppc-gcc binary I'm using for tests
actually use the output register as one of the input operands, without
any effort by the compiler to make them so:

$ ./objdump -d ppc-gcc | grep -c ' r[0-9]*,r[0-9]*,r[0-9]*$' # x <- y op z
5731
$ ./objdump -d ppc-gcc | grep -c ' \(r[0-9]*\),\1,r[0-9]*$'  # x <- x op z
673
$ ./objdump -d ppc-gcc | grep -c ' \(r[0-9]*\),r[0-9]*,\1$'  # x <- y op x
630
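The same classification can be sketched in Python, which also makes the percentage explicit.  This is a rough analogue of the three greps above, assuming each objdump line ends in a space-separated `rX,rY,rZ` operand field as the greps do; `classify` and `reuse_share` are hypothetical helpers.

```python
import re

# Rough Python analogue of the three greps above, assuming each objdump
# line ends in a space-separated "rX,rY,rZ" operand field as they do.
THREE_REG = re.compile(r" (r\d+),(r\d+),(r\d+)$")


def classify(lines):
    """Return (total, x_x_z, x_y_x) counts over objdump output lines."""
    total = x_x_z = x_y_x = 0
    for line in lines:
        m = THREE_REG.search(line)
        if m is None:
            continue
        x, y, z = m.groups()
        total += 1
        x_x_z += x == y  # x <- x op z
        x_y_x += x == z  # x <- y op x
    return total, x_x_z, x_y_x


def reuse_share(total, x_x_z, x_y_x):
    """Fraction of 3-register insns reusing the destination as a source.
    Like summing the grep counts, this double-counts any x <- x op x."""
    return (x_x_z + x_y_x) / total
```

With the counts above, `reuse_share(5731, 673, 630)` is 1303/5731, roughly 0.227.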

We (well, I :-) can easily reconfigure the compiler to prefer such a
form, without ruling out 3-different-register alternatives, and see how
far that gets us, but my intuition is that, with such a change, and
access to the full register file (as enabled by using 2 rather than 3
operands), we could then compress very long sequences of instructions.

>> Most (*) 3-operands
>> insns x := y op z can be turned into x := y ; x op= z, which is no worse and
>> probably better than switching to 32-bit mode,

> except: look at the amount of space used to do so.  it's still 32 bit, isn't
> it?

My point was that, even in the (meant to be exceptional) case in which
the compiler emits an insn that can't be encoded as a single compressed
insn, you could still remain in compressed mode.
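The x := y ; x op= z rewrite can be sketched as a small lowering function.  This assumes abstract `y op z` operand order (not PowerPC's `subf` convention) and hypothetical 2-operand insns modelled as tuples, with `mr` standing for a register move.  It also shows one case that needs more care: x := y op x with a non-commutative op, where copying y into x first would clobber the second source.

```python
# Hypothetical sketch: lower x := y op z into 2-operand insns of the
# form x op= z, adding a register move when needed. Insns are modelled
# as tuples; mnemonics are illustrative, not real compressed opcodes.

COMMUTATIVE = {"add", "and", "or", "xor", "mullw"}


def lower(op, x, y, z):
    """Return a list of 2-operand insns equivalent to x := y op z, or
    None when no such sequence exists without a scratch register."""
    if x == y:                        # already x := x op z
        return [(op, x, z)]
    if x == z:                        # x := y op x
        if op in COMMUTATIVE:
            return [(op, x, y)]       # x := x op y
        return None                   # moving y into x would clobber z
    return [("mr", x, y), (op, x, z)]  # x := y ; x op= z
```

For instance, `lower("add", "r3", "r4", "r5")` yields a move of r4 into r3 followed by r3 += r5, which stays in compressed mode at the cost of one extra 16-bit insn.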

And by making switching out of compressed mode rare enough, we'd
strengthen the case for using those 2 of the 16 bits for something more
useful than switching modes: the switch out would be the exception, and
thus a special opcode, rather than 1/8th of every insn.

And then, with 1-byte mode switching, dropping into 32-bit mode for 1
or n 32-bit insns costs us the space of one 16-bit insn: 8 bits before
the first 32-bit insn, and another 8 bits after the last one to switch
back.
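The space accounting can be sketched as follows, assuming the stream starts in 16-bit mode and is described as runs of same-width insns; `stream_bytes` and `mode_field_bits` are hypothetical helpers.  It only counts the bits mentioned here (the switch bytes, and the 2-bit per-insn field), not whatever else each scheme spends on entering and leaving compressed mode.

```python
# Hypothetical sketch: size of an insn stream under the 1-byte
# mode-switch scheme. `runs` lists (width_in_bits, count) pairs; a
# 1-byte switch insn is charged whenever the width changes. Starting
# in 16-bit mode is an assumption.


def stream_bytes(runs, start_width=16):
    total, width = 0, start_width
    for run_width, count in runs:
        if run_width != width:
            total += 1  # one 1-byte mode-switch insn
            width = run_width
        total += count * run_width // 8
    return total


def mode_field_bits(n16, bits_per_insn=2):
    """Bits spent if every 16-bit insn carries a 2-bit 'stay compressed'
    field, i.e. 1/8th of each insn."""
    return n16 * bits_per_insn
```

For 10 compressed insns, a 3-insn 32-bit excursion, and 10 more compressed insns, `stream_bytes` gives 54 bytes: the 52 bytes of insns plus 2 switch bytes, i.e. the size of one 16-bit insn, whereas a 2-bit field on each of the 20 compressed insns would use 40 bits.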

> bear in mind: i recognise that on the transition from v3.0B to 16bit mode it's
> impossible to fit 3 operands, hence all 10bit encodings are 1 src 1 dest.

To me, the 10-bit encodings come across as complexity I'd much rather
get rid of.

>> and we could have an
>> extend-next pseudo-insn to supply an extra operand and/or extra
>> immediate/offset bits to the subsequent insn.

> i very much do not want to go down this particular route, although it is very
> tempting to do so.

Think of it as a 32-bit insn representation if you must ;-)

Thumb does that.  We're looking at 48-bit instructions that are hardly
different from that.

> if the complexity is high then the benefit had better be waay higher.

I don't really feel that an extend-next opcode is such a huge increase
in complexity.  It would supply an extra operand and/or an extended
immediate range to the subsequent insn, which would otherwise take them,
respectively, from a repeated operand or from a sign-extended narrow
immediate.  And I believe the benefits could be huge in enabling us to
remain in compressed mode for much longer.
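To make the decode side concrete, here is a minimal sketch of how an extend-next prefix might feed the following insn.  The field widths (a 5-bit signed immediate, 8 extra high-order bits from the prefix) and all names are illustrative assumptions, not a proposed encoding.  Without a prefix, the destination doubles as the source and the narrow immediate is sign-extended; with one, the prefix supplies the missing operand and/or upper immediate bits.

```python
# Hypothetical sketch of an "extend-next" prefix. Field widths are
# illustrative assumptions: a compressed insn carries a 5-bit signed
# immediate, and an EXT prefix supplies 8 more high-order immediate
# bits and/or a separate source register.

NARROW_BITS = 5
EXT_BITS = 8


def sign_extend(value, bits):
    """Interpret the low `bits` of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_operands(insn, pending_ext=None):
    """insn: dict with 'rt' (dest and default source) and 'imm' (the
    narrow immediate field). pending_ext: None, or a dict from the
    preceding EXT prefix with optional 'ra' and/or 'hi' entries.
    Returns (rt, ra, imm) as the execute stage would see them."""
    ext = pending_ext or {}
    # Without an extra operand, the destination is repeated as source.
    ra = ext.get("ra", insn["rt"])
    low = insn["imm"] & ((1 << NARROW_BITS) - 1)
    if "hi" in ext:
        # Place the prefix bits above the narrow field, then sign-extend.
        hi = ext["hi"] & ((1 << EXT_BITS) - 1)
        imm = sign_extend((hi << NARROW_BITS) | low, NARROW_BITS + EXT_BITS)
    else:
        imm = sign_extend(low, NARROW_BITS)
    return insn["rt"], ra, imm
```

For example, a narrow field of 0b11111 decodes to -1 on its own, while a prefix with `hi=3` in front of a field of 4 yields 100 (3 * 32 + 4).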

> ahh... ahh... this is novel.  effectively it's using the bottom 2 bits
> (actually, bit 1) to indicate the 16/32 bit mode.

*nod*, just like Thumb vs ARM and SHcompact vs SHmedia.

Except they did not actually use misaligned insns crossing word
boundaries.  Presumably that would be a problem for them, but for us it
looks like it isn't, so we might as well take advantage of it ;-)
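Which insns would actually straddle a word boundary can be sketched as follows, assuming a fetch unit that only reads aligned 32-bit words; `words_touched`, `crosses_word` and `fetch` are hypothetical helpers.  Under the odd/even scheme, a 16-bit insn at an address of 3 mod 4 crosses into the next word, and so does a 32-bit insn at 2 mod 4 (e.g. right after a switch byte at an odd address of 1 mod 4).

```python
WORD = 4  # bytes per fetch word (assumed aligned 32-bit fetch)


def words_touched(addr, size):
    """Return the indices of the aligned words an insn occupies."""
    first = addr // WORD
    last = (addr + size - 1) // WORD
    return list(range(first, last + 1))


def crosses_word(addr, size):
    """True if the insn spans two aligned words."""
    return len(words_touched(addr, size)) > 1


def fetch(mem, addr, size):
    """Assemble an insn's bytes from whole aligned words, as a fetch
    unit that only reads aligned words would have to."""
    words = words_touched(addr, size)
    data = b"".join(mem[w * WORD:(w + 1) * WORD] for w in words)
    offset = addr - words[0] * WORD
    return data[offset:offset + size]
```

So a 16-bit insn at address 9 fits in one word, while one at address 11 needs bytes from words 2 and 3.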
