[Libre-soc-dev] compressed instructions state requirements

Wed Nov 25 14:06:23 GMT 2020

On Tue, Nov 24, 2020 at 9:07 PM Jacob Lifshay <programmerjake at gmail.com> wrote:

> My idea of how 16-bit instructions work is that they should be usable
> anywhere (like RVC), no special pages needed. The extra info is
> conceptually part of the PC (or some decode status register).

right.  "some decode status register".  let's walk through it.  let me
know (explicitly) if you agree or disagree with the answers.

Q: how often will that status register need to be set/reset?
A: on every call into a standard PPC64LE ABI v3.0B formatted function
Q: how do you return *to* the enhanced mode?
A: on *entry* to every function encoded in enhanced-variable-encoding
mode, instructions are required to set the new mode
Q: what is the cost of setting a status register?
A: if it requires a single bit to be set it is at least 5 possibly 6,
7, 8 instructions (if in the high bits):
    mfspr, rldicr (to get the top bits down to "reachable" by oris),
oris, rldicr (to get it back up), and finally mtspr
Q: how many "batches" of those 5, 6, 7, 8 instructions are required?
A: one batch on EVERY single call to a PPC64LE ABI v3.0B function
    one batch at the start of every single enhanced-variable-encoding
mode function
    one batch at the exit of every single enhanced-variable-encoding
mode function

this latter so that the enhanced-encoded function "looks like" a
standard PPC64LE ABI v3.0B function at all times.
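
to put rough numbers on the answers above, here is a minimal sketch in
python.  the mnemonic list is the 5-instruction best case named above;
the function name and its parameters are made up purely for
illustration, and it assumes exactly one batch per event as listed.

```python
# Sequence named in the text for setting one high bit of the decode
# status register. This is the 5-instruction best case; the text allows
# for it growing to 6, 7 or 8 instructions.
BATCH = ("mfspr", "rldicr", "oris", "rldicr", "mtspr")


def mode_switch_overhead(calls_to_standard_abi, batch_len=len(BATCH)):
    """Extra instructions executed by ONE invocation of an
    enhanced-variable-encoding function (hypothetical cost model)."""
    entry = batch_len                                # set the mode on entry
    exit_ = batch_len                                # restore it on exit
    around_calls = batch_len * calls_to_standard_abi  # one batch per v3.0B call
    return entry + exit_ + around_calls
```

under those assumptions a function that makes three calls into standard
v3.0B ABI code pays 5 + 5 + 3*5 = 25 extra instructions every time it
runs, before doing any useful work.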

> Unlike VLE,
> it works fine with any combination of 32/64-bit mode and LE/BE mode,

... but, we established yesterday, not with 16/48.

> in a
> way such that the bytes in memory needed to encode pre-existing
> instructions are completely unchanged -- you can run pre-existing
> PowerPC[64][LE] programs entirely unmodified even if the processor supports
> 16/48-bit instructions since the program always starts executing in
> Standard Mode.

the caveat being: if there is no VLE mode-page, the cost is such a high
quantity of mode-setting instructions that it jeopardises the entire
purpose of the exercise.

even if we add just the one instruction that allows a compact
mode-switching (to alter the "decode status register" with only a
32-bit instruction), that's still one instruction too many given that
it's literally going to be at the start and end of absolutely every
single function call, and added just before literally every single
call to a ppc64le v3.0B ABI function.

... or...

... we could very simply mark pages with ONE (quantity 1) single bit.

> > if however a mix is permitted within a "marked" 64k page (16/32/48/64)
> > then the
> >
>
> Truncated sentence?

doh.  yes.  i believe i completed what i wanted to point out, above.

> We need to contact the OpenPower Foundation and get permission to implement
> v3.1.

before doing that i'd reaaally like to know that it's worthwhile.
below you outline that it may well be the case.  however applying
SVPrefix to v3.1B 64-bit instructions: this is well... it'll be...
exciting.

> I disagree: having code that's compatible with v3.1 means getting a speed
> bump from better support for larger immediates (34-bits instead of 16) as
> well as PC-relative addressing. This could mostly eliminate the need for a
> TOC, since shared libraries can generally be assumed to be less than 8GB in
> size. This should also reduce code size somewhat. Though that's all true
> only once compilers catch up.

indeed.  i'd really prefer to see numbers on the code-size reduction
that results.  from what you're saying they could be quite significant:
the amount of extra work involved is something that should not be taken
lightly.

> The way I'm envisioning it, SVP64 instructions share the PowerISA v3.1
> prefix encoding space with PowerISA v3.1 64-bit instructions (more than
> half that space is available),

ah no.  absolutely not.  no way.  the entirety of the SV-P64 needs to
be completely and 100% free and clear.  and it also needs to be not
one but two EXTNNNs.  this was established 18+ months ago from the
work done on the original SV-P64 (RV) encoding.

i really, *really* do not want to have yet more time spent doing yet
another total redesign of the SVP formats.  we simply do not have
time.

when i said that we need to accelerate the development, i really meant it.

we *have not* got time to try to "desperately work out how to cram in
a mix of two completely different encodings".  plus, there is no
guarantee that IBM will not extend EXT001 in the future, jeopardising
the entirety of SV-P64.

logically, therefore (and particularly given that they're
mutually-exclusively incompatible)

> SVP48 instructions use the same 48-bit
> encoding space as all other 48-bit instructions (probably using primary
> opcode 0)

two primary opcodes.  11 bits are required for the small prefix (and i
reiterate: i *do not* want us wasting yet more time to redesign
something that's already had months of work gone into it)

> and SVP32 instructions use other 32-bit encoding space (possibly
> shared using primary opcode 0).

not a chance in hell of it being a single primary opcode, or shared.
two separate primary opcodes are required, those two being completely
separate and distinct from all other prefix-identifying opcodes.

please read this page
https://libre-soc.org/openpower/sv/major_opcode_allocation/
and, reminder: the original page:
https://libre-soc.org/simple_v_extension/sv_prefix_proposal/

it's 11 bits for the short version (applied to SV-P48) and 27 bits for
the long version (applied to SV-P64).
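
one reading that is consistent with those two numbers (an assumption
here, the pages linked above are the authoritative source): a primary
opcode takes 6 bits, and allocating an aligned pair of primary opcodes
hands one of those bits back to the prefix.  a small python sketch,
with a hypothetical helper name:

```python
PRIMARY_OPCODE_BITS = 6  # Power ISA primary opcode field width


def prefix_payload_bits(prefix_width, num_opcodes=2):
    """Bits left for the SV prefix payload (assumed model).

    num_opcodes must be a power of two and the opcodes an aligned,
    contiguous group, so that the low opcode bit(s) become payload.
    """
    freed = num_opcodes.bit_length() - 1  # 1 opcode -> 0 bits, 2 -> 1 bit
    return prefix_width - PRIMARY_OPCODE_BITS + freed
```

with two opcodes this gives 16 - 6 + 1 = 11 bits for the 16-bit prefix
and 32 - 6 + 1 = 27 bits for the 32-bit prefix; with only one opcode it
would be 10 and 26.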

i emphasise again: i really, *really* do not want the time wasted
doing yet another redesign of something that took several months to
write.

we have not got time.

> Yup, that can be done by (in Standard Mode) decoding the primary opcode as
> well as (for opcode 0) one bit of the extended opcode field (the 256 place)
> for compatibility with the "Service Processor Attention" instruction, which
> needs to be 32-bit. That should be sufficiently trivial to satisfy your
> worries about decode issues with multi-issue.

the only reason i can think of where this would be reasonable is if
there was a genuine legitimate reason to halt the processor during the
first-stage (length/mode-identifying) phase.  given that that would
require considerable additional gates (identifying the *full* 32-bit
pattern 0x00000080) i would also be very reluctant to suggest even
doing that.

given that it is such a rare occurrence ("halt processor"), i'd
advocate that the benefits of conforming to standard conventions
outweigh the "cost" of moving one (single, extremely rare) instruction.

> This causes no issues with needing all 0s to be illegal, since, in Standard
> Mode, the first 32-bits in memory being all 0s would be an illegal 48-bit
> instruction and in Compressed Mode all 0s would be an illegal 16-bit
> instruction. No need to use Primary Opcode 0 for 16-bit instructions in
> order to achieve that, so I think 16-bit instructions in Standard Mode
> should use Primary Opcode 5 since that is entirely unallocated,

you're missing the fact that *two* primary opcodes are required for C.
i repeat-documented this 2 weeks ago, based on an analysis that i did
almost a year ago when we first began the move of SV to OpenPOWER.

nggggh :)

two contiguous primary opcodes means that in the critical stage of
identifying the length/mode, extra gates are not required to recognise
two separate non-contiguous patterns then AND those results together.

instead by having two contiguous EXTNNNs you can *drop* one bit from
the detection logic.
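
a minimal python sketch of that point.  the opcode numbers used are
placeholders, not a proposed allocation, and the function names are
hypothetical.  note that the bit can only be dropped if the pair is
aligned (even opcode, then the odd one directly after it).

```python
def primary_opcode(insn32):
    """Top 6 bits of a 32-bit Power ISA instruction word."""
    return (insn32 >> 26) & 0x3F


def match_noncontiguous(po, a, b):
    # Two full 6-bit comparisons whose results must then be combined.
    return po == a or po == b


def match_contiguous_pair(po, even_base):
    # One 5-bit comparison: the lowest opcode bit is simply ignored.
    assert even_base % 2 == 0, "pair must start on an even opcode"
    return (po >> 1) == (even_base >> 1)
```

for example, match_contiguous_pair(po, 4) accepts exactly primary
opcodes 4 and 5, using one comparator that is one bit narrower.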

> no need for
> annoying workarounds to get "Service Processor Attention" to still be
> 32-bit since the Extended Opcode field which encodes "Service Processor
> Attention" is outside of 16-bits and all the other bits are don't-cares for
> "Service Processor Attention", meaning using PO 0 would require always
> reading 32-bits to check -- very messy.

i may be missing something: is "attn" an extremely common instruction?
(what is PO 0?)

also, i'm having difficulty parsing the above paragraph.

> >
> > > and 48-bit (no spec yet) instructions.
> >
> > TBD when we get to SV Prefixing.  remember also that we have SV-P64
> > (32-bit SV prefix plus a 32-bit instruction) and we have SV-C64
> > (32-bit prefix plus a 16-bit swizzle prefix plus a 16-bit Compressed)
> >
>
> I propose that we limit the maximum possible instruction length to 64-bits
> (kinda like x86's 15-byte limit)

errr yeah.  i kinda instinctively rebelled against going beyond 64 bits.

> allowing the encoding I described above to
> be sufficient.

no, unfortunately: not if that relies on the fundamental assumption
that a single major opcode is sufficient for SV-Prefix encodings.
right back as far as the very first days where we discussed
potentially moving to OpenPOWER, the very first thing that i did was:
analyse SV Prefix encodings.

i found that the SV Prefix encodings *only* worked if they were
allocated 2x EXTNNN opcodes each.

this has therefore been the *fundamental* assumption of the entire
development and discussion of SV-OpenPOWER right from the very start.

> In particular, this means 64-bit instructions can't be
> further prefixed.

whewww :)  yeah 72 and 96 bit... yeah.

honestly my feeling is that given that we are 99.99% likely to have to
use a VLE-style page-bit marker, we're kinda "free and clear" to move
Major opcodes around.  i'd therefore advocate that it should be *v3.1B
P64* that's moved to EXT005, leaving C free and clear to need only 5
bits to identify at the gate-critical level of identifying
mode/length.

l.