[Libre-soc-bugs] [Bug 238] POWER Compressed Formal Standard writeup

Sat Nov 21 10:33:30 GMT 2020

https://bugs.libre-soc.org/show_bug.cgi?id=238

--- Comment #47 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
discussion moved to bugtracker
http://lists.libre-soc.org/pipermail/libre-soc-dev/2020-November/001229.html

On 11/21/20, Alexandre Oliva <oliva at gnu.org> wrote:
> On Nov 20, 2020, Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>
>> jumping straight into practical matters: we're in the middle of
>> getting SimpleV redone on top of OpenPOWER, and a first step there is
>> Compressed Instructions.
>
> So, if I understand what I read in bug 238, we are looking into
> introducing an extension to PowerISA for code compactness, a little like
> Thumb vs ARM,

more accurately, like RISC-V RVC, where there is only one instruction that does
not map directly to 32-bit (the mv instruction)

> with 16-bit instructions rather than 32-bit ones, so that
> fragments of code that fit certain constraints, such as using only a
> subset of the register file, sufficiently-narrow immediate operands and
> offsets, and a limited set of operations, can be represented in such
> shorter instructions, so as to save instruction cache, memory, bus
> traffic, etc.

correct.  also this stops massive ISA opcode proliferation.  SIMD as a concept
is an O(N^6) opcode proliferation (why do you think x86 now has.. what... 128
bit instruction length, now? 256?)

> I get the idea that the transition between 32-bit instructions and
> 16-bit instructions is to be dynamic, rather than based on the
> instruction encoding.  This raises various concerns to me, from a
> toolchain engineer perspective.

excellent.

> One is how to mark fragments of code so that the tooling can tell
> whether to emit or decode 16-bit or 32-bit instructions.  Say, how is
> the disassembler supposed to tell whether it's looking at a 32-bit
> instruction or a pair of 16-bit instructions?

by following the context forward from the function entrypoint, which conforms
to the standard ABI and therefore starts only and exclusively in the standard
v3.0B glibc ABI state.

if no such context exists, the mode will be passed in to the tool via
parameters, just like LE/BE.
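
as a rough sketch of what "following the context from the entrypoint" could
look like in a disassembler: the walk below starts every function in 32-bit
mode and only changes instruction width when it sees a mode-switch marker.
the two marker values and the function name are invented purely for
illustration and are not the encoding under discussion in bug 238.

```python
# Toy illustration only: both marker values are hypothetical, made up for
# this sketch. They are NOT the encoding proposed in bug 238.
ENTER_16BIT = 0x00000001  # hypothetical 32-bit word: what follows is 16-bit
LEAVE_16BIT = 0x0001      # hypothetical 16-bit word: what follows is 32-bit


def split_instructions(code: bytes, big_endian: bool = True):
    """Walk forward from a function entrypoint and return a list of
    (offset, width_in_bytes, value), one entry per instruction."""
    order = "big" if big_endian else "little"  # passed in, like LE/BE
    width = 4   # assumption from the thread: entry is always standard 32-bit
    offset = 0
    out = []
    while offset + width <= len(code):
        value = int.from_bytes(code[offset:offset + width], order)
        out.append((offset, width, value))
        offset += width
        # The mode is hidden state: it changes only at a marker.
        if width == 4 and value == ENTER_16BIT:
            width = 2
        elif width == 2 and value == LEAVE_16BIT:
            width = 4
    return out


# 0x60000000 is the standard Power nop (ori 0,0,0).
example = bytes.fromhex("60000000" "00000001" "1234" "0001" "60000000")
for offset, width, value in split_instructions(example):
    print(f"{offset:4d}  {width * 8}-bit  {value:#x}")
```

the same bytes decoded from the middle, without the entrypoint context, could
not be split reliably, which is why the context has to come from a known
starting state or from a parameter.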

> Just to give an example of what I'd like to avoid, the SH port has
> floating-point instructions that become single- or double-precision ones
> depending on a bit dynamically set in a control register.

sounds fantastic.  saves vast ISA proliferation.  delighted to hear there is
precedent.

do you have a link to the SH port? i have no idea what it is.

>  Just by
> looking at the instruction, there's no way to tell whether it's single-
> or double-precision.

yep.  given that we are retrofitting an existing ISA with "context" in a
massive way (bits that are part of the instruction decode but are in a Status
Control Register) tooling... well.. has to suck it up and deal with it.

the entire concept we are doing is predicated on "hidden state" (context).
Vector Length. predication. element-width overrides.  bank switching.
absolutely everything.

that said: given that this is not going to cross, interfere with or alter ABIs,
every function is *required*, clearly, to start from the known standard v3.0B
glibc ABI, and every function call must reset back to that known standard ABI,
at exit.

therefore every function is *required* to have the exact context you are
concerned might be missing.

now, if this SH port did not do that, then of course they are going to run into
trouble.

> Are sections to be marked as 32- or 16-bit code, so that transitions
> between modes has to also jump from one section to the other?

the bitmarking and context is intended and designed to avoid precisely that.

the reason is that jumping between sections poisons the entire purpose of
having a reduced code size.  one C instruction followed by 6 instructions to LD
a 64 bit immediate followed by a 32 bit jump cannot in any way be considered a
reduction, can it?
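
to put numbers on that: the instruction counts below are the ones quoted in
the paragraph above, and the only assumption is the obvious one that a 16-bit
instruction occupies 2 bytes and a 32-bit instruction 4 bytes.  the function
names are just for this sketch.

```python
C_INSN_BYTES = 2    # one 16-bit compressed instruction
STD_INSN_BYTES = 4  # one standard 32-bit instruction


def section_hop_bytes(compressed_insns=1, imm_load_insns=6, jump_insns=1):
    """Bytes emitted when compressed code lives in its own section and has
    to be reached by loading a 64-bit address and jumping to it."""
    return (compressed_insns * C_INSN_BYTES
            + (imm_load_insns + jump_insns) * STD_INSN_BYTES)


def bytes_saved_by_compression(compressed_insns=1):
    """Bytes saved by encoding the same instructions as 16-bit, not 32-bit."""
    return compressed_insns * (STD_INSN_BYTES - C_INSN_BYTES)


print(section_hop_bytes())           # 30
print(bytes_saved_by_compression())  # 2
```

so under those counts, 28 bytes of section-hopping overhead are spent to save
2 bytes.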

> Are labels to be marked, so that e.g. odd addresses are encoded in the
> more compact mode? 

if that's what it takes for IR generators to communicate a desire to align a
label on a specific 16 bit boundary.

however i would suggest simply looking at what RVC does and going with that.

> Can calls and return insns transition between modes?

this would require defining a new ABI, or modifying the existing one so that it
differs from the standard ABI, therefore the answer is a resounding and
emphatic no.

> Can code at the same address ever be executed in both modes? 

this, whilst hypothetically possible, would fall into the "foot shooting"
category, as it would be statistically fantastically improbable.

the only instructions specifically designed with this in mind are nop and
illegal.

basically attempts to try this should be emphatically and strongly discouraged
and we are in no way going to try to design the ISA to be dual-mode (except for
illegal instruction, which is all zeros, so as to cause illegal instruction
traps as quickly as possible if a program goes wrong)

> E.g., when
> it comes to dynamic linkers, could there possibly be two GOT and PLT
> entries for the same address, one for each mode, when labels refer to
> the same address except for the mode?

you may be referring to the multi-compilation mode being championed by the
RISC-V community (not, repeat not, the same thing as multilib)

multiple recompilations of the exact same function with multiple different
flags and ISA capabilities are embedded into the exact same object file and the
dynamic linker is expected, at runtime, to pick and fixup to the correct
**version** of that function.

these multiple versions all conform to the exact same ABI.

> Another concern is on tooling.  Though a compiler might not have too
> much trouble to figure out that a chunk of code fits the constraints
> that enable the use of the compact mode, that's not quite as useful or
> involved as having the compiler try to make the code fit the
> constraints.  Allocating registers for the constrained register profile,
> reordering code so as to move unfit operations out of the fragment, or
> falling back to alternate operations, scheduling instructions according
> to the execution properties of each mode...  These don't seem to fit
> very well in the current compilation model.

RISCV deals with it, therefore we expect the required code and techniques to
exist and to be adaptable to a Compressed OpenPOWER ISA.

> I don't wish to come across as too negative, but it makes me wonder if
> the potential savings this feature will enable won't just go to waste
> because the tools won't be able to take advantage of them.  I wonder if
> it wouldn't make more sense to save this idea for a future development,
> rather than in the critical path for the very first product.  

we have to start somewhere.  if we never start, no innovation takes place.

> I can see,
> however, how much of a breakthrough it can be, and how compelling it can
> make the processor, if the potential is realized.

indeed.  and given that RISC-V has done (is doing) exactly the same thing, they
have teams of experts being paid huge sums of money, estimated at around a
million dollars a year, to sort this out.

we simply have to wait for the code to hit mainline gcc and llvm and learn from
it.

also, given the compelling code-size and power reduction, i expect some uptake
from IBM. this may take a while and require at least proof of concept,
demonstrable precedent, and so on.
