[libre-riscv-dev] Instruction sorta-prefixes for easier high-register access

Tue Jan 22 21:59:45 GMT 2019

On Tue, Jan 22, 2019 at 9:33 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> On Tue, Jan 22, 2019, 04:35 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> wrote:

> > What are "broadcast" instructions?
> >
> a scalar to vector move, also known as a splat.
> broadcast.d x32, a0, len=VL
> is equivalent to
> for(i = 0; i < VL; i++)
>     x[32 + i] = a0;

 ok cool, i recognise the term "splat" from RVV discussions.

 one option worth investigating: use the lower-numbered registers
(x0-x31) to indicate implicitly that they are to be scalar.  i.e. when
the extension prefix (for either src or dest) is 0b00, it indicates
"these are scalar not vector".

> > > but may need to change the encoding for
> > > the more common operations to accommodate vector-scalar modes for
> > > power-efficiency and lower register pressure.
> >
> >
> > Changing the encoding has huge software implications.
> >
> Yeah, but they are easily manageable at this stage of the process since we
> haven't started the instruction decoder or compiler. Since we are only
> changing how the prefixed instructions are encoded, it shouldn't change
> much with spike, since we will still need to add decoding the prefixed
> instructions anyway.

 i'm not sure if you're aware how i implemented spike-sv, or how
crucial the strategic goal of not modifying RV base is, and how that
influenced how SV was designed.

 in spike-sv there *are* no changes to the decode phase.  at all.
aside from element width over-rides (which are done in a "global"
overview fashion) there's absolutely no changes to the meaning of the
spike emulated-instructions compared to their scalar variants, either,
with one exception that's handled in an "overview" (modular-like)
fashion, and that's branch-compare operations.

 that means that in turn there are absolutely no changes - whatsoever
- to binutils.  absolutely none.

 the modifications to add element-width overrides were done through
turning various critical strategic macros (zero and sign extension in
particular) into functions, that were added to a c++ class, that then
"redirected" their arguments through a processing phase.
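
 to illustrate the shape of that change (this is not spike's actual
code, which is c++), here is a small python sketch.  the class name,
the method names and the rule that an override simply narrows the
requested width are all assumptions made for the example.

```python
class ExtensionHooks:
    """Hypothetical stand-in for the class the extension helpers moved into."""

    def __init__(self):
        self.elwidth = None  # None: no override, behave exactly like scalar code

    def _width(self, bits):
        # The "processing phase": an active override narrows the requested width.
        return bits if self.elwidth is None else min(bits, self.elwidth)

    def zext(self, value, bits):
        """Zero-extend the low `bits` bits (or fewer, if overridden)."""
        w = self._width(bits)
        return value & ((1 << w) - 1)

    def sext(self, value, bits):
        """Sign-extend the low `bits` bits (or fewer, if overridden)."""
        w = self._width(bits)
        value &= (1 << w) - 1
        sign = 1 << (w - 1)
        return (value ^ sign) - sign

hooks = ExtensionHooks()
plain = hooks.sext(0x1FF, 32)    # no override: ordinary 32-bit sign extension
hooks.elwidth = 8
narrow = hooks.sext(0x1FF, 32)   # same call site, now treated as an 8-bit element
```

 the point is that the call sites (the instruction bodies) stay
untouched: only the helper behind them changes behaviour.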

 i *did not touch* the ALU side of spike.

 i *did not alter* the decode phase *at all*.

 consequently i was able to complete spike-sv within, i think, around
6-8 weeks.

> > > We could use the prefixed jal encoding as a different opcode for
> > > vector/scalar as jal is useless when vectorized.
> >
> >
> > >
> > There are a ton of non-vectoriseable ops, the problem is that there are
> > nowhere near enough.
> >
> > I like the idea, the problem is that the entire vectoriseable opcode space
> > needs to be fitted into the overloaded space.
> >
> It all still fits, we're just reassigning the non-vectorizeable portions.

 the reassignment *is* a huge step in and of itself (which has me
concerned as to the cost of development of the associated
modifications to llvm, gcc and binutils), and i'm not sure if we're
understanding each other correctly.

 copies of the entirety of the RV opcodes - those which are to remain
scalar - need to be made.

 so how can the entirety of the RV opcode space - around a hundred
instructions - fit into a few (reassigned) opcodes?

 or, were you envisioning only doing a few opcodes?  or some new ones?

> You can look at it this way: we can always run any op using broadcast then
> the vector-vector version, so if we make the most common ops have a
> vector-scalar mode, then that is similar to C in that it saves space, time,
> energy, etc. but it doesn't make the underlying vectorized operation more
> or less possible, since you can always use broadcast instructions to
> convert any scalar inputs to vectors then run the vector-vector version.

 ok so a 2-step process?
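
 as a sketch of the two routes being compared (python for illustration
only; the function names and register numbers are hypothetical, and
registers are modelled as a flat list):

```python
def broadcast(regs, vd, scalar, vl):
    """Splat: copy one scalar value into vl consecutive registers."""
    for i in range(vl):
        regs[vd + i] = scalar

def add_vv(regs, vd, vs1, vs2, vl):
    """Vector-vector add over vl elements."""
    for i in range(vl):
        regs[vd + i] = regs[vs1 + i] + regs[vs2 + i]

def add_vs(regs, vd, vs1, rs2, vl):
    """Vector-scalar add: the scalar is read once and reused for every element."""
    scalar = regs[rs2]
    for i in range(vl):
        regs[vd + i] = regs[vs1 + i] + scalar

two_step = list(range(128))
one_step = list(range(128))

# Route 1: broadcast x10 into a temporary vector at x96, then vector-vector add.
broadcast(two_step, 96, two_step[10], 4)
add_vv(two_step, 64, 32, 96, 4)

# Route 2: a single vector-scalar add, with no temporary vector registers.
add_vs(one_step, 64, 32, 10, 4)
```

 both routes leave the same values in x64..x67.  the difference is that
the 2-step route costs an extra instruction and ties up VL temporary
registers (x96..x99 here), which is the register-pressure argument.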

> >
> > Plus all future opcodes.
> >
> all future opcodes that use the OP, OP-32, or FP opcodes will have
> vector-scalar versions (includes at least part of the B extension), all
> others (assuming we don't reassign more) will have only vector-vector
> versions and will need a broadcast for scalar args.

 do you mean, we have to make a broadcasting opcode which takes scalar
ops and broadcasts them to vector destination regs?

 i think the idea of setting x0-x31 as implicitly being scalar (as
source or dest, i.e. when the extension prefix = 0b00) would achieve
the same thing.

> > So by overriding the jal and other space, the implications are as follows:
> >
> > * the rule about not modifying opcodes has been discarded, which implies a
> > massive amount of compiler work
> >
> not much more than would be otherwise needed since (from what I recall) all
> the RISCV instruction selection code only works with scalars right now
> anyway.

 if there are _any_ such changes, it automatically means maintaining a temporary
fork of gcc, binutils and llvm.  that means that resources have to be
committed to convincing upstream developers to accept the patches.

 that in turn requires a full audit and review process, and if they
don't like them, and it's too late, resources have to be committed
instead to a *permanent* hard fork of gcc, binutils and llvm.

> > * the RV vectoriseable opcode space in no way fits in full.
> >
> > * the decode of jal (and other ops) now just got much more complex in
> > hardware.
> >
> slightly more complex, basically when we see a prefixed jal, when we remove
> the prefix to decode the underlying instruction, we replace the opcode
> field with OP-32. Nothing more than that (other than having scalar instead
> of vector arguments, of course).
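
 the substitution described above can be sketched as follows (python,
illustration only).  the two opcode values are the standard RV major
opcodes for JAL and OP-32; everything else, including the function name
and the "prefixed" flag standing in for however the prefix is actually
detected and stripped, is hypothetical.

```python
OPCODE_MASK = 0x7F          # major opcode lives in bits [6:0]
OPCODE_JAL = 0b1101111      # 0x6F
OPCODE_OP_32 = 0b0111011    # 0x3B

def remap_prefixed(insn, prefixed):
    """After prefix removal, turn a prefixed JAL into an OP-32 instruction word."""
    if prefixed and (insn & OPCODE_MASK) == OPCODE_JAL:
        return (insn & ~OPCODE_MASK) | OPCODE_OP_32
    return insn  # unprefixed JAL, and every other opcode, is left alone

jal_word = (5 << 7) | OPCODE_JAL            # a JAL with rd = x5, zero offset
remapped = remap_prefixed(jal_word, True)
untouched = remap_prefixed(jal_word, False)
```

 note what the sketch leaves out: JAL is a J-type instruction and OP-32
is R-type, so the bits above the opcode would be reinterpreted as
funct7/rs2/rs1/funct3/rd, and that reinterpretation is exactly the part
the assembler and compiler would have to learn about.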
>
> >
> > Remember that any modifications to the compiler toolchain have to be paid
> > for somehow.
> >
> Yup, but we're doing that anyway (assuming you don't want to just use RVV
> and wait for someone else to implement that).

 (a) there's a difference in the level of modifications required:
adding new opcodes and changing the encoding is a whole new ballgame.

 (b) RVV hasn't been designed with video or 3D in mind.  it could be
years before RVV-xBitManip is added (if ever).  we cannot wait around
*hoping* that they *might* consider adding such capabilities, given
that there's no way to communicate with them.

l.