[libre-riscv-dev] Instruction sorta-prefixes for easier high-register access

Tue Jan 22 21:33:02 GMT 2019

On Tue, Jan 22, 2019, 04:35 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On Tuesday, January 22, 2019, Jacob Lifshay <programmerjake at gmail.com>
> wrote:
>
> > On Tue, Jan 22, 2019, 03:04 Luke Kenneth Casson Leighton <lkcl at lkcl.net
> > wrote:
> >
> > > On Tue, Jan 22, 2019 at 8:08 AM Jacob Lifshay <
> programmerjake at gmail.com>
> > > wrote:
> > >
> > > > Changes to the proposal:
> > > > I changed where elmsz comes from in integer size conversions, using
> > > > OP-IMM/OP-IMM-32 to differentiate between unsigned/signed.
> > > > I split the elmsz mapping into elmszd/elmszw to allow reordering to
> > match
> > > > the unprefixed sizes.
> > >
> > >  the thing about elwidth is that there needs to be one for the source
> > > operands and a *separate* one for the destination register
> > >
> > >  otherwise, explicit operations are needed which perform
> > > width-conversions.  which is feasible: special over-ride on MV / F.MV.
> > >
> > Width conversions are handled by conv for int to int, fcvt for fp to fp
> and
> > int to/from fp.
>
>
> The modes required are src dest elwidth, plus sign/zero extension, plus
> truncation, plus when no width override is in place, plus Vector Scalar
> options, plus predication. If you have a look at the SV spec, the LD/ST
> section, and the MV section, it covers how all of those can be done.
>
>
> >
> >  also, how is vector-scalar and scalar-vector to be specified?
> > >
> > We will have broadcast instructions
>
>
> >
> What are "broadcast" instructions?
>
A scalar-to-vector move, also known as a splat. For example,
    broadcast.d x32, a0, len=VL
is equivalent to
    for (i = 0; i < VL; i++)
        x[32 + i] = a0;
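
A slightly more concrete sketch of those semantics, with Python used only
as executable pseudocode. The flat `regs` list, its 128-entry size, and
the `broadcast` helper with its parameters are illustrative assumptions,
not part of the proposed encoding:

```python
def broadcast(regs, dest, value, vl):
    """Copy one scalar value into vl consecutive registers starting at dest."""
    for i in range(vl):
        regs[dest + i] = value
    return regs


# Hypothetical flat integer register file, all zeros (size is an assumption).
regs = [0] * 128
a0 = 10                 # a0 is x10 in the standard RISC-V ABI
regs[a0] = 7

# Models "broadcast.d x32, a0, len=VL" with VL = 4.
broadcast(regs, dest=32, value=regs[a0], vl=4)
```

After this runs, x32 through x35 all hold the value that was in a0, and
the neighbouring registers are untouched.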

>
>
> > but may need to change the encoding for
> > the more common operations to accommodate vector-scalar modes for
> > power-efficiency and lower register pressure.
>
>
> Changing the encoding has huge software implications.
>
Yeah, but they are easily manageable at this stage of the process, since we
haven't started the instruction decoder or compiler. We are only changing
how the prefixed instructions are encoded, so it shouldn't change much in
spike either: we still need to add decoding of the prefixed instructions
there anyway.

>
>
> > We could use the prefixed jal encoding as a different opcode for
> > vector/scalar as jal is useless when vectorized.
>
>
> >
> There are a ton of non-vectoriseable ops, the problem is that there are
> nowhere near enough.
>
> I like the idea, the problem is that the entire vectoriseable opcode space
> needs to be fitted into the overloaded space.
>
It all still fits; we're just reassigning the non-vectorizable portions.

You can look at it this way: any op with scalar inputs can always be run by
broadcasting those inputs to vectors and then using the vector-vector
version. Giving the most common ops a vector-scalar mode is therefore
similar to the C extension: it saves space, time, energy, etc., but it
doesn't make the underlying vectorized operation any more or less possible.
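
To illustrate that equivalence, here is a minimal sketch in Python (used
as executable pseudocode). The names `add_vv` and `add_vs` are made up for
this example and model a vector-vector add and a vector-scalar add on
plain lists; they are not proposed mnemonics:

```python
def broadcast(value, vl):
    """Splat a scalar into a vector of length vl."""
    return [value] * vl


def add_vv(a, b):
    """Vector-vector add: element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def add_vs(a, scalar):
    """Vector-scalar add: the scalar operand is reused for every element."""
    return [x + scalar for x in a]


vec = [1, 2, 3, 4]
s = 10

# Two instructions (broadcast, then vector-vector) versus one vector-scalar.
via_broadcast = add_vv(vec, broadcast(s, len(vec)))
direct = add_vs(vec, s)
```

Both paths give the same result; the vector-scalar form just avoids the
extra instruction and the temporary vector registers.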

>
> Plus all future opcodes.
>
All future instructions that use the OP, OP-32, or FP opcodes will have
vector-scalar versions (this includes at least part of the B extension);
all others (assuming we don't reassign more) will have only vector-vector
versions and will need a broadcast for scalar args.
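
As a rough sketch of that rule, in Python as executable pseudocode: the
opcode values are the standard RISC-V major opcodes, but reading "FP" as
the OP-FP major opcode is an assumption, and `has_vector_scalar_form` is a
hypothetical helper, not part of any existing decoder:

```python
# Standard RISC-V major opcodes (bits 6:0 of a 32-bit instruction).
OP = 0b0110011
OP_32 = 0b0111011
OP_FP = 0b1010011       # assumption: "FP" in the text means OP-FP

VECTOR_SCALAR_OPCODES = {OP, OP_32, OP_FP}


def major_opcode(insn):
    """Extract the 7-bit major opcode from a 32-bit instruction word."""
    return insn & 0x7F


def has_vector_scalar_form(insn):
    """True if the rule above would give this instruction a vector-scalar form."""
    return major_opcode(insn) in VECTOR_SCALAR_OPCODES


ADD_A0_A0_A1 = 0x00B50533   # add a0, a0, a1  (OP)
LW_A0_0_A1 = 0x0005A503     # lw  a0, 0(a1)   (LOAD)
```

Under this rule the add gets a vector-scalar form, while the load would
need a broadcast for any scalar argument.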

>
> So by overriding the jal and other space, the implications are as follows:
>
> * the rule about not modifying opcodes has been discarded, which implies a
> massive amount of compiler work
>
Not much more than would otherwise be needed, since (from what I recall)
all of the RISC-V instruction selection code only works with scalars right
now anyway.

> * the RV vectoriseable opcode space in no way fits in full.
>
> * the decode of jal (and other ops) now just got much more complex in
> hardware.
>
Slightly more complex. Basically, when we see a prefixed jal and remove
the prefix to decode the underlying instruction, we replace the opcode
field with OP-32. Nothing more than that (other than having scalar instead
of vector arguments, of course).
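
A minimal sketch of that decode step, in Python as executable pseudocode.
It assumes the prefix has already been stripped and only the 32-bit
underlying instruction is passed in; the prefix format itself is not
modelled, and `remap_prefixed` and its return convention are hypothetical:

```python
# Standard RISC-V major opcodes (bits 6:0).
JAL = 0b1101111
OP_32 = 0b0111011
OPCODE_MASK = 0x7F


def remap_prefixed(insn):
    """Remap the instruction that followed a prefix.

    Returns (instruction to decode, scalar_args), where scalar_args is True
    when the reassigned jal slot was used.
    """
    if (insn & OPCODE_MASK) == JAL:
        # Keep every other field, swap only the opcode field to OP-32.
        remapped = (insn & ~OPCODE_MASK & 0xFFFFFFFF) | OP_32
        return remapped, True
    return insn, False


remapped, scalar_args = remap_prefixed(0x00B5056F)   # opcode field = JAL
unchanged, vector_args_only = remap_prefixed(0x00B50533)   # opcode field = OP
```

The rest of the decoder then sees an ordinary OP-32 instruction plus one
extra flag, which is why the added hardware cost should be small.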

>
> Remember that any modifications to the compiler toolchain have to be paid
> for somehow.
>
Yup, but we're doing that anyway (assuming you don't want to just use RVV
and wait for someone else to implement that).

>
> L.
>