[libre-riscv-dev] Instruction sorta-prefixes for easier high-register access

Mon Jan 21 12:24:33 GMT 2019

On Mon, Jan 21, 2019 at 11:41 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> On Mon, Jan 21, 2019, 02:42 Luke Kenneth Casson Leighton <lkcl at lkcl.net
> wrote:
>
> > On Mon, Jan 21, 2019 at 10:23 AM Jacob Lifshay <programmerjake at gmail.com>
> > wrote:
> > >
> > > I pushed a work-in-progress prefix proposal to
> > >
> > https://salsa.debian.org/Kazan-team/kazan/blob/master/docs/Prefix%20Proposal.md
> >
> > looks great;
> >
> > elwidth
> >
> > 11 64-bit (d)
> >
> > can you make that "default" (in both int & fp), so that:
> >
> I'm reusing the encoding from ld/st and from fp arith so can't change the
> order since I'm using the pre-existing fields to encode elmsz.

 part of the pre-existing instruction: ok, yes, so that can't change.
then we are talking about different things.  the purpose of the
elwidth in the Reg CSRs is to override the default [pre-existing]
encoding *and* to indicate that the register file is to be treated as
a type-casted contiguous SRAM as far as elements are concerned.

 so the "register" (standard register) is merely a convenient address
pointer into that SRAM, and, crucially, the REMAP "offset" is added to
that [type-casted] SRAM address, in order to be able to access high
bytes/halfs/words of 64-bit "registers".
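
 a minimal sketch of that idea, not taken from the SV spec: the
register file is modelled as one flat byte array, the register number
is turned into a byte address, and an offset (assumed here to be in
units of elements, little-endian) is added to it.  RegFile, read_elem
and write_elem are hypothetical names used only for illustration.

```python
XLEN_BYTES = 8   # one 64-bit "register" is 8 bytes of SRAM
NUM_REGS = 32    # standard RV integer register count


class RegFile:
    """Hypothetical model: the register file as contiguous, byte-addressed SRAM."""

    def __init__(self):
        self.sram = bytearray(NUM_REGS * XLEN_BYTES)

    def _addr(self, reg, offset, elwidth_bytes):
        # the register number is just a pointer into the SRAM;
        # the offset is added in units of the (type-cast) element width
        return reg * XLEN_BYTES + offset * elwidth_bytes

    def write_elem(self, reg, offset, elwidth_bytes, value):
        addr = self._addr(reg, offset, elwidth_bytes)
        mask = (1 << (8 * elwidth_bytes)) - 1
        self.sram[addr:addr + elwidth_bytes] = (value & mask).to_bytes(
            elwidth_bytes, "little")

    def read_elem(self, reg, offset, elwidth_bytes):
        addr = self._addr(reg, offset, elwidth_bytes)
        return int.from_bytes(self.sram[addr:addr + elwidth_bytes], "little")


rf = RegFile()
rf.write_elem(5, 0, 8, 0x1122334455667788)   # fill x5 as one 64-bit value
high_half = rf.read_elem(5, 3, 2)            # top 16 bits of x5
high_word = rf.read_elem(5, 1, 4)            # top 32 bits of x5
```

 with this model the high halfword and high word of x5 are reachable
directly through the element offset, with no shifting of the 64-bit
value.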

 if an elwidth over-ride is *not* to be included in the prefix, then
it becomes extremely complex, as there are four cases to consider:

* default, scalar (standard RV behaviour)
* non-default, scalar (IGNORES HIGH ZERO/SIGN BITS)
* default, elements (RESPECTS HIGH ZERO/SIGN BITS)
* non-default, elements (COMPACTS MULTIPLE ELEMENTS INTO 64-bit "registers")

without an elwidth over-ride, moving data between the vector and
scalar "worlds" becomes extremely complex in software.  xBitManip,
&'ing, |'ing and shifts all become necessary in order to "extract"
and place data into and out of vectors: precisely the things we wish
to avoid doing.
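
 to illustrate the kind of software work meant here, a rough sketch
(function names are hypothetical, and elements are assumed to be
packed little-endian into a 64-bit register) of getting one element
out of, and back into, a packed register using only shifts and masks:

```python
MASK64 = (1 << 64) - 1


def extract_elem(reg_value, index, width_bits):
    """Pull element `index` out of a packed 64-bit value: shift, then AND."""
    return (reg_value >> (index * width_bits)) & ((1 << width_bits) - 1)


def insert_elem(reg_value, index, width_bits, elem):
    """Place `elem` at element `index`: clear the slot (AND), then OR it in."""
    shift = index * width_bits
    mask = ((1 << width_bits) - 1) << shift
    return ((reg_value & ~mask) | ((elem << shift) & mask)) & MASK64


packed = 0x1122334455667788          # four 16-bit elements in one register
third = extract_elem(packed, 2, 16)  # element 2 occupies bits 32..47
updated = insert_elem(packed, 2, 16, 0xBEEF)
```

 every scalar access to a vector element costs a sequence like this
when the hardware cannot be told the element width directly.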

> >
> >  (a) when using the CSR tables, the "default" may be over-ridden
>
> we can set it so that the csr tables override the 64.

 that's not a good idea.  that changes the meaning of the 16 and 32
bit opcodes.  as in: you actually have to emit completely the "wrong"
opcode.

 right now, the opcodes do not actually change "meaning".  that's been
a critical and fundamental tenet of SV: you *do not* change the
meaning of the standard RV opcodes.  there's a couple of places where
that rule is bent a little (BEQ/BNE), however it's not in such a
drastic way.

> >  (b) the elwidth can be taken from the opcode (and the usual
> > zero/sign-extension done: this is crucial for LD/ST)
> >
> the ld/st elwidth is taken from the opcode. I designed it so ld/st always
> loads/stores the same width as the register has, so I can reuse the
> zero/sign bit for the msb of ld_st_kind.

 ... that changes the meaning of LD/ST *and* changes the encoding.
i very specifically avoided doing that: not doing so is one of the
design goals of SV, particularly because it complicates the decoder
phase (conditional decoding, based on the prefix).

> >  (c) RV32/RV64 mode can do the "normal" changing that is part of the RV
> > spec.
>
>
> > the signed/unsigned table is not needed: i did a comprehensive
> > analysis as part of spike-sv which shows that the sign/zero extension
> > may be taken from whether the instruction has a ".W" suffix.  dead
> > simple.
> >
> Yup, except I'm using that as elmsize[1], so I need another bit.

 that doesn't make any sense: it's effectively just moving one bit
from the standard ".W" location into elmsize[1], and it overcomplicates
the instruction decoder phase:

 if (prefix_mode) sign_mode = elmsize[1]
 else             sign_mode = {standard ".W" location}
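
 spelled out as a small sketch of that conditional decode.  the OP-32
and OP-IMM-32 major opcodes are the standard RV64I ones used by the
".W" instructions; everything else (the function names, passing
elmsize in as a plain 2-bit integer, treating sign_mode as a boolean)
is an assumption made purely for illustration.

```python
OP_IMM_32 = 0b0011011   # RV64I major opcode: ADDIW, SLLIW, SRLIW, SRAIW
OP_32 = 0b0111011       # RV64I major opcode: ADDW, SUBW, SLLW, SRLW, SRAW


def has_w_suffix(instr):
    """True if the 32-bit instruction uses one of the standard '.W' opcodes."""
    return (instr & 0x7F) in (OP_IMM_32, OP_32)


def sign_mode(instr, prefix_mode, elmsize=0):
    """Hypothetical decoder mux: the source of the bit depends on the prefix."""
    if prefix_mode:
        # prefixed: the bit comes from elmsize[1]
        return bool((elmsize >> 1) & 1)
    # unprefixed: the bit comes from the standard ".W" location
    return has_w_suffix(instr)


ADD_X1_X2_X3 = 0x003100B3    # add  x1, x2, x3
ADDW_X1_X2_X3 = 0x003100BB   # addw x1, x2, x3
plain = sign_mode(ADDW_X1_X2_X3, prefix_mode=False)
prefixed = sign_mode(ADD_X1_X2_X3, prefix_mode=True, elmsize=0b10)
```

 the extra branch on prefix_mode is the conditional decoding being
objected to: the same piece of information has two possible sources.
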

> > the c.mv sign/unsigned: this makes me nervous (especially after
> > analysing how complex in the hardware it is to do).  it turns the
> > processor into a CISC design that effectively requires turning the
> > instruction into two microcode ops.  and it's element-based.
> >
> We can change the conv instruction to only be valid to/from 32-bits.
>
> everything in prefixed C is still in flux, so that can change.
> I think we should establish a working 48-bit encoding and then pick the
> most commonly used ops, since I'd like prefixed C to be similar to C in
> that every C instruction expands 1:1 to a 48-bit instruction and C is just
> used to make them smaller.

 that's a really good goal to have, i like it.

 bear in mind though that this absolutely requires a "holistic"
approach.  i found that one single thing out-of-place or overlooked
can throw the entire scheme into chaos and require a redesign.

 the point being: the prefixed-C *needs* to be considered at the
same time as the 48-bit version.  they cannot be done separately.

> Note that since C is almost entirely used already and basically no new
> extensions will redefine it, we can redefine the prefixed version of C
> however we want (basically new 32-bit instructions inspired by C, rather
> than prefixed C instructions).

 iiinteresting.... i like where that could lead.  let's see how far
that gets.  bear in mind that it means a heck of a lot of compiler
work: it basically means creating a fork of gcc and binutils, for
example.  what's good, however, is that it's not changing the existing
standard RV (32-bit or 16-bit) meanings; it's just additions.

l.