[libre-riscv-dev] Instruction sorta-prefixes for easier high-register access

Luke Kenneth Casson Leighton lkcl at lkcl.net
Sat Jan 26 02:17:17 GMT 2019


On Saturday, January 26, 2019, Jacob Lifshay <programmerjake at gmail.com>
wrote:
>
> > Really, I prefer 2 for 32bit int ops and all C ops, that way it's always
> > possible to specify 8/16/32/default.
> >
> 32-bit int ops can have a single bit that switches from 32/default
> (OP-32/OP) to 8/16.


I know there is something odd with the OP-32 stuff in RISC-V; it was
discussed a year ago. The opportunity to have RV32 executables run
unmodified on an RV64 host (in RV64 mode) was lost because the opcodes
actually change meaning depending on whether the CSR selects RV32 or RV64
mode.

I would only feel comfortable with a single bit switching between
32/default (OP-32 / OP) and 8/16 after doing a full walkthrough.


> We can have a different prefix encoding for C ops to have the required 2
> bits, or, since they should expand 1:1 to 32-bit ops, we can just have
> combinations for the most common prefixes, requiring full instructions for
> uncommon cases.
>
>
Makes sense


> > Crucial strategic op missing is MVX:
> > regs[rd]= regs[regs[rs1]]
> >
> we could modify the definition slightly:
> for i in 0..VL {
>     let offset = regs[rs1 + i];
>     // we could also limit on out-of-range
>     assert!(offset < VL); // trap on fail
>     regs[rd + i] = regs[rs2 + offset];
> }
>
> The dependency matrix would have the instruction depend on everything from
> rs2 to rs2 + VL and we let the execution unit figure it out.


O yuk! :) ok, so if the following instructions use registers that are
outside the bounds of rs2..rs2+VL, the instruction issue phase may proceed...

And it does not need to read the regfile to do that calculation...

Smart!
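
A minimal software sketch of the bounded MVX quoted above, together with
the issue-time check it enables. The function names, the `Result` return
standing in for a trap, and the u64 register model are all assumptions
made for illustration, not an agreed design:

```rust
/// Software model of the bounded MVX (illustrative names only).
/// Returns Err(i) at the element where hardware would trap.
/// Elements are processed sequentially, as in the quoted pseudocode.
fn mvx(regs: &mut [u64], rd: usize, rs1: usize, rs2: usize, vl: usize) -> Result<(), usize> {
    for i in 0..vl {
        let offset = regs[rs1 + i] as usize;
        if offset >= vl {
            return Err(i); // models "trap on fail"
        }
        regs[rd + i] = regs[rs2 + offset];
    }
    Ok(())
}

/// Issue-time test: does `reg` fall inside the source window
/// rs2..rs2+vl? Uses only register numbers and VL, so no regfile
/// read is needed to answer it.
fn in_source_window(reg: usize, rs2: usize, vl: usize) -> bool {
    (rs2..rs2 + vl).contains(&reg)
}
```

A later instruction whose registers all fail `in_source_window` has no
dependency on the MVX source window and could issue without waiting for
the indices to be read.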


> for simplicity, we could extend the dependencies to a power of 2 or
> something.
>
>
Yes.
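
One possible reading of "extend the dependencies to a power of 2",
sketched below: widen the source window to the smallest naturally aligned
power-of-two block that covers rs2..rs2+VL, so that a hazard check could
compare masked register numbers. The alignment requirement and the
function name are assumptions; the thread does not pin this down.

```rust
use std::ops::Range;

/// Smallest naturally aligned power-of-two block of register numbers
/// that covers rs2..rs2+vl (an assumed interpretation, see above).
fn dependency_window(rs2: usize, vl: usize) -> Range<usize> {
    let mut size = vl.max(1).next_power_of_two();
    loop {
        let base = rs2 & !(size - 1); // align the base down to `size`
        if base + size >= rs2 + vl {
            return base..base + size;
        }
        size *= 2; // block too small once aligned: widen and retry
    }
}
```

An aligned source wastes nothing (rs2=16, VL=4 gives 16..20), while a
misaligned one costs extra false dependencies (rs2=3, VL=4 gives 0..8).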


> >
> > However this is a pig to implement in hw, when it becomes parallel, even
> > more so. I did however come up with a schroedinger scheme for
> > predication, the predicated ops are allocated to ALUs, which depend on a
> > special predication FU and hold a write hazard.
> >
> > When the predicate is free to be read by the special PrFU, it sends
> > either "die" or releases the write hazard line.
> >
> > I think same thing can be done for MVX. Split into 2 phases (2 FUs), one
> > which reads the regfile, &s with 0x7f (whatever), then passes that
> > through to 2nd phase to look up in regfile.
> >
> > Only thing is, damn, it messes up the dependencies. You can't proceed
> > further with instruction issue (not to an OoO engine) until all of those
> > 2nd phase regfile lookups are known.
> >
> mvx is a last resort instruction. We definitely need it because we can
> implement it in HW to be up to several times faster than the fallback
> (bunch of st/ld or bunch of scalar mv) and much less instruction issue
> bandwidth and energy than the fallback.
>

agreed.  don't like it: the constrained/relative one is... tolerable (the
hardware design is going to be a dog's dinner mess.... *sigh*)

> We should add some constrained swizzle instructions for the more
> pipeline-friendly cases. One that will be important is:
> for i in 0..VL {
>     let i = i * 4; // each iteration handles one group of 4 elements
>     let mut s1 = [0; 4];
>     for j in 0..4 {
>         s1[j] = regs[rs1 + i + j];
>     }
>     for j in 0..4 {
>         // imm holds four 2-bit source selectors
>         regs[rd + i + j] = s1[(imm >> (j * 2)) & 0x3];
>     }
> }
>

i take it 0..4 actually means 0,1,2,3?  and 0..VL means 0,1,2,...,VL-1?
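
For reference, Rust's `a..b` range is half-open, so `0..4` does yield
0,1,2,3 and `0..VL` yields 0 up to VL-1. A runnable sketch of the quoted
swizzle under that reading follows; the function name, the u32 element
type, and the `groups` parameter (the quoted loop's VL, counting groups
of four elements) are assumptions for illustration:

```rust
/// Software model of the quoted 4-element swizzle (illustrative only).
/// `imm` packs four 2-bit source selectors; selector j sits in bits
/// 2j+1..2j and picks which source element lands in destination slot j.
fn swizzle4(regs: &mut [u32], rd: usize, rs1: usize, imm: u8, groups: usize) {
    for g in 0..groups {
        // g takes the values 0, 1, ..., groups-1
        let base = g * 4;
        let mut s1 = [0u32; 4];
        for j in 0..4 {
            // j takes the values 0, 1, 2, 3
            s1[j] = regs[rs1 + base + j];
        }
        for j in 0..4 {
            let sel = ((imm >> (j * 2)) & 0x3) as usize;
            regs[rd + base + j] = s1[sel];
        }
    }
}
```

With `imm = 0x1B` (selectors 3,2,1,0) each group of four elements is
reversed.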


> Another is matrix transpose for (2-4)x(2-4) matrices which we can implement
> as similar to a strided ld/st except for registers.
>
>
recorded in the microarchitecture notes so we don't lose track.
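
A rough sketch of what "similar to a strided ld/st except for registers"
could mean for the transpose: read the source elements with a stride of
`cols` and write them out consecutively. The row-major layout, the f32
element type, the function name, and the snapshot of the source (so that
overlapping rd/rs1 behaves) are all assumptions, not something the thread
specifies:

```rust
/// Transpose a rows x cols matrix stored row-major in consecutive
/// register elements starting at rs1, writing the cols x rows result
/// row-major starting at rd (illustrative model only).
fn transpose_regs(regs: &mut [f32], rd: usize, rs1: usize, rows: usize, cols: usize) {
    assert!((2..=4).contains(&rows) && (2..=4).contains(&cols));
    let n = rows * cols;
    // Snapshot the source so overlapping source/destination is safe.
    let mut src = [0.0f32; 16];
    src[..n].copy_from_slice(&regs[rs1..rs1 + n]);
    for c in 0..cols {
        for r in 0..rows {
            // Source index advances by `cols` per step: the strided read.
            regs[rd + c * rows + r] = src[r * cols + c];
        }
    }
}
```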


> Note that all of the above operations should be operating on elements, not
> registers.
>
>
understood / agree.

l.

