[libre-riscv-dev] Instruction sorta-prefixes for easier high-register access

Fri Jan 25 12:45:37 GMT 2019

On Friday, January 25, 2019, Jacob Lifshay <programmerjake at gmail.com> wrote:

> On Fri, Jan 25, 2019 at 2:24 AM Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>
> > hiya jacob, ok so i had a couple of days to think.
> >
> > my main concern about modifying encoding of RV instructions is that SV
> > then becomes a dead-end as far as wider adoption is concerned, due to
> > the violation of the rule that "no instructions shall need
> > modification to be made parallel".  people considering adopting SV on
> > custom (or future) extensions may not have the same operands or types
> > of operands / modifiers, and the lack of clarity and simplicity makes
> > them stay away.
> >
> Yeah, I was starting to get a little concerned about that, the prefix
> proposal doesn't exactly have a consistent pattern as it is. I think I went
> a little overboard with the different prefix formats.
>
>
:) if you hadn't, there would be nothing to compare against / evaluate.

> >
> > also: if we create what is effectively a new encoding, we might as
> > well stop entirely on SV, and implement RVV plus some custom
> > RVV-xBitManip extensions.  it would be a lot less work, particularly
> > software-wise.
> >
> Yeah, leave that for a backup plan if SV turns out to not work or we run
> out of time or something.
>
>
Agreed.

> >
> If we switch the 2-bit scalar/vector to 1-bit scalar/vector-mod-4, we will
> have more bits left over.

True. Concern / issue: mod4 means 32bit elements are actually 8 per 64bit
group.

Not sure what to do about that.
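
To put numbers on that concern, here is a rough sketch under one possible reading of it: a mod-4 vector register number names a group of four consecutive 64-bit registers, and elements are packed back to back. The constants and the function name are assumptions for illustration, not anything decided in this thread.

```rust
/// Assumed register width in bits (RV64).
const XLEN_BITS: u32 = 64;
/// Assumed group size: vector register numbers are multiples of 4.
const GROUP_REGS: u32 = 4;

/// Number of packed elements of the given width that fit in one
/// group of GROUP_REGS consecutive registers.
fn elements_per_group(elwidth_bits: u32) -> u32 {
    GROUP_REGS * XLEN_BITS / elwidth_bits
}
```

With these assumptions a group holds 8 elements at 32 bits, 16 at 16 bits and 32 at 8 bits, so the smallest addressable step is quite coarse for narrow elements.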

> Assuming we do implement scalar/vector-mod-N we should use moving the MSB
> to the LSB to save muxes, like the rest of RISC-V, rather than shifting:

Huh, how about that. Always wondered why the encoding was so weird.

>         // sends abcde to deabc00
>         ((reg_num as u7 & 0x3) << 5) | (reg_num as u7 & 0x1C)

Who knew :)
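
For reference, a minimal compilable version of the quoted transformation. Rust has no `u7`, so this sketch uses `u8` and masks the input to 5 bits; the function names are made up for illustration, and the plain-shift variant is only there for comparison.

```rust
/// Map a 5-bit register field `abcde` to the 7-bit register number `deabc00`.
/// The two low bits of the field move to the top of the result, while
/// bits 4..2 (`abc`) stay in the same bit positions they had in the field.
fn vector_reg_rotate(field: u8) -> u8 {
    let field = field & 0x1F; // keep only the 5 encoded bits
    ((field & 0x03) << 5) | (field & 0x1C)
}

/// The plain-shift alternative: `abcde` becomes `abcde00`.
/// Every bit of the field moves two positions to the left.
fn vector_reg_shift(field: u8) -> u8 {
    (field & 0x1F) << 2
}
```

Both variants reach the same 32 register numbers (every multiple of 4 from 0 to 124); they differ only in which field value selects which register.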

> We may want to define a slightly different transformation for C
> instructions, to allow the 3-bit register fields to be the most useful.

Same trick, yes agreed.

>
> Also, I think that we need to not have separate source and dest elwidths
> except on mv/conv since we can then dedicate a clock cycle to the type
> conversions rather than trying to pack it in every operation.

>
Ah yes very good point. That would mean elwidth stays same for src and
dest, not so wasteful on routing side.

Hmmm.... hmm... however if routing is needed for mv/conv, it could be used
for 2src ops too. Except... 2src ops means double the routing bandwidth
(and width conversions) or a clock cycle penalty.

Complex.

Prefer the suggestion you made. Single src conversion to single dest.

That means 4 prefix bits freed for ALU ops. FP ops have 16/32/64/128 as
part of RV encoding already (except C). INT ops are weird.

I think save at least 1 bit for doing something for C ops and also 32 bit
int. That way it may be possible to at least get C ops to do 32 bit FP
elements (they can't right now except by setting RV32 Mode)

Really, I prefer 2 for 32bit int ops and all C ops, that way it's always
possible to specify 8/16/32/default.
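
A small sketch of what such a 2-bit field could decode to. The bit assignments below are purely an assumption (nothing in this thread fixes them), as are the type and function names; the point is only that one shared field covers 8/16/32/default once src and dest widths are the same.

```rust
/// Element width selected by a hypothetical 2-bit prefix field.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ElWidth {
    Default, // use whatever width the instruction itself implies
    Bits8,
    Bits16,
    Bits32,
}

/// Decode the 2-bit field. The encoding values are an assumption.
fn decode_elwidth(field: u8) -> ElWidth {
    match field & 0b11 {
        0b00 => ElWidth::Default,
        0b01 => ElWidth::Bits8,
        0b10 => ElWidth::Bits16,
        _ => ElWidth::Bits32,
    }
}

/// Resolve the width in bits, given the width the opcode implies
/// (for example 64 for an RV64 integer op).
fn element_bits(width: ElWidth, op_default_bits: u32) -> u32 {
    match width {
        ElWidth::Default => op_default_bits,
        ElWidth::Bits8 => 8,
        ElWidth::Bits16 => 16,
        ElWidth::Bits32 => 32,
    }
}
```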

>  We would
> still have all the different elwidths but an op would have the same elwidth
> out as in (except for mv/conv). I don't think type conversion is going to
> be common enough to use an extra 2-bits and maybe an extra pipeline stage
> or two to allow conversion for every instruction.

Concur.

>
> I think we should try for a 5-bit vlp since that way, we can have more
> predication registers and all 4 most common (1..4) VL-multipliers together.

Not so concerned about more predicates, more VL multipliers sounds sensible.

> From what I recall, the LLVM variable length vectors will support something
> like VL multipliers, so it will make it easier to compile for as well.

Oo interesting.

>
> >
> > that would leave 2 bits spare which could be used for more
> > operation-specific uses such as LD/ST behaviour.
> >
>
> > what do you think?
> >
> Yeah, sounds good. If we don't have enough for LD/ST, we can always add
> custom instructions (not by abusing the prefix system).

Crucial strategic op missing is MVX:
regs[rd]= regs[regs[rs1]]

However this is a pig to implement in hw, when it becomes parallel, even
more so. I did however come up with a schroedinger scheme for predication,
the predicated ops are allocated to ALUs, which depend on a special
predication FU and hold a write hazard.

When the predicate is free to be read by the special PrFU, it sends either
"die" or releases the write hazard line.
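
Roughly, as a behavioural sketch only (not hardware, and all names here are made up for illustration): each predicated element op sits in a pending state holding its write hazard, and the predicate FU later resolves every one of them to either "die" or "released".

```rust
/// State of one predicated element operation allocated to an ALU.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum AluState {
    Pending,  // result may be computed, but the write hazard is still held
    Released, // predicate bit was 1: write hazard released, result may commit
    Dead,     // predicate bit was 0: the op is cancelled, nothing is written
}

/// One element op, tagged with the predicate bit it depends on.
struct PredicatedOp {
    element: usize, // assumed to be below 64 in this sketch
    state: AluState,
}

/// What the special predicate FU does once the predicate register is readable:
/// each waiting op is told to die or has its write hazard released.
fn resolve_predicate(ops: &mut [PredicatedOp], predicate: u64) {
    for op in ops.iter_mut() {
        op.state = if (predicate >> op.element) & 1 == 1 {
            AluState::Released
        } else {
            AluState::Dead
        };
    }
}
```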

I think the same thing can be done for MVX. Split into 2 phases (2 FUs), one
which reads the regfile, ANDs the value with 0x7f (or whatever the mask is),
then passes that through to the 2nd phase to look up in the regfile.

Only thing is, damn, it messes up the dependencies. You can't proceed
further with instruction issue (not to an OoO engine) until it is known
which registers all of those 2nd phase regfile lookups will read.

Reason: only when all the 1st phase regfile lookups are known do you know
which hazards need to be created in the Dependency Matrices.
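
A software model of the two phases, to make the dependency problem concrete. The vectorised semantics assumed here (element i does regs[rd+i] = regs[regs[rs1+i]]), the 128-entry regfile, the wrap-around on register numbers and the function names are all assumptions for the sketch, not a worked-out definition of MVX.

```rust
/// Assumed regfile size, matching the 0x7f mask.
const NUM_REGS: usize = 128;

/// Phase 1: read the VL index registers starting at rs1 and mask each
/// value down to a valid register number.
fn mvx_phase1(regs: &[u64; NUM_REGS], rs1: usize, vl: usize) -> Vec<usize> {
    (0..vl)
        .map(|i| (regs[(rs1 + i) % NUM_REGS] & 0x7f) as usize)
        .collect()
}

/// Phase 2: look up the registers named by phase 1 and write them to
/// rd, rd+1, ... All reads happen before any write, as in a parallel op.
fn mvx_phase2(regs: &mut [u64; NUM_REGS], rd: usize, sources: &[usize]) {
    let values: Vec<u64> = sources.iter().map(|&s| regs[s]).collect();
    for (i, value) in values.into_iter().enumerate() {
        regs[(rd + i) % NUM_REGS] = value;
    }
}
```

The list returned by `mvx_phase1` is exactly the set of read hazards phase 2 needs, and it does not exist until VL regfile reads have completed, which is the stall described above.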

It would be much easier to have REMAP/SHAPE, as that does not involve
creating a 2 phase decode that blocks even the instruction decode phase.

If only 1 reg in the proposed new op contained the map, something similar
to xbitmanip butterfly or REMAP permutations, at least just like for
predication the instruction decode phase would be held up waiting for only
1 reg read, not VL reg reads.

> Jacob Lifshay

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68