[libre-riscv-dev] Instruction sorta-prefixes for easier high-register access

Wed Jan 23 09:36:19 GMT 2019

---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

On Wed, Jan 23, 2019 at 9:02 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> On Wed, Jan 23, 2019, 00:06 Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>
> >
> > On Wed, Jan 23, 2019 at 3:26 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
> > >
> > > On Tue, Jan 22, 2019 at 3:10 PM Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
> > >
> > > > btw jacob i'm not saying "no" to modified instructions (non-RV),
> > > > especially given that the priority is llvm, where yes we have to
> > > > maintain a custom version of that: gcc and binutils can be deferred.
> > > >
> > > > what i'm saying is: if there's any other way (such as the implicit
> > > > 0b00-prefix => scalar idea) i'd prefer that to be prioritised.
> > > >
> > > Ok, what do you think about requiring the starting register for vectors
> > > to be aligned by 2?
> >
> >  yeah that would work.  also, i noticed that the 48-bit prefix is
> > 7 bits long.  why?  that's a precious bit that could be used.
> >
> In case any other extensions want to have 48-bit instructions. Otherwise we
> are using the entire 48-bit address space.

 yep.  well, the way i envisaged it: the rule is that every op has to
exist as a scalar op *first*, so it will be a 32-bit scalar op, and
therefore there will automatically be a vectorised version of it.
so that 1 bit is being wasted.

> If you think we should use all
> of the 48-bit address space anyway, I recommend using the new bit to expand
> the vlp fields by 1 bit, allowing us to specify more predication and vector
> length combinations.

 i wanted to recommend an elwidth dest override (which would require 2
bits to do).

> We can use my proposal for requiring aligned register
> numbers to encode scalar/vector.

 yep i like it.
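
 a rough sketch of why alignment frees up a bit.  this is NOT the
proposal's actual encoding: the 6-bit field width, the choice of bit 0
as the flag, and the names encode_reg / decode_reg are all assumptions
made up for illustration.  if a vector may only start at an even
register number, bit 0 of its number is always 0, so that bit can
double as the scalar/vector marker:

```python
# Hypothetical illustration only: field width and bit assignment are assumed,
# they are not taken from the proposal being discussed.
FIELD_BITS = 6  # assumed width of one register field


def encode_reg(regnum, is_vector):
    """Pack a register number and a scalar/vector flag into FIELD_BITS bits.

    Vectors must start on an even register, so bit 0 of a vector's register
    number is always 0.  That bit is reused as the flag: 0 = vector, 1 = scalar.
    """
    if is_vector:
        if regnum % 2 != 0:
            raise ValueError("vector start register must be aligned by 2")
        if not 0 <= regnum < (1 << FIELD_BITS):
            raise ValueError("vector register out of range")
        return regnum  # low bit is already 0, which marks it as a vector
    # Scalars lose one bit of range: the number sits in the upper bits.
    if not 0 <= regnum < (1 << (FIELD_BITS - 1)):
        raise ValueError("scalar register out of range")
    return (regnum << 1) | 1


def decode_reg(field):
    """Return (regnum, is_vector) for a field produced by encode_reg."""
    if field & 1:
        return field >> 1, False
    return field, True


# Registers from the quoted example: f32 as a vector, f4 as a scalar.
print(decode_reg(encode_reg(32, True)))   # (32, True)
print(decode_reg(encode_reg(4, False)))   # (4, False)
```

 the cost in this particular sketch is that scalars only reach the
lower half of the register file; a real encoding could well make a
different trade-off.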

> Also, using the register number fields to
> encode scalar/vector allows us to have reduce operations by having a scalar
> destination. For operations where reduction doesn't make sense (fld for
> example), we just have the vectorization length be 1.
>
> For operations where a vector length multiplier is selected, scalar
> arguments/results are vectors of the length you'd get if the VL register
> were 1. This greatly simplifies vectorizing code like:
>
> vec4 color = ...;
> color = mix(color, vec4(1.0, 0.0, 1.0, 1.0), 0.5);
> // mix color with magenta
>
> which would compile to:
>
> .rodata
> c0.5_0.0_0.5_0.5: ds 0.5, 0.0, 0.5, 0.5
> c0.5_0.5_0.5_0.5: ds 0.5, 0.5, 0.5, 0.5
> .text
> ; color in f32...
> li a0, c0.5_0.0_0.5_0.5
> fld.w f0(vector), a0, len=4
> li a1, c0.5_0.5_0.5_0.5
> fld.w f4(vector), a1, len=4
> fmadd.s f32(vector), f32(vector), f4(scalar), f0(scalar), len=VL*4

 *good*.  i like it.  that's the kind of thing that i really want to
see, as it gives hard concrete justifications.
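
 to make the arithmetic concrete, here is a small model of that fmadd.
assumptions: the mix() call is meant as an equal (0.5) blend of color
and magenta, which is what the two constants in .rodata imply; the
function name fmadd_vl, its argument layout and the sample pixel values
are invented for the sketch.  the "scalar" operands are 4 elements long
(the length you'd get with VL=1) and are reused for every group of 4 in
the vector operand:

```python
def fmadd_vl(vec, mul, add, vl, sub_len):
    """Model of a fused multiply-add with len = vl * sub_len.

    vec is a vector operand of vl * sub_len elements.  mul and add are
    "scalar" operands, i.e. sub_len elements long (the length they would
    have if VL were 1), and are repeated across each group of sub_len.
    """
    assert len(vec) == vl * sub_len
    assert len(mul) == sub_len and len(add) == sub_len
    return [vec[i] * mul[i % sub_len] + add[i % sub_len]
            for i in range(vl * sub_len)]


MAGENTA = [1.0, 0.0, 1.0, 1.0]
half = [0.5, 0.5, 0.5, 0.5]                # constant c0.5_0.5_0.5_0.5
half_magenta = [0.5 * m for m in MAGENTA]  # constant c0.5_0.0_0.5_0.5

# Two vec4 colours back to back, so VL = 2 and len = VL * 4 = 8.
colors = [0.25, 0.5, 0.75, 1.0,
          1.0, 1.0, 0.0, 0.0]

blended = fmadd_vl(colors, half, half_magenta, vl=2, sub_len=4)
print(blended)  # [0.625, 0.25, 0.875, 1.0, 1.0, 0.5, 0.5, 0.5]
```

 each output element equals color*0.5 + magenta*0.5, so one fmadd
covers the whole blend for every pixel in the vector.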

l.