[libre-riscv-dev] sv swizzle constants
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Wed Jun 26 04:49:59 BST 2019
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
On Tue, Jun 25, 2019 at 10:17 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
> On Tue, Jun 25, 2019, 12:48 Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
> > Ok so a suite of constants (immediates) are needed, to be able to specify
> > the permutations of up to 4 subvector elements, x y z w.
> > That is 4 x 3 x 2 x 1 permutations which is 24 so 5 bits are needed to
> > express them, per register.
> it's actually 4 x 4 x 4 x 4, since repeats are allowed --
argh! that's a whopping 16 bits! oh wait, it's only 8, sorry i was
thinking that was unary. 2 bits per position. 2x4=8.
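
a minimal sketch of the bit-counting above, in python. the helper names (`encode_swizzle`, `decode_swizzle`) and the choice of putting position 0 in the two lowest bits are assumptions made for illustration, not anything agreed in this thread.

```python
from itertools import permutations, product
from math import ceil, log2

LANES = "xyzw"  # the four subvector elements

# without repeats: 4 x 3 x 2 x 1 orderings
no_repeats = len(list(permutations(LANES)))          # 24
# with repeats allowed: 4 x 4 x 4 x 4 selections
with_repeats = len(list(product(LANES, repeat=4)))   # 256

bits_no_repeats = ceil(log2(no_repeats))      # 5
bits_with_repeats = ceil(log2(with_repeats))  # 8, i.e. 2 bits per position


def encode_swizzle(sel):
    """Pack a 4-character selector such as 'xxzw' into 8 bits.

    Assumed layout: position 0 occupies bits 1:0, position 3 bits 7:6.
    """
    assert len(sel) == 4
    value = 0
    for pos, ch in enumerate(sel):
        value |= LANES.index(ch) << (2 * pos)
    return value


def decode_swizzle(value):
    """Inverse of encode_swizzle: 8 bits back to a 4-character selector."""
    return "".join(LANES[(value >> (2 * pos)) & 0b11] for pos in range(4))


print(hex(encode_swizzle("xyzw")))  # 0xe4 (identity swizzle under this layout)
print(decode_swizzle(0x00))         # xxxx
```
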
> I even had a repeat in the example for velswizzle.
sorry, i miss things (in new concepts) unless they're explicitly spelled out.
"However, when you use a swizzle as a way of setting component values,
you cannot use the same swizzle component twice. So someVec.xx =
vec2(4.0, 4.0); is not allowed."
when i read that the first time i thought that was for the source regs as well.
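
to make the source/destination difference concrete, here is a small python sketch. it models the rule above only as far as it is stated: repeats are fine when reading, not when writing. the function names are hypothetical.

```python
LANES = "xyzw"


def apply_source_swizzle(vec, sel):
    """Read through a swizzle: repeats such as 'xx' are allowed."""
    return [vec[LANES.index(ch)] for ch in sel]


def valid_as_destination(sel):
    """A destination swizzle may not name the same component twice."""
    return len(set(sel)) == len(sel)


def assign_dest_swizzle(vec, sel, values):
    """Write through a swizzle, rejecting repeated components."""
    if not valid_as_destination(sel):
        raise ValueError(f"repeated component in destination swizzle {sel!r}")
    out = list(vec)
    for ch, v in zip(sel, values):
        out[LANES.index(ch)] = v
    return out


print(apply_source_swizzle([10, 20, 30, 40], "xxwz"))  # [10, 10, 40, 30]
print(assign_dest_swizzle([1, 2, 3, 4], "wx", [9, 8]))  # [8, 2, 3, 9]
print(valid_as_destination("xx"))                       # False
```
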
nggggh 8 bits is just about tolerable.
> For SVorig, swizzle should only be supported on mv -- supporting it on
> everything will drastically increase pipeline complexity for every op --
> similar to ARM putting the barrel shifter in every instruction -- not
> necessary for the common case.
hmm hmmm.... let me think... register numbers in the subvl loop have
to go through a 2-in, 2-out redirection...
taking too long to think about it: ok. MV only it is.
> For SVprefix, I suggest using 8 or 12 immediate bits to encode that using a
> dedicated opcode.
separating SVprefix and SVorig is not a good idea. if an opcode gets
added to SVprefix, it gets added to SVorig as well.
reason: conceptually, the instruction gets added as a *scalar*
instruction.... and then the parallelism is added as a completely
otherwise it's "compiler hell time".
i think there's a way to shoe-horn opcodes into SVprefix: use bit 60
of P64 to indicate "different encoding".
or, alternatively, just use a 32-bit custom opcode (a custom MV),
that's enough. if we absolutely have to use more bits (add up to 15
more bits to the custom-MV):
* bit 60 of P64 can be flipped to "1"
* the P48 prefix stays as-is
* we "lose" the extended register and VLtype of P64 (which is ok)
* bits 48-59, 61 62 and 63 can be added to extend the custom MV and
still keep P48 capability
given that it only takes 8 bits to specify all swizzle options, and
given that it's not necessary to have swizzle on *both* src *and*
dest, that's a not unreasonable deal.
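
a rough sketch of where the 15 extra bits come from: bits 48-59 give 12, bits 61-63 give 3, and bit 60 is the "different encoding" flag. the order in which the two groups are packed together is purely an assumption for illustration, as is the function name.

```python
def extra_mv_bits(insn64):
    """Split out bit 60 and the 15 spare bits of a 64-bit instruction word.

    Assumed packing: bits 48..59 become the low 12 bits of the result,
    bits 61..63 become the top 3 bits.
    """
    flag = (insn64 >> 60) & 1           # bit 60: "different encoding"
    low12 = (insn64 >> 48) & 0xFFF      # bits 48..59
    high3 = (insn64 >> 61) & 0b111      # bits 61..63
    return flag, (high3 << 12) | low12


word = (0b101 << 61) | (1 << 60) | (0xABC << 48)
flag, extra = extra_mv_bits(word)
print(flag, hex(extra))  # 1 0x5abc
```

an 8-bit swizzle fits in that field with 7 bits to spare.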
or, hmmm... darn it, MV doesn't actually exist in RV, does it? it
used to, but was removed in favour of a pseudo-op using add.
we've got the op-v major opcode 0b1010111, and the 4 32-bit custom
opcodes to play with. that means we can use a funct3 for MV, then if
we use an I-Type, the 8 bits needed for the swizzle will fit into the
hmmm i'll drop it into
because the formatting from gmail is plain-text on this message. grr.
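
for reference, a python sketch of an 8-bit swizzle sitting inside a standard RISC-V I-type word (imm[11:0], rs1, funct3, rd, opcode). the field positions are the normal I-type ones and the major opcode is the op-v value mentioned above. the funct3 value and the choice of the low 8 immediate bits for the swizzle are placeholders, not a proposal.

```python
OP_V = 0b1010111  # op-v major opcode, as mentioned above


def encode_itype(imm12, rs1, funct3, rd, opcode=OP_V):
    """Assemble a 32-bit I-type word: imm[11:0] | rs1 | funct3 | rd | opcode."""
    assert 0 <= imm12 < (1 << 12)
    assert 0 <= rs1 < 32 and 0 <= rd < 32
    assert 0 <= funct3 < 8 and 0 <= opcode < 128
    return (imm12 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


def encode_swizzle_mv(rd, rs1, swizzle8, funct3=0b000):
    """Hypothetical swizzle-MV: swizzle in imm[7:0], imm[11:8] left as zero."""
    assert 0 <= swizzle8 < (1 << 8)
    return encode_itype(swizzle8, rs1, funct3, rd)


word = encode_swizzle_mv(rd=5, rs1=6, swizzle8=0xE4)
print(hex(word))  # 0xe4302d7
```

that leaves 4 immediate bits unused, which is where something like a partial funct field could go if space got tight.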
hmmm... actually, we could use the same opcode for MV.X. hmm maybe
not, don't fully like the idea of having to use e.g. the immediate as
a partial funct7, partial immediate, although it's not completely out
of the question if really pressed for space, and there's precedent:
funct5 in the FP format.
question: do we need swizzle *and* MV.X in the same instruction?
> 12 bits (3 per element) allows us to encode often-used constants as swizzle
> inputs or the similar to LLVM's 2-input swizzle (select between options by
> using x0 as rs2).
> for the constants, I recommend:
> 4. +0
> 5. +1
> 6. -1
> 7. index of current subvector (int) / reserved (float)
it's only 8 bits needed, were you thinking it was 16 needed as well?
from 4+4+4+4? that's unary :)
> the numbers are encoded using the eltype, so a swizzle using integer types
> gets the values as integers and the float types gets the values as float
i'm lost. are you thinking of a vectorised swizzle MV that takes the
swizzle from a *vector* register rather than an immediate, here? can
it's just data: the type of the data should definitely be unaffected
(unaltered). ahh hang on, we may need an FMV-swizzle, as you need to
be able to specify which regfile to do the swizzling on.
> index of current subvector works like:
> VL = 4
> velswizzle x32, x48, x0, SRCSVLEN=1, DESTSVLEN=4, ELTYPE=u64, elements=[+0,
> +1, -1, index]
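
one possible reading of the quoted example, sketched in python. everything here is an interpretation: selectors 0-3 are assumed to pick x, y, z, w from the source (the list above only starts at 4), -1 is assumed to wrap to all-ones for a u64 eltype, and "index" is assumed to be the number of the current subvector within the VL loop.

```python
MASK64 = (1 << 64) - 1  # u64 wrap-around


def velswizzle_u64(src_subvectors, selectors):
    """Build one destination subvector per source subvector.

    selectors: 0..3 pick a source element (assumed x, y, z, w),
               4 -> +0, 5 -> +1, 6 -> -1, 7 -> index of current subvector.
    """
    out = []
    for index, src in enumerate(src_subvectors):
        dest = []
        for sel in selectors:
            if sel < 4:
                value = src[sel]   # only valid if the source is long enough
            elif sel == 4:
                value = 0
            elif sel == 5:
                value = 1
            elif sel == 6:
                value = -1
            else:
                value = index
            dest.append(value & MASK64)
        out.append(dest)
    return out


# VL = 4, SRCSVLEN = 1, DESTSVLEN = 4, elements = [+0, +1, -1, index]
src = [[100], [200], [300], [400]]
for row in velswizzle_u64(src, [4, 5, 6, 7]):
    print(row)
```

under these assumptions each of the 4 destination subvectors is [0, 1, 2**64 - 1, i] for i = 0..3, and the source values are never read.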
source *and* dest SUBVL? eek! :) is that _really_ necessary? as in:
how about just doing two separate MVs, one for the src, one for the
that initially sounds like it would require 3 MV operations (move the
result back into the required format, after the operation). however
with permutations, there's a way to map them. two same-sized
permutations one after the other is... one permutation.
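
the "two in a row is one" point can be checked with a few lines of python. selectors here are lists of source indices (0=x .. 3=w), and the function names are made up for the sketch. it works for swizzles with repeats as well as for strict permutations.

```python
def apply_swizzle(vec, sel):
    """result[k] = vec[sel[k]]"""
    return [vec[i] for i in sel]


def compose(first, second):
    """Single selector equivalent to applying `first`, then `second`."""
    return [first[i] for i in second]


vec = [10, 20, 30, 40]
first = [3, 2, 1, 0]    # wzyx
second = [1, 1, 0, 2]   # yyxz

two_steps = apply_swizzle(apply_swizzle(vec, first), second)
one_step = apply_swizzle(vec, compose(first, second))
print(two_steps, one_step)  # [30, 30, 40, 20] [30, 30, 40, 20]
```
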