[libre-riscv-dev] [Bug 139] Add LD.X and ST.X? Strided

bugzilla-daemon at libre-riscv.org bugzilla-daemon at libre-riscv.org
Thu Oct 10 09:15:31 BST 2019


--- Comment #68 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #62)
> ok i have another idea: instead of the immediates being 12 bits, they're
> 8, *but*, there is *one* separate bit (in funct3?) which indicates
> whether the swizzles are indices xyzw or constants {0,1,1/2,pi}.
> if destmask is fitted into the top 4 bits, that gives the ability of
> swizzlei to reach the upper elements:
>     fmv.swizzlei rd.mask{0bw0x0} = rs.zw
> or:
>     fmv.swizzlei rd.mask{0bwz00} = {pi, 1.0}

The swizzles that use constants (other than just for a load-immediate) would
usually be of the form [x, z, 0.0, 1.0], where input elements and constants
are both specified in the same swizzle.
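
As a rough illustration of that mixed form, here is a small Python sketch. The
selector names, the two lookup tables, and the swizzle_mixed helper are all made
up for this example; they are not a proposed encoding, only a model of "each
output lane is either an input element or a constant".

```python
import math

# Hypothetical selector tables: each output lane names either an input
# element (x, y, z, w) or one of the constants listed in comment #62.
ELEMENT_INDEX = {"x": 0, "y": 1, "z": 2, "w": 3}
CONSTANTS = {"0": 0.0, "1": 1.0, "1/2": 0.5, "pi": math.pi}


def swizzle_mixed(src, selectors):
    """Build an output vector whose lanes are elements of src or constants."""
    out = []
    for sel in selectors:
        if sel in ELEMENT_INDEX:
            out.append(src[ELEMENT_INDEX[sel]])  # copy an input element
        else:
            out.append(CONSTANTS[sel])           # substitute a constant
    return out


# The [x, z, 0.0, 1.0] pattern applied to a 4-element source.
result = swizzle_mixed([10.0, 20.0, 30.0, 40.0], ["x", "z", "0", "1"])
```

With separate "indices" and "constants" modes selected by a single bit, this
pattern would need two instructions; a per-lane choice covers it in one.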

> it does mean however that four funct3s are needed (where with what
> you came up with, jacob, there are three for swizzlei).
> however what is gained instead is the ability for swizzlei to set
> upper elements (beyond the straight sequence).
> that in turn saves having to use an extra register - rs2 - in swizzle.
> which would still need to be set up (loaded with an immediate).

That's true.
Alternatively, [f]swizzlei could be left alone and separate [f]swizzle2i
instructions added:

    swizzle2i rd, rs1, swizzle
is equivalent to
    li rtemp, swizzle
    swizzle2 rd=rd, rs1=rs1, rs2=rtemp, rs3=rd

Swizzle would be defined to read all of each subvector before writing the
corresponding subvector to rd. Tiny implementations can implement that by using
a temporary register to store the read elements, which still allows operating
on one element at a time. Swizzle can detect all traps at the beginning, before
writing anything, so subvector swizzles don't have to worry about the
temporary register needing to be accessible for context-switching.
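
To show why the read-before-write rule matters when rd overlaps rs, here is a
minimal Python model. It assumes a flat list of element registers with rd and rs
as base indices; the function names and the use of IndexError to stand in for a
trap are illustrative only, not part of any proposal.

```python
def swizzle_naive(regs, rd, rs, indices):
    """Element-at-a-time copy with no buffering; wrong when rd overlaps rs."""
    for lane, i in enumerate(indices):
        regs[rd + lane] = regs[rs + i]


def swizzle_buffered(regs, rd, rs, indices):
    """Read the whole source subvector, then write the destination subvector."""
    n = len(indices)
    # Detect every fault up front so that a trap leaves regs untouched.
    if rd < 0 or rs < 0 or rd + n > len(regs) or rs + n > len(regs):
        raise IndexError("subvector out of range")
    if any(not 0 <= i < n for i in indices):
        raise IndexError("swizzle index out of range")
    temp = [regs[rs + i] for i in indices]  # stands in for the temporary register
    for lane, value in enumerate(temp):     # writes still go one element at a time
        regs[rd + lane] = value


# Reverse a 4-element subvector in place (rd == rs).
naive = [1, 2, 3, 4]
swizzle_naive(naive, 0, 0, [3, 2, 1, 0])

buffered = [1, 2, 3, 4]
swizzle_buffered(buffered, 0, 0, [3, 2, 1, 0])
```

The naive version overwrites elements it still needs to read, so the in-place
reverse comes out as [4, 3, 3, 4]; the buffered version gives [4, 3, 2, 1].
Because all checks happen before the first write, the temporary never holds
state that has to survive a trap.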

If there's enough free space in the [f]swizzlei encoding, some of it could be
shared with [f]swizzle2i to save opcode space.

If we reuse JAL and other similar opcodes (SYSTEM? JALR?), we should have
enough spare opcode space to allocate that much of it to swizzle, which its
importance justifies.
