[libre-riscv-dev] [Bug 139] Add LD.X and ST.X? Strided

Thu Oct 3 18:38:42 BST 2019

http://bugs.libre-riscv.org/show_bug.cgi?id=139

--- Comment #13 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #11)
> Hmm ok so we need:
> 
> * LD unit strided (an immed)
> * LD element strided (covered by std LD)
> * LD index strided (covered by std LD)
> 
> and
> 
> * Vulkan swizzle, 0, 1, x, y, w, z per SUBVL-Group
> * Swizzle by element can kinda be done with MV.X
> 
> Do we need:
> 
> * strided swizzle
> * element strided swizzle
> * index strided swizzle

probably?

I think it would be better to separate the swizzle op from the ld/st op.
That allows the swizzle instruction to be reused for reg -> reg swizzle
(requiring only one swizzle ALU on small implementations), while larger
implementations can macro-op fuse the swizzle with ld/st. This will also
reduce the opcode count to 12 or so (ld*, st*, swizzle) instead of the 20-30
that would otherwise be needed due to n^2 proliferation.

so, for swizzle ops, I think we should have:
[f]swizzlei: i-type; swizzle is immediate
[f]swizzle: r-type; swizzle is rs2

also have (less of a requirement):
[f]swizzle2: r4-type; swizzle is rs2
[f]swizzle2i: 3-register format with at least 12 immediate bits; swizzle is immediate
the 8-instruction matrix transpose algorithm depends on [f]swizzle2[i]
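
A rough sketch of how a 4x4 transpose could fit in 8 two-source swizzles. This
is one possible schedule, not necessarily the algorithm referred to above. The
helper names (swizzle2, transpose4x4, LO, HI, FRONT, BACK) are made up for
illustration, and selectors are written as (source, lane) pairs rather than in
any particular bit encoding.

```python
def swizzle2(rs1, rs3, selectors):
    """Two-source swizzle model: each selector is (source, lane).

    source 0 picks from rs1, source 1 picks from rs3 (hypothetical model).
    """
    sources = (rs1, rs3)
    return [sources[src][lane] for src, lane in selectors]


X, Y, Z, W = range(4)  # lane names
A, B = 0, 1            # A = rs1, B = rs3

# Stage 1 patterns: interleave the low or high halves of two rows.
LO = [(A, X), (B, X), (A, Y), (B, Y)]
HI = [(A, Z), (B, Z), (A, W), (B, W)]
# Stage 2 patterns: join the front or back halves of two intermediates.
FRONT = [(A, X), (A, Y), (B, X), (B, Y)]
BACK = [(A, Z), (A, W), (B, Z), (B, W)]


def transpose4x4(r0, r1, r2, r3):
    """Transpose four 4-element rows using exactly 8 swizzle2 operations."""
    t0 = swizzle2(r0, r1, LO)     # r0.x r1.x r0.y r1.y
    t1 = swizzle2(r0, r1, HI)     # r0.z r1.z r0.w r1.w
    t2 = swizzle2(r2, r3, LO)     # r2.x r3.x r2.y r3.y
    t3 = swizzle2(r2, r3, HI)     # r2.z r3.z r2.w r3.w
    c0 = swizzle2(t0, t2, FRONT)  # column x
    c1 = swizzle2(t0, t2, BACK)   # column y
    c2 = swizzle2(t1, t3, FRONT)  # column z
    c3 = swizzle2(t1, t3, BACK)   # column w
    return [c0, c1, c2, c3]
```

Every step needs lanes from two different registers, which is why a
single-source swizzle would not be enough for this schedule.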

the swizzle is specified with 3*SUBVL bits, one 3-bit selector per element
(trap on nonzero unused bits, allowing additional swizzle modes to be added
later), each selector encoded as:
000: rs1.x
001: rs1.y
010: rs1.z
011: rs1.w
100: rs3.x or 0 or 0.0
101: rs3.y or 1 or 1.0
110: rs3.z or int_max or -1.0
111: rs3.w or uint_max or 0.5 (or maybe pi or something else for fswizzle[i])
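
A minimal Python model of the selector encoding listed above, to make the
decode rules concrete. Several things here are assumptions rather than part of
the proposal: the lowest 3 bits are taken to select destination element 0,
codes 100-111 are read as rs3 lanes for the two-source form and as integer
constants for the single-source form, sources are given as 4-element lists,
and all names (SwizzleTrap, decode_fields, swizzle_int, swizzle2) and the
xlen parameter are hypothetical.

```python
class SwizzleTrap(Exception):
    """Models the trap taken when unused selector bits are nonzero."""


def decode_fields(swizzle, subvl):
    """Split a non-negative swizzle value into SUBVL 3-bit selectors."""
    if not 1 <= subvl <= 4:
        raise ValueError("SUBVL must be 1..4")
    if swizzle >> (3 * subvl):
        raise SwizzleTrap("nonzero unused swizzle bits")
    # Assumption: bits [2:0] select element 0, bits [5:3] element 1, etc.
    return [(swizzle >> (3 * i)) & 0b111 for i in range(subvl)]


def swizzle_int(rs1, swizzle, subvl, xlen=32):
    """Single-source integer form: codes 100-111 select constants."""
    consts = [0, 1, (1 << (xlen - 1)) - 1, (1 << xlen) - 1]
    return [
        consts[f & 0b011] if f & 0b100 else rs1[f & 0b011]
        for f in decode_fields(swizzle, subvl)
    ]


def swizzle2(rs1, rs3, swizzle, subvl):
    """Two-source form: codes 100-111 select rs3.x through rs3.w."""
    return [
        (rs3 if f & 0b100 else rs1)[f & 0b011]
        for f in decode_fields(swizzle, subvl)
    ]
```

For example, under these assumptions
swizzle2([10, 11, 12, 13], [20, 21, 22, 23], 0b110_000_100_011, 4)
picks rs1.w, rs3.x, rs1.x, rs3.z in that order.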
