[libre-riscv-dev] [Bug 139] Add LD.X and ST.X? Strided
bugzilla-daemon at libre-riscv.org
Wed Oct 9 17:38:16 BST 2019
http://bugs.libre-riscv.org/show_bug.cgi?id=139
--- Comment #64 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
I worked out why Midgard has a dest mask and allows swizzle on both src
operands: it allows FULL arbitrary ALU selection, covering all possible
permutations of dest xyzw and src1 src2 xyzw.
The reason is that the dest only needs to know which elements are to be
skipped, i.e. the sequence is maintained, and *both* src element permutations
may be reordered to match that sequence.
Also, again, no DESTSUBVL: that is implicit in the number of bits set in the
destmask, which defines how many elements are copied from the two srcs.
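To make the dest-mask-plus-two-swizzles idea concrete, here is a minimal
Python sketch. It is a model of the semantics as described above, not of
Midgard's actual hardware or encoding: the function name swizzled_op, the
lane ordering (bit 0 of the mask = x) and the "masked-out lanes keep their
old value" behaviour are all assumptions made for illustration.

```python
import operator

X, Y, Z, W = range(4)  # lane indices for a 4-element vector


def swizzled_op(op, dest, src1, src2, swz1, swz2, destmask):
    """Hypothetical model: for each lane i whose destmask bit is set,
    dest[i] = op(src1[swz1[i]], src2[swz2[i]]).
    Lanes whose bit is clear are skipped (assumed to keep their old value)."""
    out = list(dest)
    for i in range(4):
        if (destmask >> i) & 1:
            out[i] = op(src1[swz1[i]], src2[swz2[i]])
    return out


dest = [0, 0, 0, 0]
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]

# Write only dest.x and dest.z:
#   dest.x = a.w + b.y
#   dest.z = a.x + b.x
result = swizzled_op(operator.add, dest, a, b,
                     (W, X, X, Y), (Y, X, X, X), 0b0101)

# Number of elements actually written: the popcount of the mask,
# which is what makes a separate DESTSUBVL unnecessary in this model.
written = bin(0b0101).count("1")
```

The dest lanes stay in order and only the srcs are permuted, so any
(dest lane, src1 lane, src2 lane) combination is reachable without a dest
swizzle.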
The swizzles in Midgard and the destmask are all immediates, not in regs
(solving compiler error concerns), and the total number of bits is 8+8+4 = 20
(two 8-bit src swizzles plus the 4-bit destmask).
No wonder GPU ISAs are enormous.
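The 8+8+4 arithmetic can be checked with a small Python sketch that packs
two 4-lane swizzles (2 bits per lane selector) and a 4-bit destmask into one
immediate. The field order and the helper names pack_swizzle and
pack_immediates are hypothetical, chosen only to show the bit count; this is
not Midgard's real instruction layout.

```python
LANES = 4
SELECTOR_BITS = 2                      # enough to pick one of x, y, z, w
SWIZZLE_BITS = LANES * SELECTOR_BITS   # 8 bits per src swizzle
DESTMASK_BITS = LANES                  # 1 bit per dest lane
TOTAL_BITS = 2 * SWIZZLE_BITS + DESTMASK_BITS


def pack_swizzle(swz):
    """Pack four 2-bit lane selectors into 8 bits (lane 0 in the low bits)."""
    value = 0
    for lane, sel in enumerate(swz):
        if not 0 <= sel < LANES:
            raise ValueError("lane selector out of range")
        value |= sel << (SELECTOR_BITS * lane)
    return value


def pack_immediates(swz1, swz2, destmask):
    """Assumed layout: [destmask:4][swz2:8][swz1:8], swz1 in the low bits."""
    if not 0 <= destmask < (1 << DESTMASK_BITS):
        raise ValueError("destmask out of range")
    return (pack_swizzle(swz1)
            | (pack_swizzle(swz2) << SWIZZLE_BITS)
            | (destmask << (2 * SWIZZLE_BITS)))


imm = pack_immediates((3, 0, 0, 1), (1, 0, 0, 0), 0b0101)
```

Twenty bits of immediate per ALU operation is a large share of a 32-bit
opcode, which is the point being made about ISA size.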
Well, apart from destmask, VBLOCK can handle this case. Alternatively,
destmask has to be specified as a full 8-bit swizzle, not a 4-bit
predicate-like bitmask.
That deals with the ALUs, although it leaves SVP with a "hole" (no equivalent
in SVP to VBLOCK swizzle capability, at the moment).
If we go with the MV.swizzle OP32 format that you designed, that gives a way to
cut down on opcodes used, by reordering the data once and being done with it.
It does mean having copies of the data, which uses up regs.
However if registers are ever under pressure, VBLOCK can be used to refer to
the data in-place, just like in Midgard.