[libre-riscv-dev] [Bug 143] REMAP CSR for Matrix Multiplies
bugzilla-daemon at libre-riscv.org
Mon Oct 7 13:01:11 BST 2019
http://bugs.libre-riscv.org/show_bug.cgi?id=143
--- Comment #1 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
Apologies, I hadn't realised quite how important swizzling really is.
https://libre-riscv.org/simple_v_extension/vblock_format/#swizzle_format
I have been looking at the PLX 3D paper and it contains an algorithm for 4x4
matrix times 4x1 vector.
That algorithm is:
fmul f2, f1.xxxx, f10
fmac f2, f1.yyyy, f11, f2
fmac f2, f1.zzzz, f12, f2
fmac f2, f1.wwww, f13, f2
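As a rough illustration of what those four instructions compute, here is a minimal Python sketch. It assumes f10..f13 each hold one column of the 4x4 matrix and f1 holds the 4x1 vector; the helper names (swizzle, fmul, fmac, mat4_times_vec4) are made up for this example and are not taken from the PLX paper or the SV spec.

```python
# Hypothetical model of the swizzled fmul/fmac sequence.
# Assumption: f10..f13 are the four COLUMNS of the matrix, f1 is the vector.

LANES = {"x": 0, "y": 1, "z": 2, "w": 3}


def swizzle(vec, pattern):
    """Return a new 4-vector built by picking lanes of vec, e.g. 'xxxx'."""
    return [vec[LANES[c]] for c in pattern]


def fmul(a, b):
    """Element-wise multiply of two 4-vectors."""
    return [x * y for x, y in zip(a, b)]


def fmac(a, b, acc):
    """Element-wise multiply-accumulate: a * b + acc."""
    return [x * y + z for x, y, z in zip(a, b, acc)]


def mat4_times_vec4(columns, f1):
    """Mirror the four-instruction sequence one line per instruction."""
    f10, f11, f12, f13 = columns
    f2 = fmul(swizzle(f1, "xxxx"), f10)      # fmul f2, f1.xxxx, f10
    f2 = fmac(swizzle(f1, "yyyy"), f11, f2)  # fmac f2, f1.yyyy, f11, f2
    f2 = fmac(swizzle(f1, "zzzz"), f12, f2)  # fmac f2, f1.zzzz, f12, f2
    f2 = fmac(swizzle(f1, "wwww"), f13, f2)  # fmac f2, f1.wwww, f13, f2
    return f2
```

Each swizzle broadcasts one element of f1 across all four lanes, so every instruction adds one scaled column to the running result.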
VBLOCK swizzle table format can cope with this in a single block by setting a
swizzler onto four registers that are *redirected* to f1, each with a different
swizzle setting.
Macro-op fusion would result in *doubling* the number of instructions.
Neither approach is ideal.
For this particular case however I am inclined to review the decision to put
the REMAP CSR on the back burner.
https://libre-riscv.org/simple_v_extension/remap/
These were intended for Matrices; however, I forgot about them after thinking
that Vector Mul was not as high a priority.
Swizzle looks to be extremely awkward and costly, making the REMAP CSRs
attractive by comparison.
With the right REMAP, setting
* SHAPE1 to operate on a 4-element continuous loop, attached to f2
* SHAPE2 to wait 4 elements before incrementing by 1, attached to f1
the Matrix Multiply is LITERALLY reduced to 2 instructions: one to clear out
f2 to 4 zeros, the other an FMAC with a VL of 16 (no SUBVLs).
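The index pattern described by those two SHAPE settings can be sketched in Python as below. This is only a model of the element ordering, not of the actual REMAP CSR encoding. It assumes the 16 matrix elements sit contiguously in f10..f13 (column by column) and are read without any remapping; shape1_index, shape2_index and remap_matvec are hypothetical names for this example.

```python
# Hypothetical model of the two-instruction REMAP version.
# Assumption: matrix_flat is f10..f13 laid out contiguously, one column
# per register, and is walked linearly (no REMAP applied to it).


def shape1_index(i):
    """SHAPE1: 4-element continuous loop (0,1,2,3,0,1,2,3,...), used for f2."""
    return i % 4


def shape2_index(i):
    """SHAPE2: hold for 4 elements, then increment by 1, used for f1."""
    return i // 4


def remap_matvec(matrix_flat, f1, vl=16):
    # Instruction 1: clear f2 to four zeros.
    f2 = [0.0] * 4
    # Instruction 2: a single FMAC with VL=16, modelled as an element loop.
    for i in range(vl):
        f2[shape1_index(i)] += f1[shape2_index(i)] * matrix_flat[i]
    return f2
```

For i = 0..3 the loop multiplies f1[0] by the first column, for i = 4..7 it multiplies f1[1] by the second column, and so on, which is the same arithmetic as the swizzled fmul/fmac sequence.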
VL could be set with an SVP-64 instruction, no need to set up a VBLOCK.
The alternative is to add REMAP to VBLOCK.