[libre-riscv-dev] [isa-dev] 3D Matrix-style operations / primitives

Jacob Lifshay programmerjake at gmail.com
Wed Sep 18 08:41:33 BST 2019


On Wed, Sep 18, 2019, 00:20 lkcl <luke.leighton at gmail.com> wrote:

>
>
> On Wednesday, September 18, 2019 at 8:00:56 AM UTC+1, lkcl wrote:
>
>
>> No I don't think it does.  The assumption is that the element indices are
>> always XLEN wide and it is the data to which elwidth applies in the src.
>>
>> Have to fix that by adding a fmt field to MV.X; let me just update the
>> page.
>>
>
>  The idea here is to allow 8-bit indices to be stored inside XLEN-sized
> registers, such that rather than doing this:
>
> .. parsed-literal::
>     ldimm x8, 1
>     ldimm x9, 3
>     ldimm x10, 2
>     ldimm x11, 0
>     {SVP.VL=4} MV.X x3, x8, elwidth=default
>
> The alternative is this:
>
> .. parsed-literal::
>     ldimm x8, 0x00020301
>     {SVP.VL=4} MV.X x3, x8, elwidth=8
>
> Thus compacting four indices into the one register. x3 and x8's element
> widths are *independent* of the MV.X elwidth, thus allowing both the source
> and destination element widths of the *elements* to be moved to be
> overridden, whilst *at the same time* allowing the *indices* to be
> compacted as well.
>

mv.x with 8-bit indexes sounds like a good idea.
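
To make the packed form concrete, here is a minimal Python sketch of how four
8-bit indices could be packed into, and pulled back out of, one register
value. It assumes element 0 sits in the least-significant byte, which is what
the 0x00020301 example above implies (x8..x11 = 1, 3, 2, 0). The helper names
are made up for illustration, and this only models the index packing, not the
MV.X instruction itself.

```python
def pack_indices(indices, elwidth=8):
    """Pack small indices into one integer, element 0 in the lowest bits.

    The element ordering is an assumption taken from the example in the
    quoted text, not from a specification.
    """
    mask = (1 << elwidth) - 1
    value = 0
    for i, idx in enumerate(indices):
        if idx & ~mask:
            raise ValueError(f"index {idx} does not fit in {elwidth} bits")
        value |= idx << (i * elwidth)
    return value


def unpack_indices(value, vl, elwidth=8):
    """Extract vl indices of elwidth bits each from one register value."""
    mask = (1 << elwidth) - 1
    return [(value >> (i * elwidth)) & mask for i in range(vl)]


packed = pack_indices([1, 3, 2, 0])
indices = unpack_indices(packed, vl=4)
```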

assuming a vector of 4x4 matrices is stored as 4 separate vectors with
subvl=4 in struct-of-array-of-struct form (the form I've been planning on
using), a vectorized matrix transpose can be done using standard (4+4) -> 4
swizzle instructions, each taking 2 input vectors with subvl=4 and producing
1 output vector with subvl=4. the transpose takes 2 steps with 4
instructions per step, giving 8 instructions in total:

input:
| m00 m10 m20 m30 |
| m01 m11 m21 m31 |
| m02 m12 m22 m32 |
| m03 m13 m23 m33 |

step 1: transpose each of the 4 corner 2x2 matrices in place

intermediate:
| m00 m01 m20 m21 |
| m10 m11 m30 m31 |
| m02 m03 m22 m23 |
| m12 m13 m32 m33 |

step 2: finish the transpose by swapping the top-right and bottom-left 2x2
blocks

output:
| m00 m01 m02 m03 |
| m10 m11 m12 m13 |
| m20 m21 m22 m23 |
| m30 m31 m32 m33 |
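
The two steps can be checked with a small Python model. This is only a sketch
for a single 4x4 matrix: `swizzle2` is a hypothetical stand-in for a
(4+4) -> 4 swizzle, with selectors 0..3 picking from the first input and 4..7
from the second. A vectorized version would apply the same selectors to every
matrix in the vector.

```python
def swizzle2(a, b, sel):
    """(4+4) -> 4 swizzle: sel entries 0..3 pick from a, 4..7 pick from b."""
    both = list(a) + list(b)
    return [both[s] for s in sel]


def transpose4x4(rows):
    """Transpose a 4x4 matrix (list of 4 rows) using 8 two-input swizzles."""
    r0, r1, r2, r3 = rows
    # step 1: transpose each of the 4 corner 2x2 blocks (4 swizzles)
    t0 = swizzle2(r0, r1, (0, 4, 2, 6))
    t1 = swizzle2(r0, r1, (1, 5, 3, 7))
    t2 = swizzle2(r2, r3, (0, 4, 2, 6))
    t3 = swizzle2(r2, r3, (1, 5, 3, 7))
    # step 2: swap the top-right and bottom-left 2x2 blocks (4 swizzles)
    o0 = swizzle2(t0, t2, (0, 1, 4, 5))
    o1 = swizzle2(t1, t3, (0, 1, 4, 5))
    o2 = swizzle2(t0, t2, (2, 3, 6, 7))
    o3 = swizzle2(t1, t3, (2, 3, 6, 7))
    return [o0, o1, o2, o3]


# same layout as the "input" diagram: row r holds m0r m1r m2r m3r
matrix = [[f"m{c}{r}" for c in range(4)] for r in range(4)]
result = transpose4x4(matrix)
```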

alternatively, if loading or storing, the transpose can be combined with
the load/store instructions to give 4 instructions (strided load/stores
might work, haven't checked)
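
As a rough illustration of the strided-load idea (which, as noted, is
unchecked), the Python sketch below models a single 4x4 matrix stored
contiguously in a flat list standing in for memory. `strided_load` is a
hypothetical helper, not a real instruction, and the struct-of-array-of-struct
layout for a vector of matrices is not modelled here.

```python
def strided_load(memory, base, stride, count=4):
    """Load count elements starting at base, stepping by stride elements."""
    return [memory[base + i * stride] for i in range(count)]


# one 4x4 matrix, rows stored back to back: element (row, col) at row*4 + col
memory = list(range(16))

# 4 strided loads, one per output row: each picks up one column of the input
transposed_rows = [strided_load(memory, base=col, stride=4) for col in range(4)]
```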

Jacob

