[libre-riscv-dev] [isa-dev] 3D Matrix-style operations / primitives

Wed Sep 18 17:26:01 BST 2019

I'm not reading all this in excruciating detail... but isn't matrix
transpose solved with vector gather?
The Livermore S1 had "hardware" transpose  op, but as I recall it was
always cacheline aligned.
This allowed them to effectively transpose arbitrarily large matrices by
exchanging transposed lines, which minimized the memory traffic.
I'd have to dig out my manuals, deep, deep in the garage to be sure of the
exact semantics (though I expect they're online somewhere).

On Wed, Sep 18, 2019 at 7:29 AM Rogier Brussee <rogier.brussee at gmail.com>
wrote:

>
>
> Op woensdag 18 september 2019 08:28:38 UTC+2 schreef Jacob Lifshay:
>>
>> On Tue, Sep 17, 2019, 22:24 lkcl <luke.l... at gmail.com> wrote:
>>
>>> does anyone know of some mathematics for analysing which would be the
>>> best "primitives" for Matrix operations, suited to transposition and
>>> inversion, determinant and normalisation?
>>>
>>> for 3D that generally means just 2x2, 3x3 and 4x4.
>>>
>>
>> Generally, for 3D graphics, matrix left/right-multiplication by
>> vector/matrix and transpose is by far more common than all the other matrix
>> operations combined, so having special HW support for inverse, determinant,
>> and normalize (never heard of that applied to matrices before) is probably
>> unnecessary.
>>
>>
> Linear algebra can be arbitrarily hard. Numerical linear algebra is very
> hard in practice. The canonical primitives are in the BLAS Fortran library.
> That said THE primitive is dot product between vectors.
>
>
>>
>>> i'm looking up how matrix inverses are calculated and, hoo-boy :)
>>> https://integratedmlai.com/matrixinverse/
>>> https://www.wikihow.com/Find-the-Inverse-of-a-3x3-Matrix
>>>
>>>
> NOOH to do do matrix inversion (which you basically never want to do) you
> use the QR decomposition of the matrix i.e. write A = QR with Q an
> orthogonal matrix and R upper triangular which can be chosen with non
> negative elements on the diagonal.
>
>  Then
> A^{-1} = R^{-1}Q^{-1} = R^{-1}Q^t.
>
> Computing a QR decomposition is a O(n^3) operation, and doing it
> numerically stable is well studied but difficult.
>
>
>
>> normalisation looks to be just "divide by the determinant":
>>>
>>> http://mymathforum.com/linear-algebra/18218-how-do-i-normalize-matrix.html
>>>
>>>
> NOOOH. She divides every column by the length of the column.
>
>> so... i am logically deducing that if you wanted something RISC-like,
>>> you'd have operations for transpose and determinant?  or... can determinant
>>> be broken down further into something elegant?
>>> https://en.wikipedia.org/wiki/Determinant
>>>
>>>
> NOOOH.  Determinants are very slick theoretically, but seldom useful for
> computations unless the matrix has special form.
>
> whilst 2x2 looks pretty obvious - 0,0 x 1,1 - 1,0 x 0,1 - it looks like it
>>> goes recursive from there, with permutations:
>>> https://en.wikipedia.org/wiki/Determinant#n_%C3%97_n_matrices
>>>
>>> at that point, honestly, i'm scared/alarmed to even recommend a Matrix
>>> Determinant opcode!
>>>
>>
>> yeah, determinant is an O(n^3) operation (though there may be something
>> faster for really big n)
>>
>
> No it really is horrible, O(n^4). You have to basically compute the QR
> decomposition  then det(A) = det(QR) = det(Q)det(R) = det(R) = +/-1 * prod
> diagonal elements of R.
> and I am not even sure how you can track the sign of the determinant while
> decomposing.
>
> Of course n =2 is really easy, and n= 3 can easily but tediously expanded.
>
>
>>> likewise, Transpose appears to be a series of MV operations with
>>> offsets, which tends to suggest that there may be some vector primitives
>>> that would be really good to have, that could be added to this:
>>>
>>> https://libre-riscv.org/simple_v_extension/specification/mv.x/
>>>
>>>
>>> a quick search "matrix inverse swizzle" shows this:
>>>
>>> https://lxjk.github.io/2017/09/03/Fast-4x4-Matrix-Inverse-with-SSE-SIMD-Explained.html
>>>
>>> which mentions __mm_shuffle_epl32, _mm_shuffle_ps, _mm_movelp_ps and
>>> that leads on a merry search to SSE/AVX/AVX-512:
>>>
>>> https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=_mm_shuffle_ps
>>> https://www.cs.uaf.edu/2006/fall/cs301/lecture/11_17_sse.html
>>>
>>> looovely :)
>>>
>>> at which point i am definitely lost.  does anyone have any suggestions?
>>>
>>
>> the algorithms I've generally used are an unrolled form of gauss jordan
>> elimination or just using the formula from a symbolic math program [1], at
>> which point, like operations can be grouped together.
>>
>> [1]:
>> type the following into maxima:
>> m:apply(matrix, makelist(makelist(concat(v, i, j), j, 0, 3), i, 0, 3));
>> grind(invert(m))$
>>
>> I've not generally had matrix inverse on a fast path, instead passing a
>> matrix and it's inverse together if needed and only calculating the inverse
>> when I generate the matrix.
>>
>> For most 3D programs, matrices are used much more than they are
>> generated, so matrix inverse shouldn't generally be in the fast path, if
>> the program is designed well.
>>
>> Jacob
>>
> --
> You received this message because you are subscribed to the Google Groups
> "RISC-V ISA Dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to isa-dev+unsubscribe at groups.riscv.org.
> To view this discussion on the web visit
> https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/9b038581-f4b0-499b-b948-abe6c77b49aa%40groups.riscv.org
> <https://groups.google.com/a/groups.riscv.org/d/msgid/isa-dev/9b038581-f4b0-499b-b948-abe6c77b49aa%40groups.riscv.org?utm_medium=email&utm_source=footer>
> .
>