[libre-riscv-dev] sv 1d/2d/3d data shaping

Tue Oct 16 00:11:32 BST 2018

On Mon, Oct 15, 2018 at 10:46 PM lkcl <lkcl at libre-riscv.org> wrote:
>
> On Mon, Oct 15, 2018 at 10:17 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
> >
> > That should work. For a 4x4 matrix (the biggest required) it could be set
> > up to have a 4x4xN layout, where N is the number of shaders running at once.
>
>  okaay very cool.  i'll write something up.

 this flexible algorithm (below) seems to do the job.  hypothetically
it can be extended to N dimensions.  the "order" may be specified as a
permutation: a set of 3 elements has six orderings, so only 3 bits are
needed to express it:
012 021
120 102
201 210
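one possible way to turn those six orderings into a 3-bit code is to number the permutations of (0, 1, 2) in lexicographic order.  this is a hypothetical mapping for illustration only: the message does not fix which code means which order, and `encode_order` / `decode_order` are made-up names.

```python
from itertools import permutations

# Hypothetical mapping: each ordering of dimensions (0, 1, 2) gets a
# 3-bit code equal to its position in lexicographic order.
PERMS = list(permutations(range(3)))  # six tuples, codes 0..5


def encode_order(order):
    """Return the 3-bit code for a dimension ordering such as (1, 2, 0)."""
    return PERMS.index(tuple(order))


def decode_order(code):
    """Inverse of encode_order; codes 6 and 7 are left unused."""
    if not 0 <= code < len(PERMS):
        raise ValueError("reserved permutation code: %d" % code)
    return PERMS[code]


for code, perm in enumerate(PERMS):
    print(code, format(code, "03b"), perm)
```

six orderings fit in 3 bits with two codes (6 and 7) spare.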

with 32 bits to play with, and a hard limit of XLEN, each dimension
may be expressed in a 6-bit field (covering sizes 1 to 64), as
(x-field + 1), (y-field + 1) and (z-field + 1).  if any field is zero,
that dimension equals 1 (linear) and so is effectively disabled: in
this way we get 2D (and obviously 1D).
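a minimal sketch of packing the three 6-bit fields plus the 3-bit permutation code into one 32-bit word.  the bit positions below are an assumption chosen for illustration, not a proposed encoding; `pack_shape` and `unpack_shape` are hypothetical names.

```python
# Hypothetical bit layout, for illustration only:
#   bits  0-5   xdim - 1
#   bits  6-11  ydim - 1
#   bits 12-17  zdim - 1
#   bits 18-20  3-bit permutation code
FIELD_BITS = 6
FIELD_MASK = (1 << FIELD_BITS) - 1
SHIFTS = (0, 6, 12)
ORDER_SHIFT = 18


def pack_shape(xdim, ydim, zdim, order_code):
    """Pack three dimensions (1..64 each) and an order code into one word."""
    word = 0
    for shift, dim in zip(SHIFTS, (xdim, ydim, zdim)):
        if not 1 <= dim <= FIELD_MASK + 1:
            raise ValueError("dimension must be 1..64, got %d" % dim)
        word |= (dim - 1) << shift  # a field of 0 means size 1 (disabled)
    if not 0 <= order_code < 8:
        raise ValueError("order code must fit in 3 bits")
    return word | (order_code << ORDER_SHIFT)


def unpack_shape(word):
    """Return ([xdim, ydim, zdim], order_code) from a packed word."""
    dims = [((word >> shift) & FIELD_MASK) + 1 for shift in SHIFTS]
    return dims, (word >> ORDER_SHIFT) & 0x7
```

for a plain 1D vector of 8 elements, `pack_shape(8, 1, 1, 0)` leaves the y and z fields at zero.  the whole thing uses 21 of the 32 bits.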

interestingly, if xdim * ydim * zdim is *less* than VL, then the
whole thing wraps round.  i can see that actually being extremely
useful, to apply values in a matrix repeatedly to an *array* of
matrices.  or, to have a single instruction issue multiple adds of a
larger array to a smaller one, in a cumulative fashion: map-reduce, in
other words.
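a sketch of the wrap-round case, assuming each element number i (0 to VL-1) is simply taken modulo xdim * ydim * zdim before being reshaped.  `shaped_index` is a hypothetical helper that computes the same offsets as the loop in the algorithm, via a mixed-radix decomposition in which order[0] is the fastest-stepping dimension; the example does a cumulative add of a 12-element array into a 2x2 "matrix".

```python
def shaped_index(i, dims, order):
    """Map element number i to a register offset.

    dims = [xdim, ydim, zdim]; order[0] is the fastest-stepping dimension.
    i wraps round modulo xdim * ydim * zdim.
    """
    i %= dims[0] * dims[1] * dims[2]  # wrap when the shape is smaller than VL
    idxs = [0, 0, 0]
    for d in order:  # mixed-radix decomposition, fastest dimension first
        idxs[d] = i % dims[d]
        i //= dims[d]
    return idxs[0] + idxs[1] * dims[0] + idxs[2] * dims[0] * dims[1]


# Cumulative add of a larger array into a smaller one (identity order).
VL = 12
dims, order = [2, 2, 1], [0, 1, 2]
big = list(range(VL))
acc = [0] * 4
for i in range(VL):
    acc[shaped_index(i, dims, order)] += big[i]
print(acc)  # [12, 15, 18, 21]
```

note that three adds land on each `acc` entry, so in hardware they form a chain of dependent operations on the same destination register.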

obviously we have to be really, really careful there, because if the
wrap is too small it could interfere with pipeline issuing / register
allocation...

still, extremely cool.

----

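# shape of the array; in the encoding each is stored as (field + 1)
xdim = 3
ydim = 4
zdim = 5

# stride of each dimension within the linear register offset
xmul = 1
ymul = xdim
zmul = xdim * ydim

lims = [xdim, ydim, zdim]
idxs = [0, 0, 0]
order = [1, 2, 0]  # permutation: y steps fastest, then z, then x

for _ in range(xdim * ydim * zdim):
    new_idx = idxs[0] * xmul + idxs[1] * ymul + idxs[2] * zmul
    print(new_idx, end=" ")
    # step the fastest dimension, carrying into the next one in
    # "order" each time a dimension wraps back to zero
    for i in range(3):
        idxs[order[i]] += 1
        if idxs[order[i]] != lims[order[i]]:
            break
        print()  # a dimension wrapped: start a new output line
        idxs[order[i]] = 0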
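the message notes the algorithm can hypothetically be extended to N dimensions.  a minimal sketch of that generalization, assuming Python 3.8+ for `math.prod`; `shaped_indices` is a hypothetical name, dimension 0 is contiguous as above, and `order` is any permutation of range(N) with the fastest-stepping dimension first.

```python
from math import prod  # Python 3.8+


def shaped_indices(dims, order):
    """Yield register offsets for an N-dimensional shape.

    dims[k] is the size of dimension k (dimension 0 is contiguous);
    order is a permutation of range(len(dims)), fastest dimension first.
    """
    n = len(dims)
    # stride of dimension k is the product of the sizes below it
    strides = [prod(dims[:k]) for k in range(n)]
    idxs = [0] * n
    for _ in range(prod(dims)):
        yield sum(i * s for i, s in zip(idxs, strides))
        for d in order:  # odometer-style increment with carry
            idxs[d] += 1
            if idxs[d] != dims[d]:
                break
            idxs[d] = 0


print(list(shaped_indices([3, 4, 5], [1, 2, 0]))[:5])  # [0, 3, 6, 9, 12]
```

with 3 dimensions this produces the same sequence as the loop above; the cost of going beyond 3 is in the encoding, since N dimensions need N! permutation codes.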