[libre-riscv-dev] Instruction sorta-prefixes for easier high-register access
Jacob Lifshay
programmerjake at gmail.com
Tue Jan 29 05:08:06 GMT 2019
On Mon, Jan 28, 2019, 20:34 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:
> On Sun, Jan 27, 2019 at 11:36 PM Jacob Lifshay <programmerjake at gmail.com>
> wrote:
>
> > Note for Luke: this has old stuff, so don't skip over
>
meant to say new.
> ok.
>
> > A table of the scalar/vector encodings:
> > 1 register, integer:
> > 1-bit field
> > v: 0
> > s: 1
>
> if we have implicit scalar-vector from the register numbering-prefix,
> a separate field isn't needed.
>
What I meant is that we would have a scalar/vector indicator alongside the
base 5-bit register number field. When the indicator is vector, we would
convert the 5-bit field abcde to deabc00; when it is scalar, the field would
be converted to 00abcde. This is equivalent to the scalar/vector-mod-4
scheme we previously discussed, except that we only need 2 bits of
scalar/vector for 3-register integer ops, saving the bit needed for elwidth.
We don't need to worry about 4-register integer ops since there aren't any,
and fp ops have an elwidth field in the underlying op, so we don't need one
in the prefix. That leaves enough bits in the prefix for 4-register fp ops
(fmadd).
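
A rough sketch of the abcde -> deabc00 / 00abcde mapping described above,
assuming "a" is the most significant bit of the 5-bit field. The function
name expand_reg is hypothetical and only for illustration; nothing here is
taken from a finished spec.

```python
def expand_reg(field5: int, is_vector: bool) -> int:
    """Expand a 5-bit register field to a 7-bit register number.

    Assumption: the field is written abcde with 'a' as the MSB.
    Vector: abcde -> deabc00, Scalar: abcde -> 00abcde.
    """
    if not 0 <= field5 < 32:
        raise ValueError("register field must fit in 5 bits")
    if is_vector:
        abc = field5 >> 2      # upper three bits of the field
        de = field5 & 0b11     # lower two bits of the field
        # 'de' becomes the top two bits, 'abc' sits above two zero bits
        return (de << 5) | (abc << 2)
    # scalar: zero-extend, so only registers 0..31 are reachable
    return field5


if __name__ == "__main__":
    for f in (0b00001, 0b10110, 0b11111):
        print(f"{f:05b} -> vector {expand_reg(f, True):07b}, "
              f"scalar {expand_reg(f, False):07b}")
```

With this mapping every vector register number is a multiple of 4, and the
32 vector encodings cover the whole 7-bit range in steps of 4, which is why
it lines up with the mod-4 scheme.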
>
> reduce operations i decided in the original SV to not include, as it
> creates dependencies that i felt would be better expressed as straight
> loops. instead, the for-loop for the "hardware-macro-unrolling" would
> simply terminate after the first element operation successfully
> completed, taking predication into account in that.
>
> so VEXTRACT and VINSERT just become accidentally-implemented
> side-effects of the loop termination.
I really think we should add reduce operations because they are very
handy in matrix multiplication, which is used in both neural nets and 3D
graphics.
A reducing version of fmadd is basically a column or row of a vl-mul by VL
matrix multiply operation with one of the input matrices transposed (aka a
vector of dot-products).
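
One possible reading of that, as a sketch only: the name
reducing_fmadd_row and the argument layout are hypothetical, and the exact
operand arrangement for SV is not settled in this thread. Each output
element is an accumulator plus the dot product of one VL-element row with
a row of the transposed matrix.

```python
def reducing_fmadd_row(a, bt_rows, acc):
    """Compute one row of A @ B, given B already transposed.

    a       : list of VL elements (one row of A)
    bt_rows : vl_mul rows of B^T, each VL elements long
    acc     : vl_mul accumulator values added to each dot product
    Returns a list of vl_mul results (a vector of dot-products).
    """
    if len(bt_rows) != len(acc):
        raise ValueError("need one accumulator per row of B^T")
    out = []
    for bt, c in zip(bt_rows, acc):
        if len(bt) != len(a):
            raise ValueError("every row must have VL elements")
        # multiply element-wise, reduce by addition, then accumulate
        out.append(c + sum(x * y for x, y in zip(a, bt)))
    return out


if __name__ == "__main__":
    a = [1, 2, 3]
    bt_rows = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]
    print(reducing_fmadd_row(a, bt_rows, [10, 20, 30]))
```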

We don't need to specify a fixed order for reduction in SV's spec; it just
needs to be deterministic, depending only on the specific operation, VL,
and vl-mul. This allows us to operate on 4 elements at a time for most of
the reduction. Order is irrelevant anyway for integer reductions.
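
To illustrate what "deterministic but not sequential" could look like, here
is a sketch of a 4-lane reduction. The lane assignment (element i goes to
lane i mod 4) and the final combine order are assumptions chosen for the
example, not a proposed spec; the function names are hypothetical.

```python
def reduce_add_sequential(values):
    """Reference: strict left-to-right floating-point sum."""
    total = 0.0
    for v in values:
        total += v
    return total


def reduce_add_4wide(values):
    """Sum with four partial accumulators, then a fixed pairwise combine.

    The order depends only on len(values), so the result is repeatable
    for a given VL even though it differs from the sequential order.
    """
    lanes = [0.0, 0.0, 0.0, 0.0]
    for i, v in enumerate(values):
        lanes[i % 4] += v          # process 4 elements per step
    return (lanes[0] + lanes[1]) + (lanes[2] + lanes[3])


if __name__ == "__main__":
    vals = [1e16, 1.0, -1e16, 1.0]
    # Floating-point addition is not associative, so these may differ.
    print(reduce_add_sequential(vals), reduce_add_4wide(vals))
    # Small integers are exact, so any order gives the same answer.
    ints = [float(i) for i in range(10)]
    print(reduce_add_sequential(ints), reduce_add_4wide(ints))
```

For integer (or exactly representable) inputs the two orders agree, which
is the sense in which order does not matter for integer reductions.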

vextract and vinsert ops are the scalar version of the strided
register-to-register move operations that I recommended adding earlier
(basically strided ld/st, except on in-register vectors instead of
in-memory vectors), with the added benefit of not needing to build a
predicate to use them.
> l.
>
> _______________________________________________
> libre-riscv-dev mailing list
> libre-riscv-dev at lists.libre-riscv.org
> http://lists.libre-riscv.org/mailman/listinfo/libre-riscv-dev
>