[libre-riscv-dev] Some general instruction ideas

Jacob Lifshay programmerjake at gmail.com
Fri Jan 24 15:05:48 GMT 2020


On Fri, Jan 24, 2020, 05:44 Lauri Kasanen <cand at gmx.com> wrote:

> Hi,
>
> I wanted to float some general instruction ideas, now that things seem
> to be picking up. I've mentioned some to Luke previously. These have
> use in video, but also in general.
>
> - altivec's vec_perm. It's a byte shuffle with three input regs and one
> output. It's exceedingly useful, more powerful than any of x86's
> shuffles, and I believe it should be copied as-is.
>

I had proposed an even more general version that works on 8-, 16-, 32-, and
64-bit elements (rather than just 8-bit) and can produce 1-4 elements per
subvector. It can process a variable number of subvectors per instruction
rather than being limited to 128-bit SIMD:

see swizzle2 pseudocode at
https://libre-riscv.org/simple_v_extension/specification/mv.x/
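
For reference, the byte-level behaviour being generalized can be sketched
roughly as below. This is only a model of a vec_perm-style shuffle on
16-byte vectors, written from the description above (three inputs, one
output), not the swizzle2 pseudocode from the linked page. The function
name and the use of Python bytes objects are illustrative assumptions.

```python
def vec_perm(a, b, c):
    """Byte permute in the style of AltiVec's vec_perm (a sketch, not a spec).

    a, b and c are 16-byte sequences. Each byte of c selects one byte from
    the 32-byte concatenation a + b, using only its low 5 bits.
    """
    if not (len(a) == len(b) == len(c) == 16):
        raise ValueError("expected three 16-byte vectors")
    table = bytes(a) + bytes(b)
    # Only the low 5 bits of each selector are used (32 possible sources).
    return bytes(table[sel & 0x1F] for sel in c)
```

For example, a selector vector starting with 0, 16, 1, 17 interleaves the
leading bytes of a and b.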

If that's not sufficient, we also have mv.x, which is a register-indirect
vectorized register-to-register move.
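
As a rough illustration of what "register-indirect" means here, the sketch
below models a register file as a Python list. The exact operand encoding
and semantics of mv.x are defined by the specification, not by this code;
the function name, the argument order, and the choice to read all sources
before writing are assumptions made for the example.

```python
def mv_x(regs, rd, rs, vl):
    """Hypothetical register-indirect vector move (illustrative only).

    For each element i in 0..vl-1, the value in regs[rs + i] is treated
    as a register number, and that register's contents are copied to
    regs[rd + i].
    """
    # Read every source first so overlapping ranges behave predictably.
    values = [regs[regs[rs + i]] for i in range(vl)]
    for i, value in enumerate(values):
        regs[rd + i] = value
    return regs
```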


> - saturated versions of add/sub/mul/narrow/etc. Saves those manual
> checks. Beyond video, often used in image processing.
>

Sounds like a good idea. We will probably also need those on the
graphics side.
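
To make the "manual checks" concrete, here is a minimal sketch of what
saturating operations compute: the exact result is clamped to the
representable range instead of wrapping. The helper names and the default
of unsigned 8-bit elements are assumptions for illustration, not proposed
instruction names.

```python
def saturate(value, bits, signed):
    """Clamp an exact integer result to the range of a bits-wide element."""
    if signed:
        lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    else:
        lo, hi = 0, (1 << bits) - 1
    return max(lo, min(hi, value))


def sat_add(a, b, bits=8, signed=False):
    return saturate(a + b, bits, signed)


def sat_sub(a, b, bits=8, signed=False):
    return saturate(a - b, bits, signed)


def sat_mul(a, b, bits=8, signed=False):
    return saturate(a * b, bits, signed)


def sat_narrow(value, bits=8, signed=False):
    # Narrowing a wider element is the same clamp applied on its own.
    return saturate(value, bits, signed)
```

With unsigned 8-bit elements, 200 + 100 gives 255 rather than wrapping to
44, which is the behaviour pixel arithmetic usually wants.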

>
> - memcpy. I remember a Linus quote on what instructions he'd like to
> see, and he said memcpy and memset. I know it's not very RISC, but it's
> highly useful, and a hardware loop is always faster than sw.
>

We'll just have to avoid x86's mistake, where a software loop is much faster
than the dedicated memcpy instruction (rep movsb) on some processors.

Dedicated memcpy (actually memmove) and memset instructions can also
increase speed by not needing to load the contents of totally-overwritten
cache blocks from memory on a cache miss.
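
A small sketch of both points, under stated assumptions: memory is modelled
as a bytearray, the cache block size is taken to be 64 bytes, and the
function names are made up for the example. The first function shows the
overlap handling that distinguishes memmove from memcpy; the second lists
the destination blocks that are completely overwritten and so would not
need their old contents fetched.

```python
def memmove(mem, dst, src, n):
    """Copy n bytes within mem from src to dst, safe for overlapping ranges."""
    if dst == src or n == 0:
        return
    if dst < src:
        # Destination is below source: copy forwards.
        for i in range(n):
            mem[dst + i] = mem[src + i]
    else:
        # Destination is above source: copy backwards so source bytes
        # are read before they are overwritten.
        for i in range(n - 1, -1, -1):
            mem[dst + i] = mem[src + i]


def fully_overwritten_blocks(dst, n, block=64):
    """Return indices of cache blocks entirely covered by [dst, dst + n)."""
    first = -(-dst // block)      # first block starting at or after dst
    end = (dst + n) // block      # one past the last fully covered block
    return list(range(first, max(first, end)))
```

For a 200-byte write starting at address 10, only blocks 1 and 2 are fully
covered; the partial blocks at either end still need their old contents.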

We should support storing larger patterns (64 bits or wider) rather than
being limited to setting every byte to the same value.
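
A minimal sketch of a pattern fill, assuming the pattern repeats starting
from the first destination byte (the phase could equally be tied to address
alignment; that choice is not settled here). The function name and the
bytearray memory model are illustrative.

```python
def memset_pattern(mem, dst, n, pattern):
    """Fill n bytes of mem starting at dst with a repeating byte pattern."""
    if len(pattern) == 0:
        raise ValueError("pattern must not be empty")
    for i in range(n):
        # The pattern phase is relative to dst (an assumption).
        mem[dst + i] = pattern[i % len(pattern)]


# Usage: fill 16 bytes with a 64-bit little-endian pattern.
buf = bytearray(16)
memset_pattern(buf, 0, 16, (0x1122334455667788).to_bytes(8, "little"))
```

A one-byte pattern reduces to ordinary memset, so the plain case is
covered by the same operation.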

Jacob

