[libre-riscv-dev] Some general instruction ideas

whygee at f-cpu.org whygee at f-cpu.org
Fri Jan 24 15:03:12 GMT 2020


On 2020-01-24 14:44, Lauri Kasanen wrote:
> Hi,
> 
> I wanted to float some general instruction ideas, now that things seem
> to be picking up. I've mentioned some to Luke previously. These have
> use in video, but also in general.
> 
> - altivec's vec_perm. It's a byte shuffle with three input regs and one
> output. It's exceedingly useful, more powerful than any of x86's
> shuffles, and I believe it should be copied as-is.

very powerful indeed !
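
For those who have not used it, here is a rough software model of what
vec_perm does, as a sketch only: each selector byte uses its low 5 bits
to index into the 32-byte concatenation of the two source vectors. The
Python function name and the test vectors are mine, for illustration.

```python
def vec_perm(a: bytes, b: bytes, sel: bytes) -> bytes:
    """Model of a 16-byte permute with two data inputs and one selector."""
    assert len(a) == len(b) == len(sel) == 16
    table = a + b  # 32-byte lookup table: a is bytes 0..15, b is bytes 16..31
    # Only the low 5 bits of each selector byte are used as the index.
    return bytes(table[s & 0x1F] for s in sel)


a = bytes(range(0, 16))
b = bytes(range(16, 32))

# Reverse the bytes of a.
reverse_sel = bytes(reversed(range(16)))
reversed_a = vec_perm(a, b, reverse_sel)

# Interleave the low halves of a and b: a0, b0, a1, b1, ...
interleave_sel = bytes(x for i in range(8) for x in (i, 16 + i))
interleaved = vec_perm(a, b, interleave_sel)
```

The same instruction covers reversal, interleaving, table lookup and
unaligned-load fixups, just by changing the selector.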

> - saturated versions of add/sub/mul/narrow/etc. Saves those manual
> checks. Beyond video, often used in image processing.

and sound processing, and many other fields...
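
To make the "manual checks" concrete, a minimal sketch of what a
saturating add replaces, next to the usual wrapping add. The widths
(unsigned 8-bit, signed 16-bit) and function names are just examples I
picked, not a proposal for specific opcodes.

```python
def wrap_add_u8(x: int, y: int) -> int:
    """Ordinary modular add: overflow wraps around."""
    return (x + y) & 0xFF


def sat_add_u8(x: int, y: int) -> int:
    """Unsigned 8-bit saturating add: clamp at 255 (typical for pixels)."""
    return min(x + y, 0xFF)


def sat_add_s16(x: int, y: int) -> int:
    """Signed 16-bit saturating add: clamp to [-32768, 32767] (typical for audio)."""
    return max(-32768, min(32767, x + y))


bright_wrapped = wrap_add_u8(200, 100)   # wraps to a dark value
bright_clamped = sat_add_u8(200, 100)    # stays at full brightness
loud_clamped = sat_add_s16(30000, 10000)
quiet_clamped = sat_add_s16(-30000, -10000)
```

Without the instruction, every one of those clamps is a compare and a
branch or select per element.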

> - memcpy. I remember a Linus quote on what instructions he'd like to
> see, and he said memcpy and memset. I know it's not very RISC, but it's
> highly useful, and a hardware loop is always faster than sw.

I'm not sure...
Is it something like the STOS(B|W|D|Q) / LODS(B|W|D|Q) instructions of x86?
So basically a store or load with the index incremented/decremented in parallel?

It IS possible in RISC because "canonical MIPS" performs the ADD update
in series with the load/store. Since F-CPU, my architectures work like that
as well: you have to have the destination/source memory address in a register
and the instruction computes the address for the NEXT access in /parallel/.

So I'd say yay.
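
A rough model of the load/store-with-update idea described above, as a
sketch under my own assumptions: the register holds the exact address,
the access uses it as-is, and the next address is computed from the same
old value alongside the access. The names load_postinc / store_postinc
and the dict-based register file are hypothetical, and the destination
register is assumed to differ from the address register.

```python
def load_postinc(regs, mem, rd, ra, step):
    """rd <- mem[ra]; ra <- ra + step. Both results come from the OLD ra."""
    addr = regs[ra]          # address goes straight to memory, no adder in front
    next_addr = addr + step  # independent of the memory access, so it can overlap
    regs[rd] = mem[addr]
    regs[ra] = next_addr


def store_postinc(regs, mem, rs, ra, step):
    """mem[ra] <- rs; ra <- ra + step. Same idea for the store side."""
    addr = regs[ra]
    next_addr = addr + step
    mem[addr] = regs[rs]
    regs[ra] = next_addr


def copy_bytes(mem, src, dst, count):
    """A byte copy loop built only from the two instructions above."""
    regs = {"src": src, "dst": dst, "tmp": 0}
    for _ in range(count):
        load_postinc(regs, mem, "tmp", "src", 1)
        store_postinc(regs, mem, "tmp", "dst", 1)
    return regs


mem = bytearray(b"hello" + bytes(5))
final_regs = copy_bytes(mem, src=0, dst=5, count=5)
```

The loop body is two instructions per byte with no separate pointer
arithmetic, which is most of what a software memcpy inner loop needs.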

> One cpu I'm familiar with, 65816, has two such memcpy instructions
> (forward and backward, byte units). They have no hidden state, and are
> interruptable between every byte. By giving them overlapping areas,
> they work as memset, with an arbitrary-sized source pattern, not just
> limited to a byte or int.
> 
> Such would have immediate use in decompression, and widely in general.
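
A small sketch of the overlapping trick, for readers who have not seen
it: a strictly forward, byte-at-a-time copy whose destination starts
inside the data it is reading from repeats the source pattern. This is a
plain software model, not the 65816 instruction itself, and the function
name is made up.

```python
def forward_copy(mem: bytearray, src: int, dst: int, count: int) -> None:
    """Copy count bytes from src to dst, one byte at a time, ascending."""
    for i in range(count):
        # Each byte is written before later iterations read it back,
        # which is what makes the overlapping case replicate the pattern.
        mem[dst + i] = mem[src + i]


# Fill with a 3-byte pattern: seed the pattern once, then copy it over itself.
buf = bytearray(b"abc" + bytes(9))
forward_copy(buf, src=0, dst=3, count=9)

# Single-byte memset is the same thing with a 1-byte pattern.
zeros = bytearray(b"\x00" + b"\xff" * 7)
forward_copy(zeros, src=0, dst=1, count=7)
```

An LZ77-style decompressor does exactly this kind of copy for every
back-reference, including the overlapping case.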

compression/decompression (and others) need specific instructions for
bitstream insertion/extraction, which is quite a delicate subject
(I'm preparing/writing an article about this right now for Linux
Magazine France)
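
To show the kind of work such instructions would replace, here is a
minimal software bit reader. It is only an illustration: the MSB-first
bit order, the class name and the one-bit-at-a-time loop are my own
simplifying assumptions, not a description of any proposed instruction.

```python
class BitReader:
    """Extract variable-width fields from a byte string, MSB first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # current position, counted in bits

    def read(self, nbits: int) -> int:
        value = 0
        for _ in range(nbits):
            byte = self.data[self.pos >> 3]              # which byte
            bit = (byte >> (7 - (self.pos & 7))) & 1     # which bit in it
            value = (value << 1) | bit
            self.pos += 1
        return value


reader = BitReader(bytes([0b10110011, 0b01000000]))
first = reader.read(3)    # bits 101
second = reader.read(5)   # bits 10011
third = reader.read(2)    # bits 01, from the second byte
```

Real decoders work on whole words with shifts and masks, and the
delicate part is handling fields that straddle word boundaries without
branches.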

> It would also give a clear answer for "what is the fastest memcpy", heh.

it will always be optimised microcoded versions, at least for medium to 
large blocks.

> - Lauri
yg


