[libre-riscv-dev] Some general instruction ideas

Jacob Lifshay programmerjake at gmail.com
Fri Jan 24 16:32:54 GMT 2020


On Fri, Jan 24, 2020, 08:20 Immanuel, Yehowshua U <yimmanuel3 at gatech.edu>
wrote:

> > We'll just have to avoid x86's mistake…
>
> > Dedicated memcpy (actually memmove) and memset instructions can also
> > increase speed by not needing to load the contents of totally-overwritten
> > cache blocks from memory on a cache miss.
>
> Are you for or against memcpy?
>

not sure, but if we do it, we should do it right.

>
> > We should support storing larger patterns (64-bits or larger) rather than
> > being limited to setting all bytes the same.
>
> Are you saying that we should have RISC store instructions that can
> support storing 1 - 4 bytes?
>

no, we already have that. I'm saying that if we have memset, it should be
able to write a repeating multi-byte pattern ABCDABCDABCD... (like x86's
rep stosd, which stores a 32-bit value) rather than just AAAAAAAAA..., and
it should support 8, 16, 32, and 64-bit patterns.
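
To make the intended semantics concrete, here is a minimal C sketch of a
pattern fill. It is only a behavioural model, not a proposed encoding:
fill_pattern, its argument order, the little-endian byte order of the
pattern, and the handling of a trailing partial pattern are all assumptions
made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not an existing instruction or libc call.
 * Fills dst[0..len) with a repeating pattern that is pat_bytes wide
 * (1, 2, 4, or 8 bytes, i.e. 8/16/32/64-bit patterns).
 * Assumption: pattern bytes are taken in little-endian order, and a
 * partial copy of the pattern is written at the end when len is not a
 * multiple of pat_bytes. Returns 0 on success, -1 on an unsupported width. */
int fill_pattern(void *dst, size_t len, uint64_t pattern, size_t pat_bytes)
{
    if (pat_bytes != 1 && pat_bytes != 2 && pat_bytes != 4 && pat_bytes != 8)
        return -1;

    /* Split the pattern into its individual bytes once. */
    unsigned char pat[8];
    for (size_t i = 0; i < pat_bytes; i++)
        pat[i] = (unsigned char)(pattern >> (8 * i));

    /* Write the bytes out, wrapping around the pattern. */
    unsigned char *out = dst;
    for (size_t i = 0; i < len; i++)
        out[i] = pat[i % pat_bytes];
    return 0;
}
```

With a 10-byte buffer, fill_pattern(buf, 10, 0x44434241, 4) leaves
ABCDABCDAB in buf, and fill_pattern(buf, 10, 0x41, 1) gives the ordinary
memset result AAAAAAAAAA.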

>
> I’m personally a little skeptical of memcpy… Intuition tells me that to
> gain justifiable speedups - we’d need
> to tightly integrate the instruction with hardware - and doing such might
> expose the attack surface for
> security exploits.
>
> That said - what’s the RISC way of Memcopy?
> probably:
>
>
> loop: load from src reg to temp reg
> store from temp reg to dest reg
> increment src reg
> increment dest reg
> jmp loop
>
> I’m thinking with at least a four stage pipeline with forwarding, at
> steady state - we’d be storing a byte every cycle?…
>
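
For reference, the quoted loop written out in C, next to a variant that
moves 8 bytes per iteration. This is a sketch only: the byte count n is an
addition (the pseudocode above has no exit condition), the function names
are made up, and both functions assume the buffers do not overlap (memcpy
rather than memmove semantics).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Byte-at-a-time copy: one load and one store per byte, as in the
 * quoted loop. The count n is an assumption added so the loop ends. */
void copy_bytes(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n != 0) {
        unsigned char tmp = *src;  /* load from src into a temporary */
        *dst = tmp;                /* store the temporary to dst */
        src++;
        dst++;
        n--;
    }
}

/* Same loop, 8 bytes per iteration. The fixed-size memcpy calls are the
 * portable C way to express unaligned 64-bit loads and stores. */
void copy_words(unsigned char *dst, const unsigned char *src, size_t n)
{
    while (n >= 8) {
        uint64_t tmp;
        memcpy(&tmp, src, 8);      /* unaligned 64-bit load */
        memcpy(dst, &tmp, 8);      /* unaligned 64-bit store */
        src += 8;
        dst += 8;
        n -= 8;
    }
    copy_bytes(dst, src, n);       /* remaining 0..7 bytes */
}
```

Both functions produce the same result; the second one just needs roughly
one eighth as many loop iterations for the same length.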

PowerISA can do at least 64 bits per cycle with unaligned load/store
(128 bits with SIMD). Our hardware should be able to do at least 128 bits
per cycle, assuming the cache has separate read/write ports. If we add
special cache block handling, it could maybe do a whole cache block per
cycle when the copy is suitably aligned.
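
As a rough illustration of what "suitably aligned" means here, the sketch
below splits a destination range into a head, a run of whole aligned cache
blocks, and a tail. Only the whole blocks are fully overwritten, so only
those could be handled a block at a time without first reading the old
contents. The 64-byte block size and the names BLOCK_BYTES, fill_split, and
split_by_block are assumptions for this example, not anything decided for
the hardware.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cache block size in bytes; no size has been fixed yet. */
#define BLOCK_BYTES 64u

/* Hypothetical bookkeeping type for this sketch. */
struct fill_split {
    size_t head;    /* bytes before the first block-aligned address */
    size_t blocks;  /* whole, aligned blocks that are fully overwritten */
    size_t tail;    /* bytes after the last whole block */
};

/* Split the range [dst, dst + len) at cache block boundaries. */
struct fill_split split_by_block(uintptr_t dst, size_t len)
{
    struct fill_split s = {0, 0, 0};
    size_t misalign = (size_t)(dst % BLOCK_BYTES);
    size_t head = misalign ? BLOCK_BYTES - misalign : 0;

    /* A short range may end before reaching a block boundary. */
    if (head > len)
        head = len;

    s.head = head;
    s.blocks = (len - head) / BLOCK_BYTES;
    s.tail = (len - head) % BLOCK_BYTES;
    return s;
}
```

For example, a 200-byte range starting at address 0x1010 splits into a
48-byte head, 2 whole blocks, and a 24-byte tail, while a 128-byte range
starting at 0x1000 is exactly 2 whole blocks.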

Jacob

