--- Comment #3 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Michael Nolan from comment #1)
> The POWER ISA includes instructions to load and store multiple registers
> to/from memory (`lmw` and `stmw`).
> `lmw rt, D(rA)` loads registers rt, r(t+1), r(t+2), ..., r31 from subsequent
> addresses starting at rA + signed(D). `stmw` behaves similarly but for stores.

ah, funnily enough, this is basically identical to LD.MULTI, which is
already covered by both the out-of-order execution engine and SimpleV.
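For reference, here is a minimal behavioural sketch of the lmw / stmw semantics described in the quote. It assumes big-endian 32-bit words, a flat bytearray for memory and a plain 32-entry list for the GPRs. rA=0 means "literal zero" as in the ISA's (RA|0) convention. Alignment checks, invalid forms (e.g. rA inside rt..r31 for lmw) and exceptions are deliberately left out. The helper names are illustrative only.

```python
def exts16(d: int) -> int:
    """Sign-extend a 16-bit displacement field (the D in D(rA))."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def effective_address(regs, d, ra):
    # (RA|0) + EXTS(D): register 0 as base means the value zero, not r0.
    base = regs[ra] if ra != 0 else 0
    return base + exts16(d)


def lmw(regs, mem, rt, d, ra):
    """Load words into rt, rt+1, ..., r31 from consecutive addresses."""
    ea = effective_address(regs, d, ra)
    for r in range(rt, 32):
        # each word is zero-extended into the destination register
        regs[r] = int.from_bytes(mem[ea:ea + 4], "big")
        ea += 4


def stmw(regs, mem, rs, d, ra):
    """Store the low 32 bits of rs, rs+1, ..., r31 to consecutive addresses."""
    ea = effective_address(regs, d, ra)
    for r in range(rs, 32):
        mem[ea:ea + 4] = (regs[r] & 0xFFFFFFFF).to_bytes(4, "big")
        ea += 4
```

Note that `lmw rt, ...` touches 32 - rt registers, so `lmw r30, -4(r1)` with r1 = 100 loads exactly two words, from addresses 96 and 100.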

> These instructions, if implemented, would significantly complicate the
> decoder by forcing it to either generate 32-t load or store ops while
> stalling instruction fetch. Alternatively it could generate a single op that,
> after taking a trip through the pipeline, would be modified (i.e. rt
> incremented) and placed back into the issue queue. I suspect the
> latter would not play nicely with the OOO machinery because it wouldn't
> place a reservation on a register until it was time to store that register,
> so subsequent instructions could modify the registers before they are stored.

the total opposite is true: it plays very nicely with the OoO engine and
plays merry hell with an in-order design.

no stalling is required.

all that happens is: we translate this into a SimpleV instruction
(VL = 32 - rt, i.e. one element per register from rt up to r31) and hit
the SV hardware-loop system with it.
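A rough sketch of what that translation could look like, purely as an illustration and not the actual SV encoding or the libre-soc implementation. It assumes the hardware loop expands `lmw`/`stmw` into VL = 32 - rt scalar element ops, each with its own register and an address offset advancing by 4 bytes. The names `ElementOp`, `sv_expand_multi` and `blocks_later_write` are hypothetical. The point it tries to show is that every element op names its register at issue time, so a later instruction writing one of those registers (or rA) can be held back by the ordinary hazard tracking instead of racing ahead of the store.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementOp:
    """One scalar element op issued by the (hypothetical) SV hardware loop."""
    kind: str      # "load" (register written) or "store" (register read)
    reg: int       # the GPR this element loads into or stores from
    base_reg: int  # rA; 0 means "literal zero", as for lmw/stmw
    offset: int    # byte offset from the base address


def exts16(d: int) -> int:
    """Sign-extend a 16-bit displacement field."""
    d &= 0xFFFF
    return d - 0x10000 if d & 0x8000 else d


def sv_expand_multi(kind, rt, d, ra):
    """Expand lmw/stmw into VL = 32 - rt element ops, one per register."""
    vl = 32 - rt  # rt..r31 inclusive
    return [ElementOp(kind, rt + i, ra, exts16(d) + 4 * i) for i in range(vl)]


def blocks_later_write(issued, reg):
    """True if an already-issued element op still depends on `reg`, so a
    later instruction writing `reg` must wait: WAR on a stored register or
    on the base rA, WAW on a loaded register."""
    return any(
        op.reg == reg or (op.base_reg != 0 and op.base_reg == reg)
        for op in issued
    )
```

For example, `sv_expand_multi("store", 29, 0, 1)` produces three element ops for r29, r30 and r31 at offsets 0, 4 and 8, and a following instruction that overwrites r31 is reported as blocked until the store element reading r31 has gone through.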

