[libre-riscv-dev] [Bug 139] Add LD.X and ST.X? Strided

bugzilla-daemon at libre-riscv.org bugzilla-daemon at libre-riscv.org
Wed Oct 9 10:19:19 BST 2019


http://bugs.libre-riscv.org/show_bug.cgi?id=139

--- Comment #59 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #56)
> SUBVL-predication mode idea
> ----------------------
> 
> one possible solution here is to (somehow) jam in a mode which says
> "hey, you know we said SUBVL doesn't have predication, and that the
> predicate bit applies to *all* of SUBVL? well, um, for this operation,
> it does".
> 
> what that would do is limit VL to a maximum of around 16, but i'm fine
> with that.

That would require calculating the expanded predicate a lot, since most
performance-critical instructions will be predicated.

Not sure if the extra instructions are worth it.

Also, if we ever want to make a high-performance GPU, limiting VL to 16 would
be a major drawback: AMDGPU uses 32 (Navi only) or 64, and I think NVIDIA uses
32, though I could be wrong.
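
To illustrate both points, here is a rough Python sketch of what "expanding"
a predicate means and why per-sub-element predication shrinks the maximum VL.
It assumes a 64-bit predicate register and SUBVL = 4, which is only one
reading of the "around 16" figure; the function names are made up for this
example and are not part of any spec.

```python
def expand_predicate(pred: int, vl: int, subvl: int) -> int:
    """Replicate each of the `vl` predicate bits `subvl` times.

    Bit i of `pred` covers the whole sub-vector i, so the expanded mask
    has `subvl` consecutive copies of that bit.
    """
    sub_mask = (1 << subvl) - 1
    expanded = 0
    for i in range(vl):
        if (pred >> i) & 1:
            expanded |= sub_mask << (i * subvl)
    return expanded


def max_vl(pred_width: int, subvl: int) -> int:
    """Largest VL if every sub-element needs its own predicate bit.

    `pred_width` is an assumed predicate register width in bits.
    """
    return pred_width // subvl


if __name__ == "__main__":
    print(hex(expand_predicate(0b101, vl=3, subvl=4)))
    print(max_vl(64, 4))
```

Software would have to do something like expand_predicate() before each
predicated operation whenever it starts from an ordinary one-bit-per-vector
predicate, which is where the extra instructions come from.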

> 
> 8-bit-swizzle idea
> -------------
> 
> going back to 8-bit on swizzle rather than 12-bit, the remaining 4 bits
> can be used as a predicate SUBVL immediate DESTMASK.
> 
> that solves the issue of whether it's run-time safe (in the immediate
> case).
> 
> also, with DESTSUBVL being redundant (directly equivalent *to* DESTMASK)
> that gives two bits back [useable for other opcodes]

Actually only 1, since for SUBVL <= 3 the SUBVL value is encoded in what would
be the upper bits of the immediate.

> 
> i'd really like to know why other GPUs only have 8-bit swizzle,
> rather than having constants.  is setting from constants that common
> that they really *need* special treatment?

setting from constants is quite common, though a large part of the reason why I
picked 3 bits per element is to support [f]swizzle2.
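
As a rough sketch of why 3 bits per element matters for a two-source swizzle:
with 4 elements, 3-bit selectors give a 12-bit immediate (versus 8 bits at
2 bits per element), and each selector can then address any of the 8 lanes of
two 4-element sources. The bit layout (element 0 in the low bits) and the
function names below are assumptions made for this example, not the proposed
encoding.

```python
from typing import List, Sequence

ELEMENTS = 4        # vec4
SELECTOR_BITS = 3   # 3 bits per element -> 12-bit immediate


def encode_selectors(selectors: Sequence[int]) -> int:
    """Pack four 3-bit selectors into an immediate (element 0 in low bits)."""
    assert len(selectors) == ELEMENTS
    imm = 0
    for i, sel in enumerate(selectors):
        assert 0 <= sel < (1 << SELECTOR_BITS)
        imm |= sel << (i * SELECTOR_BITS)
    return imm


def decode_selectors(imm: int) -> List[int]:
    """Unpack the immediate back into four 3-bit selectors."""
    mask = (1 << SELECTOR_BITS) - 1
    return [(imm >> (i * SELECTOR_BITS)) & mask for i in range(ELEMENTS)]


def swizzle2(imm: int, a: Sequence, b: Sequence) -> list:
    """Pick each result lane from the 8 lanes of a followed by b."""
    assert len(a) == ELEMENTS and len(b) == ELEMENTS
    pool = list(a) + list(b)
    return [pool[sel] for sel in decode_selectors(imm)]


if __name__ == "__main__":
    imm = encode_selectors([0, 4, 1, 5])  # a.x, b.x, a.y, b.y
    print(swizzle2(imm, [10, 11, 12, 13], [20, 21, 22, 23]))
```

With only 2 bits per element, the selectors could reach just 4 lanes, so a
two-source swizzle would need some other way to choose between a and b.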

> 
> 
> SVP for 1 swizzle, opcode for the other idea
> ---------------------------------------
> 
> we definitely do not have room to fit 2 swizzles (16 bit) and it seems
> from Midgard that they're certainly needed.

I'm not so sure 2 swizzles per instruction are needed; it seems excessive. I'm
guessing Midgard just did that because they had the room.

> VBLOCK covers the multi swizzle scenario fine, but SVP does not.
> if one swizzle immediate can be jammed into the SVP64 prefix,
> the other can be in the operation.  if the SVP64 swizzle prefix is not
> used, the "rules" state that the same swizzle (in the opcode)
> applies to *both* operands.

I still think swizzles should be separate operations, and macro-op fusion can
be used if merging them with an ALU op is needed.
