[libre-riscv-dev] [Bug 139] Add LD.X and ST.X? Strided

Thu Oct 10 08:47:47 BST 2019

http://bugs.libre-riscv.org/show_bug.cgi?id=139

--- Comment #67 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #61)
> (In reply to Jacob Lifshay from comment #59)
> > > 
> > > i'd really like to know why other GPUs only have 8-bit swizzle,
> > > rather than having constants.  is setting from constants that common
> > > that they really *need* special treatment?
> > 
> > setting from constants is quite common, though a large part of the reason
> > why I picked 3 bits per element is to support [f]swizzle2.
> 
> that doesn't make any immediate/logical sense to me.

3 bits per element are needed in order to address all source elements for
[f]swizzle2:

000: rs1.x
001: rs1.y
010: rs1.z
011: rs1.w
100: rs3.x
101: rs3.y
110: rs3.z
111: rs3.w
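
To make the table concrete, here is a minimal Python sketch of how such a
selector could be decoded. The packing order (element 0 in the lowest 3 bits of
a 12-bit immediate) and the function names are my assumptions for illustration
only; nothing here is a settled encoding.

```python
# Sketch only: the packing order (element 0 in the lowest 3 bits) and all
# names here are assumptions for illustration, not part of any spec.

def pack_selectors(sels):
    """Pack four 3-bit selectors into a 12-bit immediate (assumed order)."""
    imm = 0
    for i, sel in enumerate(sels):
        assert 0 <= sel < 8, "each selector must fit in 3 bits"
        imm |= sel << (3 * i)
    return imm


def swizzle2(rs1, rs3, imm12):
    """Select 4 output elements from two 4-element sources.

    Each 3-bit selector follows the table above: 0-3 pick rs1.x..w,
    4-7 pick rs3.x..w.
    """
    sources = list(rs1) + list(rs3)  # index 0-3: rs1, index 4-7: rs3
    out = []
    for i in range(4):
        sel = (imm12 >> (3 * i)) & 0b111
        out.append(sources[sel])
    return out
```

For example, the selectors [000, 101, 010, 111] would pick rs1.x, rs3.y, rs1.z
and rs3.w in that order.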

Since we're already using 3 bits per element, and since Vulkan allows swizzles
like [x, y, 0.0, 1.0], why not use the same 3-bits-per-element format for
[f]swizzle[i], with constants taking the place of rs3's elements?

It additionally provides a single instruction to load some commonly used FP
constants, which is useful.
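
A rough sketch of the constant variant, again in Python. Selectors 0-3 pick from
rs1 as before; which constants selectors 4-7 map to has not been decided, so the
CONSTS table below is purely hypothetical and chosen only so that the
[x, y, 0.0, 1.0] case can be shown.

```python
# Sketch only: the idea is that selectors 4-7 become constants for
# [f]swizzle[i], but WHICH constants is an assumption here. CONSTS is a
# hypothetical table, not a proposed encoding.
CONSTS = [0.0, 1.0, -1.0, 0.5]


def fswizzle(rs1, sels):
    """Selectors 0-3 pick rs1.x..w; selectors 4-7 pick a constant."""
    out = []
    for sel in sels:
        if sel < 4:
            out.append(rs1[sel])
        else:
            out.append(CONSTS[sel - 4])
    return out
```

With this table, selectors [0, 1, 4, 5] give [x, y, 0.0, 1.0], and using only
selectors 4-7 turns the instruction into a constant load that ignores rs1.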

> 
> > > 
> > > 
> > > SVP for 1 swizzle, opcode for the other idea
> > > ---------------------------------------
> > > 
> > > we definitely do not have room to fit 2 swizzles (16 bit) and it seems
> > > from Midgard that they're certainly needed.
> > 
> > I not so sure 2 swizzles per instruction are needed... seems excessive. I'm
> > guessing Midgard just did that because they had the room.
> 
> i'd like to work out why before we make any firm decisions.
> 
> > > VBLOCK covers the multi swizzle scenario fine, but SVP does not.
> > > if one swizzle immediate can be jammed into the SVP64 prefix,
> > > the other can be in the operation.  if the SVP64 swizzle prefix is not
> > > used, the "rules" state that the same swizzle (in the opcode)
> > > applies to *both* operands.
> > 
> > I still think swizzles should be separate operations and macro-op fusion can
> > be used if merging them with an ALU op is needed.
> 
> VBLOCK has room reserved for swizzles, so they can be applied to
> operations.  the entire principle on which VBLOCK is based is to use
> VBLOCK to "compactify" SVP encodings which would otherwise take up
> significantly more space.

I still think having separate swizzle opcodes is the way to go; macro-op fusion
will definitely work for combining swizzles with nearby operations.
Additionally, this allows many swizzles in a single VBLOCK without the need to
restart it in order to specify the swizzle again.

> 
> thus - one very important thing - concepts added to SVP *must* be
> mappable (without loss of functionality) *to* VBLOCK in an easily
> translatable fashion.  no exceptions.

I would actually argue that we should do it the other way around: all
operations possible using VBLOCK must be able to be translated into an
equivalent sequence using SVP and/or normal instructions.

That way, similar to RVC, the compiler can generate SVP instructions and then
the assembler/final-instruction-selection can combine them into VBLOCK where
practical in order to save space.

Note that this doesn't exclude SVP being completely mappable to VBLOCK, it just
changes the canonical instruction form to SVP, which is honestly less
complicated to parse and generate, due to not needing to group the
instructions.

> 
> here however we're looking at adding OP32 *instructions*, so it is new
> territory.
> 
> thinking about it a little: i'd really like it to be possible to replace
> (translate) these [new] opcodes with a VBLOCK applied to a straight C.MV.
> if that's even practical.  and if it actually saves space.
> 
> usually it does (for groups of larger than 3 SVP instructions).

Swizzles require enough immediate data that they won't really fit in 16 bits
anyway. Using a new VBLOCK prefix just shifts the extra bits to another place
rather than reducing them, while making everything more complicated.
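
As a back-of-the-envelope check of the bit budget, assuming 4-element vectors
(x, y, z, w) with 3 bits per selector, and counting only the standard RISC-V
opcode overhead (2-bit quadrant field for 16-bit instructions, 7-bit major
opcode for 32-bit ones). The helper name is made up for this sketch.

```python
# Sketch only: assumes 4-element vectors and standard RISC-V opcode
# overhead; the swizzle encoding itself is still just a proposal.
BITS_PER_SELECTOR = 3
ELEMENTS = 4  # x, y, z, w
SWIZZLE_IMM_BITS = BITS_PER_SELECTOR * ELEMENTS  # 12 bits of immediate


def bits_left(insn_bits, opcode_bits):
    """Bits remaining for registers/funct after opcode and swizzle imm."""
    return insn_bits - opcode_bits - SWIZZLE_IMM_BITS


rvc_left = bits_left(16, 2)   # 16-bit form: 2-bit quadrant field
op32_left = bits_left(32, 7)  # 32-bit form: 7-bit major opcode
```

That leaves 2 bits in a 16-bit instruction, which is less than even one 3-bit
compressed register field, versus 13 bits in a 32-bit instruction, which is
enough for two 5-bit register fields with 3 bits to spare.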

Swizzles can be done by allocating new 32-bit instructions (which both VBLOCK
and SVP support).

Reusing JAL (and other similar opcodes) works inside SVP or VBLOCK because JAL
is otherwise unused/invalid in that context. This gives us much more encoding
space to work with.

> 
> argh this is complicated! no wonder nobody in software/hardware libre
> has designed a GPU before! :)

Well, there have been a few designs... :P
