[libre-riscv-dev] [Bug 139] Add LD.X and ST.X? Strided

bugzilla-daemon at libre-riscv.org bugzilla-daemon at libre-riscv.org
Sun Oct 6 23:24:56 BST 2019


http://bugs.libre-riscv.org/show_bug.cgi?id=139

--- Comment #47 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #45)
> (In reply to Luke Kenneth Casson Leighton from comment #44)
> > (In reply to Jacob Lifshay from comment #43)
> > .
> > > > 
> > > > predication can be pseudo-added by:
> > > > 
> > > > if (sel_field == 0b111) continue.
> > > 
> > > if we're going to do that, we really should increase the field size to 4
> > > bits per element, since shuffle2 already uses them all (rs1 x, y, z, and w
> > > and rs3 x, y, z, and w)
> > 
> > Yes was just thinking that. Then shuffle could keep 3 bits for consts and
> > xyzw and use the 4th bit for predication
> > 
> > > 
> > > though I am extremely disinclined to have something that sets the output
> > > subvl in a data-dependent way (basically the output type & complete layout),
> > > that seems like a giant mess of security vulnerabilities just waiting to
> > > happen.
> > 
> > Already sorted: the algorithm was designed and implemented successfully in
> > spike, for twin predication, last year (albeit for VL, not SUBVL).
> > 
> > It is shown in the appendix pseudocode as well. The src idx and dest idx are
> > incremented independently and BOTH will result in loop termination on
> > reaching SUBVL.
> > 
> > > also, what do you do when subvector 1 has 2 ignores, subvector 2 has 3
> > > ignores, subvector 3 has 1 ignore, and so on?!
> > 
> > Stop the loop when either of the subindices reaches SUBVL.
> > 
> > If the programmer fails to insert enough ignores to not "represent"
> > differing SUBVLs, that is their lookout. They should have read the manual :)
> 
> Do note that there isn't a counter on the src side, since swizzle allows
> random access to all src elements in a subvector, whereas twin predication
> depends on both src and dest elements being accessed in-order.
> 

I need to write out the pseudocode to explain it. The random accessing comes
*after* the in-order selection (including advancing the dest counter over
"unchanged" dest items).

It's a little weird and obtuse.
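
As a rough illustration of the in-order part only, here is a sketch in Python
of the twin-predication loop described earlier in the thread: src and dest
indices advance independently, and the loop ends as soon as either reaches
SUBVL. This is not the appendix pseudocode. The function name, the bitmask
arguments and the list-based registers are all made up for illustration, and
the random-access swizzle step is deliberately left out.

```python
def twin_pred_copy(src, dest, src_mask, dest_mask, subvl):
    """Copy src to dest in order, skipping masked-out elements on both sides.

    src_mask and dest_mask are hypothetical predicate bitmasks
    (bit n set means element n is active).
    """
    out = list(dest)
    i = j = 0  # independent src and dest indices
    while i < subvl and j < subvl:
        # advance the src index over masked-out src elements
        while i < subvl and not (src_mask >> i) & 1:
            i += 1
        # advance the dest index over masked-out ("unchanged") dest elements
        while j < subvl and not (dest_mask >> j) & 1:
            j += 1
        # either index reaching SUBVL terminates the loop
        if i >= subvl or j >= subvl:
            break
        out[j] = src[i]
        i += 1
        j += 1
    return out
```

With src = [10, 20, 30, 40], src_mask = 0b1010 and dest_mask = 0b0111, only src
elements 1 and 3 are active, so they land in dest elements 0 and 1, and the
loop stops because the src index hits SUBVL first.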


> I think having it be "unchanged" would be a better name, since it isn't
> actually that similar to twin-predication, it's basically only predicating
> the write on each rd element.

Both predication on src and predication-on-dest-before-the-selection are
needed.

For swizzle, the 4th bit can encode one of those, and the value 0b111 in the
other 3 bits can encode the other.

For swizzle2, unfortunately and annoyingly, if bit 3 is used as the rs1/rs3
selector we need *5* bits.
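
To make the bit counting concrete, here is a small Python sketch. The exact
layout is an assumption for illustration: a 3-bit selector in the low bits,
bit 3 as the rs1/rs3 select, and bit 4 as a predication flag. Nothing in this
thread fixes the positions beyond bit 3 being the rs1/rs3 selector, and the
function names are hypothetical.

```python
ELEMENTS = 4  # x, y, z and w


def selector_bits(bits_per_element, elements=ELEMENTS):
    """Total selector width for a given per-element field size."""
    return bits_per_element * elements


def decode_swizzle2_field(field):
    """Split one hypothetical 5-bit swizzle2 field into its parts."""
    return {
        "sel": field & 0b111,               # 3 bits: consts and xyzw
        "use_rs3": bool((field >> 3) & 1),  # bit 3: rs1/rs3 selector
        "pred": bool((field >> 4) & 1),     # bit 4: assumed predication flag
    }
```

Under these assumptions, 3 bits per element gives the 12-bit format, 4 bits
gives 16 and 5 bits gives 20, which is why adding per-element bits is a
problem for anything that has to fit in a 12-bit immediate.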

> 
> We would still need a destsubvl field since srcsubvl is often a different
> value, so that can't be used.

We'll work through why that isn't the case, in a different thread, either on or
off list.

> 
> swizzlei would still need the 12-bit format due to not having enough
> immediate bits.

Yes. Very annoying. Don't have a good answer for that yet.


> 
> so it uses 5 funct3 values overall, which is appropriate, since swizzle is
> probably right after muladd in usage in graphics shaders.

These are 4-operand ops, and they take up 50 to 80% of a major opcode. That's
really, *really* high.
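
A rough sketch of the arithmetic, in Python. It assumes that each funct3 value
claims its whole one-eighth slice of the major opcode, with no other bits left
over to share that slice with other instructions. That is a simplification for
illustration, not something stated in the thread, and the function name is
made up.

```python
FUNCT3_BITS = 3  # width of the funct3 field in the base 32-bit RISC-V formats


def opcode_fraction(funct3_values_used):
    """Fraction of one major opcode consumed by this many funct3 values."""
    return funct3_values_used / (1 << FUNCT3_BITS)
```

Under that assumption, the 5 funct3 values mentioned above come to 5/8 of a
major opcode.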

I wonder instead if we can fit some bits into VBLOCK or SVP64.

If these ops are that common and that important, trying to cram everything into
OP32 is just asking for trouble.

