[libre-riscv-dev] [Bug 139] Add LD.X and ST.X? Strided

bugzilla-daemon at libre-riscv.org
Sun Oct 6 19:19:33 BST 2019


http://bugs.libre-riscv.org/show_bug.cgi?id=139

--- Comment #45 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #44)
> (In reply to Jacob Lifshay from comment #43)
> .
> > > 
> > > predication can be pseudo-added by:
> > > 
> > > if (sel_field == 0b111) continue.
> > 
> > if we're going to do that, we really should increase the field size to 4
> > bits per element, since shuffle2 already uses them all (rs1 x, y, z, and w
> > and rs3 x, y, z, and w)
> 
> Yes was just thinking that. Then shuffle could keep 3 bits for consts and
> xyzw and use the 4th bit for predication
> 
> > 
> > though I am extremely disinclined to have something that sets the output
> > subvl in a data-dependent way (basically the output type & complete layout),
> > that seems like a giant mess of security vulnerabilities just waiting to
> > happen.
> 
> Already sorted: the algorithm was designed and implemented successfully in
> spike, for twin predication, last year (albeit for VL, not SUBVL).
> 
> It is shown in the appendix pseudocode as well. The src idx and dest idx are
> incremented independently and BOTH will result in loop termination on
> reaching SUBVL.
> 
> > also, what do you do when subvector 1 has 2 ignores, subvector 2 has 3
> > ignores, subvector 3 has 1 ignore, and so on?!
> 
> Stop the loop when either of the subindices reach SUBVL.
> 
> If the programmer fails to insert enough ignores to not "represent"
> differing SUBVLs, that is their lookout. They should have read the manual :)

Do note that there isn't a counter on the src side, since swizzle allows random
access to all src elements in a subvector, whereas twin predication depends on
both src and dest elements being accessed in-order.

I think "unchanged" would be a better name for it, since it isn't actually
that similar to twin-predication: it basically only predicates the write to
each rd element.

We would still need a destsubvl field: srcsubvl is often a different value, so
it can't be reused for the destination.
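
A minimal sketch of that "unchanged" behaviour, to make the difference from
twin predication concrete. It assumes 3-bit selector fields with element 0 in
the low bits, values 0 to 3 picking src x, y, z, w, and 0b111 meaning
"unchanged" (from the quoted `sel_field == 0b111` check). The function name,
the bit ordering and the handling of the remaining selector values are
assumptions for illustration only, not part of the proposal:

```python
# Hypothetical model only: field order and the meaning of values 4..6
# (constants) are not defined in this comment, so they are rejected here.
SEL_UNCHANGED = 0b111  # from the quoted "if (sel_field == 0b111) continue"


def swizzle(rd, src, selector, destsubvl):
    """Return the new rd subvector after applying selector to src."""
    out = list(rd)
    for i in range(destsubvl):
        # Assumed layout: 3 bits per destination element, element 0 lowest.
        sel = (selector >> (3 * i)) & 0b111
        if sel == SEL_UNCHANGED:
            continue  # the write to rd[i] is predicated off
        if sel >= len(src):
            raise ValueError("selector value not modelled in this sketch")
        # Random access into src: there is no src-side counter, only i.
        out[i] = src[sel]
    return out


# rd.x = src.w, rd.y unchanged, rd.z = src.x; rd.w is beyond destsubvl.
new_rd = swizzle([1, 2, 3, 4], [10, 20, 30, 40],
                 selector=(0 << 6) | (SEL_UNCHANGED << 3) | 3,
                 destsubvl=3)
```

Only the destination index advances, and the loop always runs destsubvl times,
so the output layout does not depend on the data.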

swizzlei would still need the 12-bit selector format, since there aren't enough
immediate bits for anything larger. We can get away with using only 3 I-type
funct3s for [f]swizzlei by having one funct3 for destsubvl 1 through 3, shared
by the int and fp versions, and separate int and fp funct3s for destsubvl = 4:

+--------+-----------+----+-----------+----------+-------+-------+------+
| int/fp | DESTSUBVL | 31 | 30:29     | 28:20    | 19:15 | 14:12 | 11:7 |
+========+===========+====+===========+==========+=======+=======+======+
| int    | 1 to 3    | 0  | DESTSUBVL | selector | rs    | 000   | rd   |
+--------+-----------+----+-----------+----------+-------+-------+------+
| fp     | 1 to 3    | 1  | DESTSUBVL | selector | rs    | 000   | rd   |
+--------+-----------+----+-----------+----------+-------+-------+------+
| int    | 4         | selector[11:0]            | rs    | 001   | rd   |
+--------+-----------+---------------------------+-------+-------+------+
| fp     | 4         | selector[11:0]            | rs    | 010   | rd   |
+--------+-----------+---------------------------+-------+-------+------+
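
The table above can be sketched as a field packer. This is only an
illustration: the function name is made up, the opcode bits 6:0 are left as
zero because they are not specified here, and storing DESTSUBVL 1 to 3 as its
plain value in bits 30:29 is an assumption:

```python
def encode_swizzlei(rd, rs, selector, destsubvl, fp=False):
    """Pack bits 31:7 of a hypothetical [f]swizzlei word (opcode left as 0)."""
    if not (0 <= rd < 32 and 0 <= rs < 32):
        raise ValueError("register numbers must fit in 5 bits")
    if destsubvl in (1, 2, 3):
        if not 0 <= selector < (1 << 9):
            raise ValueError("selector must fit in 9 bits")
        funct3 = 0b000
        # imm bit 11 -> bit 31 (int/fp), imm bits 10:9 -> bits 30:29
        # (DESTSUBVL, assumed stored as-is), imm bits 8:0 -> bits 28:20.
        imm = (int(fp) << 11) | (destsubvl << 9) | selector
    elif destsubvl == 4:
        if not 0 <= selector < (1 << 12):
            raise ValueError("selector must fit in 12 bits")
        # The whole 12-bit immediate is the selector; int/fp is in funct3.
        funct3 = 0b010 if fp else 0b001
        imm = selector
    else:
        raise ValueError("destsubvl must be 1 to 4")
    return (imm << 20) | (rs << 15) | (funct3 << 12) | (rd << 7)


word3 = encode_swizzlei(rd=1, rs=2, selector=0b000001010, destsubvl=3, fp=True)
word4 = encode_swizzlei(rd=1, rs=2, selector=0xABC, destsubvl=4)
```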

The rest could be encoded as follows:

+-----------+-------+-----------+-------+-------+-------+------+
|           | 31:27 | 26:25     | 24:20 | 19:15 | 14:12 | 11:7 |
+===========+=======+===========+=======+=======+=======+======+
| swizzle2  | rs3   | DESTSUBVL | rs2   | rs1   | 100   | rd   |
+-----------+-------+-----------+-------+-------+-------+------+
| swizzle   | rs1   | DESTSUBVL | rs2   | rs1   | 100   | rd   |
+-----------+-------+-----------+-------+-------+-------+------+
| fswizzle2 | rs3   | DESTSUBVL | rs2   | rs1   | 101   | rd   |
+-----------+-------+-----------+-------+-------+-------+------+
| fswizzle  | rs1   | DESTSUBVL | rs2   | rs1   | 101   | rd   |
+-----------+-------+-----------+-------+-------+-------+------+

Note that for [f]swizzle, rs3 == rs1; that is what distinguishes it from
[f]swizzle2, which shares the same funct3.

So this uses 5 funct3 values overall, which is appropriate, since swizzle is
probably second only to muladd in usage in graphics shaders.
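
As a rough illustration of how a decoder could tell the register forms apart,
here is a sketch that splits a word into the fields from the second table and
applies the rs3 == rs1 rule. The function name and the returned dictionary are
made up for this example, the DESTSUBVL field is returned raw because its
encoding is not defined here, and the major opcode is not checked:

```python
def decode_swizzle(word):
    """Split a hypothetical [f]swizzle[2] word into its fields."""
    rs3 = (word >> 27) & 0x1F
    destsubvl_field = (word >> 25) & 0b11  # raw 2-bit field, not interpreted
    rs2 = (word >> 20) & 0x1F
    rs1 = (word >> 15) & 0x1F
    funct3 = (word >> 12) & 0b111
    rd = (word >> 7) & 0x1F
    if funct3 not in (0b100, 0b101):
        raise ValueError("not a register-form swizzle funct3")
    # Same funct3 for the one- and two-source forms: rs3 == rs1 selects
    # the one-source form.
    base = "swizzle" if rs3 == rs1 else "swizzle2"
    name = ("f" if funct3 == 0b101 else "") + base
    return {"name": name, "rd": rd, "rs1": rs1, "rs2": rs2, "rs3": rs3,
            "destsubvl_field": destsubvl_field}


def pack(rs3, dsv, rs2, rs1, funct3, rd):
    # Helper for the example below; mirrors the column order of the table.
    return ((rs3 << 27) | (dsv << 25) | (rs2 << 20) | (rs1 << 15)
            | (funct3 << 12) | (rd << 7))


one_src = decode_swizzle(pack(5, 2, 7, 5, 0b100, 3))
two_src = decode_swizzle(pack(6, 2, 7, 5, 0b101, 3))
```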
