[libre-riscv-dev] [Bug 139] Add LD.X and ST.X? Strided
bugzilla-daemon at libre-riscv.org
Sun Oct 6 19:19:33 BST 2019
http://bugs.libre-riscv.org/show_bug.cgi?id=139
--- Comment #45 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #44)
> (In reply to Jacob Lifshay from comment #43)
> .
> > >
> > > predication can be pseudo-added by:
> > >
> > > if (sel_field == 0b111) continue.
> >
> > if we're going to do that, we really should increase the field size to 4
> > bits per element, since shuffle2 already uses them all (rs1 x, y, z, and w
> > and rs3 x, y, z, and w)
>
> Yes was just thinking that. Then shuffle could keep 3 bits for consts and
> xyzw and use the 4th bit for predication
>
> >
> > though I am extremely disinclined to have something that sets the output
> > subvl in a data-dependent way (basically the output type & complete layout),
> > that seems like a giant mess of security vulnerabilities just waiting to
> > happen.
>
> Already sorted the algorithm was designed and implemented successfully in
> spike, for twin predication, last year (albeit for VL not SUBVL)
>
> It is shown in the appendix pseudocode as well. The src idx and dest idx are
> incremented independently and BOTH will result in loop termination on
> reaching SUBVL.
>
> > also, what do you do when subvector 1 has 2 ignores, subvector 2 has 3
> > ignores, subvector 3 has 1 ignore, and so on?!
>
> Stop the loop when either of the subindices reach SUBVL.
>
> If the programmer fails to insert enough ignores to not "represent"
> differing SUBVLs, that is their lookout. They should have read the manual :)
Do note that there isn't a counter on the src side, since swizzle allows random
access to all src elements in a subvector, whereas twin predication depends on
both src and dest elements being accessed in-order.

I think "unchanged" would be a better name for it, since it isn't actually
that similar to twin-predication: it's basically only predicating the write
on each rd element.

We would still need a destsubvl field, since srcsubvl is often a different
value and so can't be reused for the destination.
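
To make the "unchanged" semantics concrete, here is a rough Python sketch of one subvector's worth of swizzle. It is only an illustration under assumptions, not the spec pseudocode: it assumes 3-bit selectors where 0 to 3 pick src x, y, z, w and 0b111 means "unchanged" (the value floated earlier in this thread), and it does not model the constant selectors at all. The names swizzle_subvector and UNCHANGED are made up for the example.

```python
UNCHANGED = 0b111  # assumed selector value meaning "leave this rd element alone"


def swizzle_subvector(rd, rs, selectors, dest_subvl):
    """Return the new contents of one rd subvector.

    rd, rs     -- lists holding the current dest and src subvector elements
    selectors  -- one selector per destination element
    dest_subvl -- number of destination elements to process

    Only selectors 0..3 (src x, y, z, w) and UNCHANGED are modelled here.
    """
    out = list(rd)
    for i in range(dest_subvl):
        sel = selectors[i]
        if sel == UNCHANGED:
            continue  # the write to rd[i] is predicated off
        if not 0 <= sel < len(rs):
            raise ValueError(f"selector {sel} is outside the src subvector")
        # Random access into src: there is no src-side counter, only the
        # dest index i advances.
        out[i] = rs[sel]
    return out
```

For example, swizzle_subvector([10, 20, 30, 40], [1, 2, 3, 4], [3, UNCHANGED, 0, 0], 4) gives [4, 20, 1, 1]: element 1 of rd keeps its old value, and the src subvector can be a different length from the dest one.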

swizzlei would still need the 12-bit format due to not having enough immediate
bits. We can get away with only 3 I-type funct3s for [f]swizzlei by having one
funct3 shared between the int and fp versions for destsubvl 1 through 3 (bit 31
selects int/fp), and a separate funct3 each for the int and fp versions with
destsubvl = 4:

+--------+-----------+----+-----------+----------+-------+-------+------+
| int/fp | DESTSUBVL | 31 | 30:29     | 28:20    | 19:15 | 14:12 | 11:7 |
+========+===========+====+===========+==========+=======+=======+======+
| int    | 1 to 3    | 0  | DESTSUBVL | selector | rs    | 000   | rd   |
+--------+-----------+----+-----------+----------+-------+-------+------+
| fp     | 1 to 3    | 1  | DESTSUBVL | selector | rs    | 000   | rd   |
+--------+-----------+----+-----------+----------+-------+-------+------+
| int    | 4         | selector[11:0]            | rs    | 001   | rd   |
+--------+-----------+---------------------------+-------+-------+------+
| fp     | 4         | selector[11:0]            | rs    | 010   | rd   |
+--------+-----------+---------------------------+-------+-------+------+
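
As a sanity check on the bit positions in the table above, here is a small Python sketch that packs the fields into a 32-bit word. Several things in it are assumptions rather than anything decided in this thread: the major opcode (bits 6:0) is not specified here, so it is a parameter defaulting to 0; the 2-bit DESTSUBVL field is assumed to hold destsubvl - 1; and the function name encode_swizzlei is made up.

```python
def encode_swizzlei(rd, rs, selector, dest_subvl, fp=False, opcode=0):
    """Pack [f]swizzlei fields into a 32-bit word, following the table above.

    opcode (bits 6:0) is a placeholder: the real major opcode is not given.
    DESTSUBVL in bits 30:29 is ASSUMED to be stored as dest_subvl - 1.
    """
    if not (0 <= rd < 32 and 0 <= rs < 32):
        raise ValueError("register numbers must fit in 5 bits")
    if not 0 <= opcode < 128:
        raise ValueError("opcode must fit in 7 bits")

    if dest_subvl in (1, 2, 3):
        # funct3 000 is shared by int and fp; bit 31 tells them apart.
        if not 0 <= selector < (1 << 9):
            raise ValueError("selector must fit in 9 bits (28:20)")
        imm = (int(fp) << 11) | ((dest_subvl - 1) << 9) | selector
        funct3 = 0b000
    elif dest_subvl == 4:
        # All 12 immediate bits hold the selector; funct3 carries int/fp.
        if not 0 <= selector < (1 << 12):
            raise ValueError("selector must fit in 12 bits (31:20)")
        imm = selector
        funct3 = 0b010 if fp else 0b001
    else:
        raise ValueError("dest_subvl must be 1, 2, 3 or 4")

    return (imm << 20) | (rs << 15) | (funct3 << 12) | (rd << 7) | opcode
```

With those assumptions, hex(encode_swizzlei(rd=3, rs=4, selector=0xABC, dest_subvl=4)) is 0xabc21180, i.e. the whole immediate is selector and funct3 is 001.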
the rest could be encoded as follows:
+-----------+-------+-----------+-------+-------+-------+------+
|           | 31:27 | 26:25     | 24:20 | 19:15 | 14:12 | 11:7 |
+===========+=======+===========+=======+=======+=======+======+
| swizzle2  | rs3   | DESTSUBVL | rs2   | rs1   | 100   | rd   |
+-----------+-------+-----------+-------+-------+-------+------+
| swizzle   | rs1   | DESTSUBVL | rs2   | rs1   | 100   | rd   |
+-----------+-------+-----------+-------+-------+-------+------+
| fswizzle2 | rs3   | DESTSUBVL | rs2   | rs1   | 101   | rd   |
+-----------+-------+-----------+-------+-------+-------+------+
| fswizzle  | rs1   | DESTSUBVL | rs2   | rs1   | 101   | rd   |
+-----------+-------+-----------+-------+-------+-------+------+
note how for [f]swizzle, rs3 == rs1
so it uses 5 funct3 values overall, which is appropriate, since swizzle is
probably right after muladd in usage in graphics shaders.
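
The register forms can be sketched the same way. Again this is only an illustration under assumptions: the major opcode is a placeholder parameter, the 2-bit DESTSUBVL field in 26:25 is assumed to hold destsubvl - 1 (so that 1 through 4 fit), and encode_swizzle is a made-up name. The one-source form is just the two-source form with rs3 set equal to rs1.

```python
def encode_swizzle(rd, rs1, rs2, dest_subvl, rs3=None, fp=False, opcode=0):
    """Pack [f]swizzle / [f]swizzle2 fields into a 32-bit word.

    Leaving rs3 as None gives the one-source [f]swizzle form (rs3 == rs1).
    opcode (bits 6:0) is a placeholder: the real major opcode is not given.
    DESTSUBVL in bits 26:25 is ASSUMED to be stored as dest_subvl - 1.
    """
    if rs3 is None:
        rs3 = rs1  # [f]swizzle: the rs3 field repeats rs1
    for reg in (rd, rs1, rs2, rs3):
        if not 0 <= reg < 32:
            raise ValueError("register numbers must fit in 5 bits")
    if dest_subvl not in (1, 2, 3, 4):
        raise ValueError("dest_subvl must be 1, 2, 3 or 4")
    if not 0 <= opcode < 128:
        raise ValueError("opcode must fit in 7 bits")

    funct3 = 0b101 if fp else 0b100
    return ((rs3 << 27) | ((dest_subvl - 1) << 25) | (rs2 << 20)
            | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode)
```

For instance, hex(encode_swizzle(rd=5, rs1=6, rs2=7, dest_subvl=4)) is 0x36734280, where bits 31:27 and 19:15 both hold register 6.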