[libre-riscv-dev] [Bug 139] Add LD.X and ST.X? Strided

bugzilla-daemon at libre-riscv.org
Fri Oct 4 21:09:27 BST 2019


--- Comment #25 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #24)
> (In reply to Jacob Lifshay from comment #23)
> > > more in a mo - twin-SUBVL doesn't sound right.  SUBVL is intended to
> > > be applied globally.  the CSRs would need a total redesign to cope.
> > 
> > I had always intended SUBVL to vary from value to value, even faster than VL
> > would vary.
> Varying is not a problem at all. Having *two* SUBVLs (or even worse three),
> one for src1, one for src2 and another for rd, we're into major redesign
> territory.
> The example there, all of the src and dest vectors are all vec4 ie SUBVL=4.
> However what is missing is a per SUBVL-element predicate mask, and if you
> recall we specifically designed the SUBVL predication to apply the predicate
> bit to the whole group.
> Without going into redesigns, the solution would be to ensure a full vector
> is copied.
> In the example given, it turns out that the first two parts of the colour
> come from line 34, and the last two from line 35.
> If that is not done, then by way of various passes I would expect that the
> elements be copied by non-SUBVL methods (using VL and predicate masking)
> followed by a swizzle copy that placed the one, two, or three unaltered
> elements into the dest.
> OR...
> This is perhaps what "identity" is for.

In the Vulkan API, identity is syntactic sugar for whichever of x, y, z, or w
matches the element being written.
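
A minimal sketch of that reading of identity, in Python purely for
illustration. The names COMPONENTS, resolve_identity and apply_swizzle are
hypothetical and are not part of Vulkan or of any SV proposal; the point is
only that identity resolves to a source component, so the result is built
entirely from the source.

```python
# Hypothetical model of Vulkan-style "identity" swizzle selectors.
COMPONENTS = "xyzw"

def resolve_identity(selectors):
    """Replace each "identity" with the component name at its own position."""
    return [COMPONENTS[i] if sel == "identity" else sel
            for i, sel in enumerate(selectors)]

def apply_swizzle(src, selectors):
    """Build the result purely from src; the old destination is never read."""
    return [src[COMPONENTS.index(sel)] for sel in resolve_identity(selectors)]

src = [10, 20, 30, 40]
print(resolve_identity(["identity", "identity", "x", "y"]))  # ['x', 'y', 'x', 'y']
print(apply_swizzle(src, ["identity", "identity", "x", "y"]))  # [10, 20, 10, 20]
```

Under this model {identity, identity, x, y} is the same as {x, y, x, y}, which
is not the same thing as leaving the first two destination elements untouched.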

> If identity is intended to mean that the indexed subelement is unaltered, we
> have a way to leave xy alone:

Even if we had an instruction like that, we would still need different
destsubvl and srcsubvl for the swizzle on line 34, since EncodeNormal returns
a vec2 and the result is written to o_normal_color, which is a vec4.
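
As a rough sketch of why the two sizes differ, here is a Python model of a
swizzle write with one selector per destination element. Everything in it is
an assumption for illustration: swizzle_write and the KEEP selector are
hypothetical, the values are made up, and the choice of writing the first two
elements follows the statement quoted above that the first two parts of the
colour come from line 34.

```python
COMPONENTS = "xyzw"
KEEP = "keep"  # hypothetical selector: leave this destination element untouched

def swizzle_write(dest, src, selectors):
    """Return a new dest of len(dest) elements; src may be shorter (e.g. vec2)."""
    if len(selectors) != len(dest):
        raise ValueError("one selector per destination element is required")
    out = list(dest)
    for i, sel in enumerate(selectors):
        if sel == KEEP:
            continue
        idx = COMPONENTS.index(sel)
        if idx >= len(src):
            raise IndexError(f"selector {sel!r} is outside a vec{len(src)} source")
        out[i] = src[idx]
    return out

o_normal_color = [0.0, 0.0, 0.0, 0.0]   # vec4 destination (destsubvl = 4)
encoded_normal = [0.25, 0.75]           # vec2 source (srcsubvl = 2)
result = swizzle_write(o_normal_color, encoded_normal, ["x", "y", KEEP, KEEP])
print(result)  # [0.25, 0.75, 0.0, 0.0]
```

The selector list has destsubvl entries while the valid selector values are
bounded by srcsubvl, so the operation needs to know both sizes.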

> Col.zw = srccol.xy
> Becomes
> Swizzle Col, srccol, {identity, identity, x, y}
> Meaning:
> * leave col.x untouched
> * leave col.y untouched
> * set col.z to srccol.x
> * set col.w to srccol.y
> A separate pass would notice the identity overlap with line 34 and combine
> them, but that is a different story.
> Bottom line, think it through from the normal SIMD perspective that other
> GPUs use, they just simply don't have the capability to do mixed
> vec2,vec3,vec4 operations, they are all vec2 only, vec3 only or vec4 only.

AMDGPU just converts everything to individual 32-bit units and does operations
element by element, which is an alternative to needing SUBVL. However, I would
be worried about the increased instruction count and ALU op packing
inefficiencies, because larger shaders are likely to have very low VL values
(less than 4).
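
To put rough numbers on that concern, here is a toy Python model. It is not a
description of how AMDGPU or any real hardware issues instructions: the
function issue_cost, the 4-lane ALU, and the rule that each instruction
occupies whole ALU slots with no packing across instructions are all
assumptions chosen only to illustrate the argument.

```python
import math

def issue_cost(vl, elems_per_op, ops, lanes):
    """Return (instructions, alu_slots, utilisation) under the toy model."""
    elems_per_instr = vl * elems_per_op
    # Assumption: an instruction always rounds up to whole ALU slots.
    slots_per_instr = math.ceil(elems_per_instr / lanes)
    slots = ops * slots_per_instr
    utilisation = (ops * elems_per_instr) / (slots * lanes)
    return ops, slots, utilisation

# One vec4 operation at VL=2 on a hypothetical 4-lane ALU.
scalarised = issue_cost(vl=2, elems_per_op=1, ops=4, lanes=4)
with_subvl = issue_cost(vl=2, elems_per_op=4, ops=1, lanes=4)
print(scalarised)  # (4, 4, 0.5)
print(with_subvl)  # (1, 2, 1.0)
```

In this model the element-by-element form needs four instructions at half
utilisation, while the SUBVL=4 form needs one instruction that fills the
lanes. Real numbers would depend on the actual issue and packing rules.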
