[libre-riscv-dev] [Bug 139] Add LD.X and ST.X? Strided

Fri Oct 4 21:09:27 BST 2019

http://bugs.libre-riscv.org/show_bug.cgi?id=139

--- Comment #25 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #24)
> (In reply to Jacob Lifshay from comment #23)
> 
> > > more in a mo - twin-SUBVL doesn't sound right.  SUBVL is intended to
> > > be applied globally.  the CSRs would need a total redesign to cope.
> > 
> > I had always intended SUBVL to vary from value to value, even faster than VL
> > would vary.
> 
> Varying is not a problem at all. Having *two* SUBVLs (or even worse three),
> one for src1, one for src2 and another for rd, we're into major redesign
> territory.
> 
> The example there, all of the src and dest vectors are all vec4 ie SUBVL=4.
> 
> However what is missing is a per SUBVL-element predicate mask, and if you
> recall we specifically designed the SUBVL predication to apply the predicate
> bit to the whole group.
> 
> Without going into redesigns, the solution would be to ensure a full vector
> is copied.
> 
> In the example given, it turns out that the first two parts of the colour
> come from line 34, and the last two from line 35.
> 
> If that is not done, then by way of various passes I would expect that the
> elements be copied by non-SUBVL methods (using VL and predicate masking)
> followed by a swizzle copy that placed the one, two, or three unaltered
> elements into the dest.
> 
> OR...
> 
> This is perhaps what "identity" is for.

In the Vulkan API, identity is syntactic sugar for x, y, z, or w, matching the
element written to.
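
To make that concrete, here is a minimal sketch of the Vulkan-style meaning of
identity (VK_COMPONENT_SWIZZLE_IDENTITY). The helper names are made up for
illustration, and the x/y/z/w spelling follows this thread rather than the
API's own R/G/B/A names.

```python
# Sketch of Vulkan-style swizzle resolution, where "identity" is shorthand
# for the component name at the position being written.
# resolve_identity and apply_swizzle are hypothetical helper names.
COMPONENTS = ("x", "y", "z", "w")


def resolve_identity(selectors):
    """Replace each "identity" with the component name of its own position."""
    return tuple(
        COMPONENTS[i] if sel == "identity" else sel
        for i, sel in enumerate(selectors)
    )


def apply_swizzle(src, selectors):
    """Build a new vec4 whose elements are all read from src."""
    resolved = resolve_identity(selectors)
    return tuple(src[COMPONENTS.index(sel)] for sel in resolved)


# {identity, identity, x, y} resolves to {x, y, x, y}: every output element
# is read from the source, so nothing in the destination is preserved.
print(resolve_identity(("identity", "identity", "x", "y")))
print(apply_swizzle((10, 20, 30, 40), ("identity", "identity", "x", "y")))
```

Under this reading, identity never means "leave the destination element
alone"; it only selects the same-position element of the source.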

> 
> If identity is intended to mean that the indexed subelement is unaltered, we
> have a way to leave xy alone:

Even if we had an instruction like that, we would still need different
destsubvl and srcsubvl for the swizzle on line 34, since EncodeNormal returns a
vec2 and the result is written to o_normal_color, which is a vec4.
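
A rough sketch of why the two lengths differ, assuming line 34 writes the vec2
result into the first two elements of the vec4. The KEEP selector and the
swizzle_keep function are hypothetical, not part of any existing spec, and the
numeric values are made up.

```python
KEEP = None  # hypothetical "leave the destination element untouched" selector


def swizzle_keep(dest, src, selectors):
    """Return a copy of dest with selected elements overwritten from src.

    len(dest) plays the role of destsubvl and len(src) the role of
    srcsubvl; the two are allowed to differ.
    """
    if len(selectors) != len(dest):
        raise ValueError("need one selector per destination element")
    out = list(dest)
    for i, sel in enumerate(selectors):
        if sel is KEEP:
            continue  # destination element i stays as it was
        if not 0 <= sel < len(src):
            raise IndexError("selector reads past the end of the source")
        out[i] = src[sel]
    return out


o_normal_color = [0.0, 0.0, 0.5, 1.0]  # vec4 destination (made-up values)
encoded_normal = [0.25, 0.75]          # vec2 standing in for EncodeNormal's result
print(swizzle_keep(o_normal_color, encoded_normal, [0, 1, KEEP, KEEP]))
```

The selector list has one entry per destination element (4), while each
selector indexes into a source of length 2, so a single SUBVL cannot describe
both sides.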

> 
> Col.zw = srccol.xy
> 
> Becomes
> 
> Swizzle Col, srccol, {identity, identity, x, y}
> 
> Meaning:
> 
> * leave col.x untouched
> * leave col.y untouched
> * set col.z to srccol.x
> * set col.w to srccol.y
> 
> A separate pass would notice the identity overlap with line 34 and combine
> them, but that is a different story.
> 
> Bottom line, think it through from the normal SIMD perspective that other
> GPUs use, they just simply don't have the capability to do mixed
> vec2,vec3,vec4 operations, they are all vec2 only, vec3 only or vec4 only.

AMDGPU just converts everything to individual 32-bit units and does operations
element by element, which is an alternative to needing SUBVL, though I would
be worried about the increased instruction count and ALU op packing
inefficiencies, because larger shaders are likely to have very low VL values
(less than 4).
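
A toy model of that concern, not a description of how AMDGPU hardware actually
schedules work: it assumes a hypothetical 4-lane ALU, one instruction per
sub-element in the scalarised case, and contiguous packing of VL * SUBVL
elements in the SUBVL case.

```python
import math


def scalarised_cost(vl, subvl, alu_width):
    """One instruction per sub-element, each covering VL lanes."""
    instructions = subvl
    issues = subvl * math.ceil(vl / alu_width)
    return instructions, issues


def subvl_cost(vl, subvl, alu_width):
    """One instruction covering VL * SUBVL contiguous elements."""
    instructions = 1
    issues = math.ceil(vl * subvl / alu_width)
    return instructions, issues


# Assumed values: VL=3 (a "very low" VL), vec4 data, 4-lane ALU.
vl, subvl, alu_width = 3, 4, 4
for name, cost in (("scalarised", scalarised_cost), ("SUBVL", subvl_cost)):
    instructions, issues = cost(vl, subvl, alu_width)
    utilisation = vl * subvl / (issues * alu_width)
    print(f"{name}: {instructions} instructions, {issues} ALU issues, "
          f"{utilisation:.0%} lane utilisation")
```

With these assumed numbers the scalarised form needs 4 instructions and fills
3 of 4 lanes on each issue, while the SUBVL form needs 1 instruction and
fills every lane.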
