[libre-riscv-dev] [Bug 139] Add LD.X and ST.X? Strided
bugzilla-daemon at libre-riscv.org
Fri Oct 4 21:09:27 BST 2019
http://bugs.libre-riscv.org/show_bug.cgi?id=139
--- Comment #25 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #24)
> (In reply to Jacob Lifshay from comment #23)
>
> > > more in a mo - twin-SUBVL doesn't sound right. SUBVL is intended to
> > > be applied globally. the CSRs would need a total redesign to cope.
> >
> > I had always intended SUBVL to vary from value to value, even faster than VL
> > would vary.
>
> Varying is not a problem at all. Having *two* SUBVLs (or even worse three),
> one for src1, one for src2 and another for rd, we're into major redesign
> territory.
>
> The example there, all of the src and dest vectors are all vec4 ie SUBVL=4.
>
> However what is missing is a per SUBVL-element predicate mask, and if you
> recall we specifically designed the SUBVL predication to apply the predicate
> bit to the whole group.
>
> Without going into redesigns, the solution would be to ensure a full vector
> is copied.
>
> In the example given, it turns out that the first two parts of the colour
> come from line 34, and the last two from line 35.
>
> If that is not done, then by way of various passes I would expect that the
> elements be copied by non-SUBVL methods (using VL and predicate masking)
> followed by a swizzle copy that placed the one, two, or three unaltered
> elements into the dest.
>
> OR...
>
> This is perhaps what "identity" is for.
In the Vulkan API, identity is syntactic sugar for x, y, z, or w, matching the
element written to.
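
A minimal Python sketch of that reading of identity, for illustration only. The
names resolve_identity and swizzle are hypothetical helpers, not part of the
Vulkan API or of any SimpleV spec; the point is just that identity in output
position i selects source component i.

```python
COMPONENTS = "xyzw"

def resolve_identity(mapping):
    """Replace each "identity" entry with the component name of its own position."""
    return [COMPONENTS[i] if sel == "identity" else sel
            for i, sel in enumerate(mapping)]

def swizzle(src, mapping):
    """Build the result purely from src; the old destination is never read."""
    return [src[COMPONENTS.index(sel)] for sel in resolve_identity(mapping)]

# identity in slots 0 and 1 means "x" and "y" respectively
print(resolve_identity(["identity", "identity", "x", "y"]))  # ['x', 'y', 'x', 'y']
print(swizzle([10, 20, 30, 40], ["identity", "identity", "x", "y"]))  # [10, 20, 10, 20]
```

Under this reading identity always reads from the source, so it cannot be used
to preserve what was already in the destination.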
>
> If identity is intended to mean that the indexed subelement is unaltered, we
> have a way to leave xy alone:
Even if we had an instruction like that, we would still need different
destsubvl and srcsubvl for the swizzle on line 34, since EncodeNormal returns
a vec2 and it writes to o_normal_color, which is a vec4.
>
> Col.zw = srccol.xy
>
> Becomes
>
> Swizzle Col, srccol, {identity, identity, x, y}
>
> Meaning:
>
> * leave col.x untouched
> * leave col.y untouched
> * set col.z to srccol.x
> * set col.w to srccol.y
>
> A separate pass would notice the identity overlap with line 34 and combine
> them, but that is a different story.
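
The alternative reading proposed in the quoted text, where identity means
"leave that destination subelement untouched", can be sketched in Python as
follows. swizzle_keep is a hypothetical name and this is only a model of the
semantics under discussion, not an implemented instruction. Note that the
source here is a vec2 while the destination is a vec4.

```python
def swizzle_keep(dest, src, pattern):
    """Return a copy of dest where non-identity slots are filled from src.

    Assumption: "identity" at position i means dest[i] is left as it was.
    """
    out = list(dest)
    for i, sel in enumerate(pattern):
        if sel != "identity":
            out[i] = src["xyzw".index(sel)]
    return out

col = [0.1, 0.2, 0.3, 0.4]   # vec4 destination
srccol = [7.0, 8.0]          # vec2 source
# Col.zw = srccol.xy
print(swizzle_keep(col, srccol, ["identity", "identity", "x", "y"]))
# [0.1, 0.2, 7.0, 8.0]
```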
>
> Bottom line, think it through from the normal SIMD perspective that other
> GPUs use, they just simply don't have the capability to do mixed
> vec2,vec3,vec4 operations, they are all vec2 only, vec3 only or vec4 only.
AMDGPU just converts everything to individual 32-bit units and does operations
element by element, which is an alternative to needing SUBVL, though I would
be worried about increased instruction count and ALU op packing inefficiencies,
because larger shaders are likely to have very low VL values (less than 4).
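
To make the packing concern concrete, here is a deliberately crude Python
model. Everything in it is an assumption for illustration: a fixed number of
SIMD lanes per issue slot, one instruction per subelement when scalarized, and
perfect packing of VL*SUBVL elements when SUBVL grouping is used. It does not
describe how AMDGPU or any real ALU schedules work.

```python
import math

def issue_slots(vl, subvl, lanes, scalarized):
    """Count issue slots for one vecN operation under the toy model.

    scalarized=True : one instruction per subelement, each covering VL elements.
    scalarized=False: one instruction covering VL*SUBVL elements, packed densely.
    """
    if scalarized:
        return subvl * math.ceil(vl / lanes)
    return math.ceil(vl * subvl / lanes)

def utilization(vl, subvl, lanes, scalarized):
    """Fraction of lanes doing useful work across all issue slots."""
    return (vl * subvl) / (issue_slots(vl, subvl, lanes, scalarized) * lanes)

# vec4 operation with VL=3 on a hypothetical 4-lane ALU
print(issue_slots(3, 4, 4, scalarized=True), utilization(3, 4, 4, True))    # 4 0.75
print(issue_slots(3, 4, 4, scalarized=False), utilization(3, 4, 4, False))  # 3 1.0
```

With VL at or above the lane count the two approaches come out the same in
this model; the gap only appears for the low VL values mentioned above.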
--
You are receiving this mail because:
You are on the CC list for the bug.