# [libre-riscv-dev] Instruction sorta-prefixes for easier high-register access

Jacob Lifshay programmerjake at gmail.com
Thu Jan 31 06:08:45 GMT 2019

```
On Wed, Jan 30, 2019, 20:34 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> ok so i thought about the vlpN concept, and if the register-prefixing
> encodes scalar/vector already, then reserving '0b00 for "scalar" is
> redundant.  it would therefore be better to split out VL from
> predicate specs.
>
I disagree: having a combined field allows the otherwise-reserved
predicate of "never" to be used to encode other, less common VL-multipliers.

predicate:
000: x0 (never)
001: pr1
010: pr2
011: pr3
100: ~x0 (always)
101: ~pr1
110: ~pr2
111: ~pr3

>
> there is however a small problem with VL multipliers: they break the
> Vectorisation Loop paradigm, turning it effectively into a SIMD-like

Not really, see following examples.

>
> i am slightly concerned that the templates for VL-based loops would
> need to be much more complex (less uniform), as the multipliers now
> need to be taken into account within the loop, on a per-instruction
> basis instead of a per-loop basis.
>
VL multipliers are basically embedding the short-length (1 to 4) SIMD
vectors used in Vulkan shaders into a VL-based vectorization loop.

With standard VL-based vectorization, the loop:
float a[], b[], c[], d[];
for(int i = 0; i < 1000; i++)
{
    a[i] = b[i] + c[i] * d[i];
}
vectorization produces:
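for(int i = 0; i < 1000;)
{
    VL = setvl(1000 - i);
    // loadVL is an assumed counterpart of storeVL
    vecVL v = loadVL(&b[i]) + loadVL(&c[i]) * loadVL(&d[i]);
    storeVL(&a[i], v);
    i += VL;
}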

With vl-multipliers, we can similarly vectorize the loop:
struct VertexIn
{
    vec3 position;
    vec3 normal;
    vec4 color; // rgba
};
struct VertexOut
{
    vec4 position; // xyzw
    vec4 color;
};
VertexIn vertexes_in[];
VertexOut vertexes_out[];
vec3 light_dir;
float ambient, diffuse;
for(int i = 0; i < 1000; i++)
{
    // calculate vertex colors using
    // lambert's cos model and fixed ambient brightness
    vec3 n = vertexes_in[i].normal;
    vec3 l = light_dir;
    float dot = n.x * l.x + n.y * l.y + n.z * l.z;
    float brightness = max(dot, 0.0) * diffuse + ambient;
    vec4 c = vertexes_in[i].color;
    c.rgb *= brightness;
    vertexes_out[i].color = c;
    // orthographic projection
    vertexes_out[i].position = vec4(vertexes_in[i].position, 1.0);
}

vectorization produces:
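for(int i = 0; i < 1000;)
{
    VL = setvl(1000 - i);
    // load3xVL_strided/load4xVL_strided are assumed counterparts of store4xVL_strided
    vec3xVL n = load3xVL_strided(&vertexes_in[i].normal, sizeof(VertexIn));
    vec3 l = light_dir;
    vecVL dot = n.x * l.x + n.y * l.y + n.z * l.z;
    vecVL brightness = max(dot, 0.0) * diffuse + ambient;
    vec4xVL c = load4xVL_strided(&vertexes_in[i].color, sizeof(VertexIn));
    vec3xVL c_rgb = c.rgb;
    c_rgb *= brightness;
    c.rgb = c_rgb;
    store4xVL_strided(&vertexes_out[i].color, c, sizeof(VertexOut));
    vec4xVL p = 1.0;
    p.xyz = load3xVL_strided(&vertexes_in[i].position, sizeof(VertexIn));
    store4xVL_strided(&vertexes_out[i].position, p, sizeof(VertexOut));
    i += VL;
}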

The vl-multipliers that Vulkan needs are 1, 2, 3, and 4.
The vl-multipliers that OpenCL needs are 1, 2, 3, 4, 8, and 16, though we
can probably get away with just 1 to 4 and use multiple vectors for 8 and
16.

So, hopefully, the examples I gave help show how vl-multipliers are
probably the most straightforward way to vectorize graphics code with
variable length vectors.

Jacob
```
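The vertex-shading loop from the message can be emulated in plain C to check that per-instruction multipliers still fit one uniform strip-mined loop. This is only a rough sketch under stated assumptions, not the proposed ISA semantics: `MAXVL`, the behaviour of `setvl`, and the function name `shade_vertices` are hypothetical, and each "vector instruction" is modelled as an inner scalar loop over `VL` elements times its multiplier (1 for the dot product, 3 for `c.rgb`, 4 for the position).

```c
#include <assert.h>
#include <stddef.h>

/* Assumed hardware maximum vector length; the real value is implementation-defined. */
#define MAXVL 8

typedef struct { float position[3]; float normal[3]; float color[4]; } VertexIn;
typedef struct { float position[4]; float color[4]; } VertexOut;

/* Hypothetical model of setvl: grant at most MAXVL elements per loop iteration. */
size_t setvl(size_t remaining)
{
    return remaining < MAXVL ? remaining : MAXVL;
}

/* Scalar emulation of the strip-mined loop. Returns the number of loop
 * iterations taken, i.e. how many times setvl was executed. */
size_t shade_vertices(const VertexIn *in, VertexOut *out, size_t count,
                      const float light_dir[3], float ambient, float diffuse)
{
    assert(light_dir != NULL);
    size_t iterations = 0;
    for (size_t i = 0; i < count;) {
        size_t vl = setvl(count - i);
        float brightness[MAXVL];

        /* Multiplier 1: one dot product and one brightness value per element. */
        for (size_t e = 0; e < vl; e++) {
            const float *n = in[i + e].normal;
            float dot = n[0] * light_dir[0] + n[1] * light_dir[1]
                      + n[2] * light_dir[2];
            brightness[e] = (dot > 0.0f ? dot : 0.0f) * diffuse + ambient;
        }

        /* Multiplier 3: c.rgb *= brightness touches 3 * VL lanes; alpha is copied. */
        for (size_t e = 0; e < vl; e++) {
            for (size_t k = 0; k < 3; k++)
                out[i + e].color[k] = in[i + e].color[k] * brightness[e];
            out[i + e].color[3] = in[i + e].color[3];
        }

        /* Multiplier 4: position widened to xyzw with w = 1.0. */
        for (size_t e = 0; e < vl; e++) {
            for (size_t k = 0; k < 3; k++)
                out[i + e].position[k] = in[i + e].position[k];
            out[i + e].position[3] = 1.0f;
        }

        i += vl;
        iterations++;
    }
    return iterations;
}
```

In this model the multiplier only changes the width of each inner operation; the outer loop still advances by `VL` elements per iteration, as in the non-multiplied case.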