[libre-riscv-dev] fp special functions

Sun Aug 4 23:47:55 BST 2019

On Sun, Aug 4, 2019, 15:27 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> there's some other functions necessary for 3D, which, again, are on
> the slides: texturisation, pixel, and vector processing: dot-product
> and so on.
>
> "4xFP32 ARGB to 1xINT32 ARGB8888" in particular, a "trick" occurred to
> me last night that might allow us to use SimpleV in the FP opcode
> space.
>
> it's quite unusual, it goes like this:
>
> * FP opcode funct5=NNNNN in the "scalar" space (non-SV-mode) raises an
> ILLEGAL instruction.
>
> if however, the following special conditions are met:
>
> * SV's Sub-Vector Len is set to 4 AND
> * SV element width is set to 32-bit for the TARGET register AND
> * SV element width is set to 32-bit for the SOURCE register
>
> *THEN* that very same opcode is NO LONGER an illegal instruction, it
> is a "4xFP32 ARGB to 1xINT32 ARGB8888" instruction.
>
It should work just fine if the scalar version is defined as converting
f32 to normalized u8 with saturation. With the semantics that a 4xu8 vector
is packed into a 32-bit value, we should have no problems. We will need to
use this trick for sRGB conversions, however. Note that the dest element
width would be 8 bits in both cases. If the in-memory byte order is
something other than RGBA, then we can just do a swizzle operation.
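
A minimal Python sketch of those semantics, to make the idea concrete. The
function names, the round-to-nearest step, the NaN-to-zero choice and the
"element 0 goes in the lowest byte" packing order are all assumptions for
illustration, not anything decided for the ISA:

```python
import math


def f32_to_unorm8(x: float) -> int:
    """Convert one float to a normalized u8, saturating to [0, 255]."""
    if math.isnan(x):
        return 0  # assumption: NaN maps to 0
    x = min(max(x, 0.0), 1.0)  # saturate to [0.0, 1.0]
    return int(x * 255.0 + 0.5)  # assumption: round to nearest


def pack_4xu8(elems) -> int:
    """Pack four u8 elements into one 32-bit value.

    Assumption: element 0 lands in the least significant byte.
    """
    word = 0
    for i, e in enumerate(elems):
        word |= (e & 0xFF) << (8 * i)
    return word


def swizzle(vec, order):
    """Reorder elements, e.g. order=(3, 0, 1, 2) turns RGBA into ARGB."""
    return [vec[i] for i in order]


def convert_vec4(rgba, order=(0, 1, 2, 3)) -> int:
    """Scalar conversion applied per element, then an optional swizzle."""
    return pack_4xu8(swizzle([f32_to_unorm8(c) for c in rgba], order))
```

With these definitions, `convert_vec4([1.0, 0.5, 0.0, 2.0])` saturates the
last channel to 255 and packs the four bytes into a single 32-bit word, and
a different in-memory byte order only changes the `order` argument.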

>
> there is also a Vulkan API function to do "4xFP32 ARGB to 1xINT16
> ARGB565".  the conditions for activating this may be:
>
> * SV's Sub-Vector Len is set to 4 AND
> * SV element width is set to 16-bit for the TARGET register AND
> * SV element width is set to 32-bit for the SOURCE register
>
> if there are any 3xFP32 ARGB instructions needed, it should be clear
> that the SV Sub-Vector length should be set to 3.
>
> it is an extremely unusual use of SV: normally the scalar opcodes
> exist independently of their "vectorised equivalents".  however it
> makes absolutely no sense in this case, because 1xFP32 ARGB to 1xINT32
> ARGB8888 is nonsense (impossible).
>
yeah, but 1x f32 to 1x norm-u8 is quite reasonable.
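
The same scalar conversion generalizes to other field widths, which is
roughly what a 565 target needs. A hedged Python sketch follows; the helper
names are hypothetical, and the layout (red in the top 5 bits, green in the
middle 6, blue in the low 5, alpha dropped) is an assumption rather than
something specified in this thread:

```python
def f32_to_unorm(x: float, bits: int) -> int:
    """Convert one float to an unsigned normalized value of `bits` bits."""
    if x != x:  # NaN check; assumption: NaN maps to 0
        return 0
    max_val = (1 << bits) - 1
    x = min(max(x, 0.0), 1.0)  # saturate to [0.0, 1.0]
    return int(x * max_val + 0.5)  # assumption: round to nearest


def pack_rgb565(r: float, g: float, b: float) -> int:
    """Pack three float channels into a 16-bit value.

    Assumed layout: R in bits 15..11, G in bits 10..5, B in bits 4..0.
    """
    return (
        (f32_to_unorm(r, 5) << 11)
        | (f32_to_unorm(g, 6) << 5)
        | f32_to_unorm(b, 5)
    )
```

Here each channel is still a plain "1x f32 to 1x norm-uN" conversion; only
the per-field width and the final packing differ from the 8888 case.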

>
> i believe there may be many other candidate instruction opportunities
> that could benefit from this trick.  SLERP for example requires
> Quaternions, to specify the points, *and* a 3rd argument (t).
> https://en.wikipedia.org/wiki/Slerp

As far as I know, slerp doesn't require any operations that are specific to
quaternions: every operation is either done on scalars, is a standard
multiplication of a 4D vector by a scalar, or is a 4D vector dot-product.
Therefore, I don't think we need any special quaternion instructions to
implement slerp.
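
To illustrate, here is a minimal Python sketch of slerp following the
formula on the Wikipedia page linked above, treating the quaternions as
plain 4-element vectors. The helper names, the 0.9995 threshold and the
fallback to normalized linear interpolation for nearly parallel inputs are
assumptions for the sketch:

```python
import math


def dot(a, b):
    """4D vector dot-product."""
    return sum(x * y for x, y in zip(a, b))


def scale(v, s):
    """Multiply a vector by a scalar."""
    return [x * s for x in v]


def add(a, b):
    """Element-wise vector addition."""
    return [x + y for x, y in zip(a, b)]


def slerp(q0, q1, t):
    """Spherical linear interpolation between two unit 4D vectors."""
    d = dot(q0, q1)
    if d < 0.0:
        # Take the shorter arc: still just a scalar-vector multiply.
        q1 = scale(q1, -1.0)
        d = -d
    d = min(d, 1.0)  # guard acos against rounding error
    if d > 0.9995:
        # Assumption: nearly parallel inputs fall back to normalized lerp.
        r = add(scale(q0, 1.0 - t), scale(q1, t))
        return scale(r, 1.0 / math.sqrt(dot(r, r)))
    theta = math.acos(d)
    s = math.sin(theta)
    w0 = math.sin((1.0 - t) * theta) / s  # scalar weights
    w1 = math.sin(t * theta) / s
    return add(scale(q0, w0), scale(q1, w1))
```

The only vector operations used are a dot-product, scalar-vector multiplies
and an element-wise add; everything else (acos, sin, sqrt, divide) is scalar.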

Jacob