[libre-riscv-dev] [isa-dev] Re: FP transcendentals (trigonometry, root/exp/log) proposal

Sat Sep 14 02:33:01 BST 2019

On Fri, Sep 13, 2019, 18:06 lkcl <luke.leighton at gmail.com> wrote:

> On Saturday, September 14, 2019 at 4:56:07 AM UTC+8, Jacob Lifshay wrote:
> > Some notes:
> >
> >
> > I think it may be worthwhile to have separate Ztrans extension names to
> indicate the levels of accuracy that are implemented, allowing a
> low-precision implementation of all instructions outside of F and D while
> having full-precision implementations of F and D for code compatibility.
>
> Hum, hum, don't know. My concern: that would be an NxM table of extension
> names. There are around 8 Ztrans extensions, times four (so far, just found
> that OpenCL is different from Vulkan so that's 5) which would be 40
> potential different extension names, rather than N+M which would be 12-13.
>

Assuming higher-precision operations are allowed to be used to implement
lower-precision modes, we could have the higher-precision extensions just
imply the lower precision extensions.
Alternatively, we could introduce the concept of extension parameters (like
C++ template parameters).
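
To illustrate the "higher precision implies lower precision" idea, here is a
minimal Python sketch. It assumes the accuracy levels form a strict ordering,
which may not hold for every pair of specs. The level names are hypothetical
placeholders, not proposed extension names; the counts use the roughly 8
extensions and 5 accuracy levels mentioned in the quoted text.

```python
# Hypothetical accuracy levels, ordered from least to most accurate.
# These names are placeholders for illustration only.
ACCURACY_LEVELS = ["lowp", "gpu", "ieee754"]


def implied_levels(level):
    """Return every accuracy level that an implementation of `level` satisfies.

    Assumes a higher-precision operation may always stand in for a
    lower-precision one.
    """
    idx = ACCURACY_LEVELS.index(level)
    return ACCURACY_LEVELS[: idx + 1]


def satisfies(implemented, required):
    """True if hardware implementing `implemented` also meets `required`."""
    return required in implied_levels(implemented)


def name_counts(n_extensions, n_levels):
    """Compare the two naming schemes: one name per (extension, level) pair
    versus separate extension names and accuracy parameters."""
    return n_extensions * n_levels, n_extensions + n_levels
```

With parameters, an extension would be identified by a pair such as
(extension, accuracy level) rather than by one name per combination.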

>
> Are any of the permutations "extremely unlikely to be implemented" or
> "forbiddable"?
>
Not that I can recall.

>
>
>
> >
> > Note that Vulkan requires full ieee754 precision for all F/D
> instructions except for fdiv and fsqrt.
>
> Started investigating, found this
>
> https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_Env.html#relative-error-as-ulps
>
> which is not Vulkan it's OpenCL, which is *different* from vulkan (sigh) :)
>
> Also this:
>
> https://www.khronos.org/registry/vulkan/specs/1.0/html/vkspec.html#spirvenv-precision-operation

I think that Vulkan and OpenCL are similar enough (and usually both
implemented) that we could just merge the accuracy requirements into a
single Vulkan/OpenCL (as well as OpenGL/OpenGL ES) variant: for each
operation, we would just require the accuracy to meet both specs.

>
>
> >
> > fdiv and fsqrt are easy enough to implement in full precision using the
> iterative shift-add/shift-sub algorithms that take up similar space to a
> few adders and shift registers and can be shared with the integer divider
> that I think it may be better to just require full precision mode for F/D -
> there can be a separate slow iterative div/sqrt unit if faster
> low-precision fdiv/fsqrt are wanted in the main ALUs. The iterative
> div/sqrt HW would take up much less space than even a multiplier (unless
> multiplication is also iterative, in which case it can also share HW with
> the div/sqrt unit).
>
> Ok so for a hybrid design, where compliance with both IEEE754 and Vulkan
> or OpenCL is required, you are suggesting to do a pipelined (fast, large
> area) OpenCL/Vulkan ALU, with reduced accuracy, and for IEEE754 have a
> blocking Finite State Machine unit which eventually produces the correctly
> rounded answer?
>
Not quite. I'm saying that the base F/D specs should always support fully
accurate fdiv/fsqrt (but not necessarily operations like fsinpi and
fatanh), even if they're slow. I'm suggesting an implementation strategy
that doesn't require much hardware, can be shared with the integer divider,
and works even on ultra-low-power devices that use an iterative multiplier.
The reason is compatibility with RVG software, though that may not be
necessary for deeply embedded systems.
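
For reference, here is a software model in Python of the kind of iterative
shift-subtract algorithms I mean: a restoring divider and a digit-by-digit
square root. This is only an integer sketch with function names I made up
for illustration. An FP unit would run the same loops on the mantissa with
extra bits for rounding, and that part is not shown. Each loop iteration
needs only a compare, a subtract and shifts, which is why the two can share
hardware.

```python
def udiv_restoring(dividend, divisor, bits=32):
    """Restoring (shift-subtract) unsigned division.

    Produces one quotient bit per iteration, most significant first.
    Returns (quotient, remainder).
    """
    assert divisor != 0 and 0 <= dividend < (1 << bits)
    quotient = 0
    rem = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        rem = (rem << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if rem >= divisor:  # trial subtraction succeeds
            rem -= divisor
            quotient |= 1
    return quotient, rem


def isqrt_shift_sub(n, bits=32):
    """Digit-by-digit (shift-subtract) integer square root.

    Returns floor(sqrt(n)) for 0 <= n < 2**bits; `bits` must be even.
    Produces one result bit per iteration.
    """
    assert bits % 2 == 0 and 0 <= n < (1 << bits)
    root = 0
    rem = n
    bit = 1 << (bits - 2)  # highest power of four within `bits`
    while bit:
        trial = root + bit
        if rem >= trial:  # trial subtraction succeeds
            rem -= trial
            root = (root >> 1) + bit
        else:
            root >>= 1
        bit >>= 2
    return root
```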

>
> The logical reasoning being (recalling some discussions we had a few
> months back), that for "good" 3D you absolutely cannot have blocking
> computations which do not complete in a guaranteed timeframe.

> Whereas for standard UNIX workloads that is extremely unlikely to matter.
>
> An augmentation of this idea would be to use NR or other iterative
> algorithm as a microcode final phase based on the output from the less
> accurate pipeline.
>
> >
> > I would expect there to be a fast HW multiplier even on micropower gpus
> because a large proportion of the operations need multiplication so you
> could get an overall several-hundred-percent speedup over an iterative
> multiplier.
>
> Indeed.
>
> >
> >
> >
> >
> > OpenCL's accuracy requirements are similar to Vulkan's -- full precision
> for neg/abs/add/sub/mul/muladd and reduced requirements for everything else.
>
> Looks like a table is needed on the fpacc page. And OpenCL added as its
> own fpacc table entry.
>
I think OpenCL and Vulkan should share accuracy requirements -- see above.

Jacob