[libre-riscv-dev] [isa-dev] Re: FP transcendentals (trigonometry, root/exp/log) proposal

Sat Sep 14 02:47:35 BST 2019

Mitch Alsup MitchAlsup at aol.com

-----Original Message-----
From: Jacob Lifshay <programmerjake at gmail.com>
To: Luke Kenneth Casson Leighton <luke.leighton at gmail.com>
Cc: RISC-V ISA Dev <isa-dev at groups.riscv.org>; Mitchalsup <mitchalsup at aol.com>; allen.baum <allen.baum at esperantotech.com>; libre-riscv-dev <libre-riscv-dev at lists.libre-riscv.org>
Sent: Fri, Sep 13, 2019 8:33 pm
Subject: Re: [isa-dev] Re: FP transcendentals (trigonometry, root/exp/log) proposal

On Fri, Sep 13, 2019, 18:06 lkcl <luke.leighton at gmail.com> wrote:

On Saturday, September 14, 2019 at 4:56:07 AM UTC+8, Jacob Lifshay wrote:
> Some notes:
>
> I think it may be worthwhile to have separate Ztrans extension names to indicate the levels of accuracy that are implemented, allowing a low-precision implementation of all instructions outside of F and D while having full-precision implementations of F and D for code compatibility.

Hum, hum, don't know. My concern: that would be an NxM table of extension names. There are around 8 Ztrans extensions, times four accuracy levels (so far; I just found that OpenCL is different from Vulkan, so that's 5), which would be 40 potential different extension names, rather than N+M, which would be 12-13.
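
To make the NxM versus N+M count concrete, here is a minimal sketch. The group and profile names are made-up placeholders (the thread does not list them), and only the counts 8 and 5 come from the message above.

```python
from itertools import product

# Placeholder names for illustration only; NOT the real Ztrans sub-extensions.
OPERATION_GROUPS = [f"Zgroup{i}" for i in range(8)]
# Placeholder accuracy profiles; only the count (5) comes from the thread.
ACCURACY_PROFILES = ["profileA", "profileB", "profileC", "profileD", "profileE"]

# Scheme A: one extension name per (operation group, accuracy profile) pair.
combined_names = [f"{g}_{p}" for g, p in product(OPERATION_GROUPS, ACCURACY_PROFILES)]

# Scheme B: operation groups and accuracy profiles are named independently.
independent_names = OPERATION_GROUPS + ACCURACY_PROFILES

print(len(combined_names))     # N * M = 8 * 5 = 40
print(len(independent_names))  # N + M = 8 + 5 = 13
```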

Assuming higher-precision operations are allowed to be used to implement lower-precision modes, we could have the higher-precision extensions just imply the lower-precision extensions. Alternatively, we could introduce the concept of extension parameters (like C++ template parameters).

While I have no vote in the matter, I would implore you not to shoot yourself in the foot this way.

Are any of the permutations "extremely unlikely to be implemented" or "forbiddable"?

not that I can recall

> 
> Note that Vulkan requires full ieee754 precision for all F/D instructions except for fdiv and fsqrt.

Started investigating, found this
https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_Env.html#relative-error-as-ulps

which is not Vulkan, it's OpenCL, which is *different* from Vulkan (sigh) :)

Also this:
https://www.khronos.org/registry/vulkan/specs/1.0/html/vkspec.html#spirvenv-precision-operation

I think that Vulkan and OpenCL are similar enough (and usually both implemented) that we could just merge the accuracy requirements into a single Vulkan/OpenCL (as well as OpenGL/OpenGL ES) variant: for each operation we would just require the accuracy to meet both specs.
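
One way to read "meet both specs" is to take the tighter error bound per operation. The sketch below assumes each spec can be reduced to a single ULP limit per operation, which is a simplification (real bounds can depend on the input range). The function name `merge_ulp_limits` and all numbers are placeholders, not values from either spec.

```python
def merge_ulp_limits(*specs):
    """Return, per operation, the tightest (smallest) ULP limit across all specs.

    An operation listed in only one spec keeps that spec's limit.
    """
    merged = {}
    for spec in specs:
        for op, limit in spec.items():
            merged[op] = min(limit, merged.get(op, limit))
    return merged


# Placeholder limits for illustration only; NOT taken from Vulkan or OpenCL.
spec_a = {"fdiv": 2.5, "fsqrt": 3.0, "fsin": 4.0}
spec_b = {"fdiv": 3.0, "fsqrt": 2.0, "fexp": 3.0}

merged = merge_ulp_limits(spec_a, spec_b)
for op in sorted(merged):
    print(op, merged[op])
```

An implementation that satisfies the merged table then satisfies each input table, under the single-limit assumption stated above.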

> 
> fdiv and fsqrt are easy enough to implement in full precision using the iterative shift-add/shift-sub algorithms, which take up about as much space as a few adders and shift registers and can be shared with the integer divider, that I think it may be better to just require full-precision mode for F/D. There can be a separate slow iterative div/sqrt unit if faster low-precision fdiv/fsqrt are wanted in the main ALUs. The iterative div/sqrt HW would take up much less space than even a multiplier (unless multiplication is also iterative, in which case it can also share HW with the div/sqrt unit).

Ok so for a hybrid design, where compliance with both IEEE754 and Vulkan or OpenCL is required, you are suggesting to do a pipelined (fast, large area) OpenCL/Vulkan ALU, with reduced accuracy, and for IEEE754 have a blocking Finite State Machine unit which eventually produces the correctly rounded answer?

Not quite. I'm saying that the base F/D specs should always support the fully accurate fdiv/fsqrt operations (but not necessarily stuff like fsinpi and fatanh), even if they're slow, and suggesting an implementation strategy that doesn't require much hardware, can be shared with the integer divider, and works even on ultra-low-power devices that use an iterative multiplier. The reason is compatibility with RVG software, though that may not be necessary for deeply embedded systems.
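
For readers unfamiliar with the shift-add/shift-sub approach referred to above, here is a rough software model of one iteration per result bit, for both square root and division. It works on unsigned integers, as a mantissa datapath would. Exponent handling, rounding, and special values are left out, and the function names are made up for this sketch.

```python
def isqrt_shift_sub(n, bits=32):
    """Digit-by-digit square root using only shifts, adds, subtracts, compares.

    Returns (root, remainder) such that root * root + remainder == n.
    """
    if n < 0 or n >= (1 << bits):
        raise ValueError("n out of range for the given width")
    root = 0
    rem = n
    bit = 1 << ((bits - 1) & ~1)  # highest power of four that fits in `bits` bits
    while bit:
        trial = root + bit
        root >>= 1
        if rem >= trial:          # trial subtraction succeeds: keep this result bit
            rem -= trial
            root += bit
        bit >>= 2
    return root, rem


def udiv_shift_sub(dividend, divisor, bits=32):
    """Restoring division: one shift, compare, and subtract per quotient bit."""
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    if dividend < 0 or dividend >= (1 << bits):
        raise ValueError("dividend out of range for the given width")
    quotient = 0
    rem = 0
    for i in reversed(range(bits)):
        rem = (rem << 1) | ((dividend >> i) & 1)  # bring down the next dividend bit
        quotient <<= 1
        if rem >= divisor:
            rem -= divisor
            quotient |= 1
    return quotient, rem


print(isqrt_shift_sub(26))      # (5, 1)
print(udiv_shift_sub(100, 7))   # (14, 2)
```

Both loops have the same shape (shift, compare, conditional subtract), which is the property that makes sharing one small hardware unit between integer divide, fdiv, and fsqrt plausible.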

The logical reasoning being (recalling some discussions we had a few months back), that for "good" 3D you absolutely cannot have blocking computations which do not complete in a guaranteed timeframe.

Whereas for standard UNIX workloads that is extremely unlikely to matter.

An augmentation of this idea would be to use NR (Newton-Raphson) or another iterative algorithm as a microcode final phase, starting from the output of the less accurate pipeline.
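
A rough illustration of that refinement idea, using reciprocal square root as the example. The `rough_rsqrt` function is only a stand-in for a reduced-accuracy pipeline result (here, the true value perturbed by about 1%). It is an assumption of this sketch, not a model of any real hardware.

```python
import math


def rough_rsqrt(x):
    """Stand-in for a low-accuracy pipeline result: about 1% relative error."""
    return (1.0 / math.sqrt(x)) * 1.01


def refine_rsqrt(x, y, steps=3):
    """Newton-Raphson steps for y ~= 1/sqrt(x): y <- y * (1.5 - 0.5 * x * y * y).

    Each step roughly squares the relative error of the estimate.
    """
    for _ in range(steps):
        y = y * (1.5 - 0.5 * x * y * y)
    return y


x = 2.0
exact = 1.0 / math.sqrt(x)
estimate = rough_rsqrt(x)
refined = refine_rsqrt(x, estimate)

print(abs(estimate - exact) / exact)  # relative error before refinement
print(abs(refined - exact) / exact)   # relative error after refinement
```

Note that NR steps by themselves converge quickly but do not, as far as I know, guarantee a correctly rounded IEEE754 result. A final remainder check or similar correction step would still be needed for that.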

> 
> I would expect there to be a fast HW multiplier even on micropower GPUs, because a large proportion of the operations need multiplication, so you could get an overall speedup of several hundred percent over an iterative multiplier.

Indeed.

>
> OpenCL's accuracy requirements are similar to Vulkan's -- full precision for neg/abs/add/sub/mul/muladd and reduced requirements for everything else.

Looks like a table is needed on the fpacc page, with OpenCL added as its own fpacc table entry.

I think OpenCL and Vulkan should share accuracy requirements -- see above.
Jacob