[libre-riscv-dev] [isa-dev] Re: FP transcendentals (trigonometry, root/exp/log) proposal

Sat Sep 14 03:10:48 BST 2019

On Saturday, September 14, 2019 at 9:33:16 AM UTC+8, Jacob Lifshay wrote:
> On Fri, Sep 13, 2019, 18:06 lkcl <luke.l... at gmail.com> wrote:
> On Saturday, September 14, 2019 at 4:56:07 AM UTC+8, Jacob Lifshay wrote:
> 
> > Some notes:
>
> > I think it may be worthwhile to have separate Ztrans extension names to indicate the levels of accuracy that are implemented, allowing a low-precision implementation of all instructions outside of F and D while having full-precision implementations of F and D for code compatibility.
> 
> 
> 
> Hum, hum, don't know. My concern: that would be an NxM table of extension names. There are around 8 Ztrans extensions, times four (so far; I just found that OpenCL is different from Vulkan, so that's 5), which would be 40 potential different extension names, rather than N+M, which would be 12-13.
> 
> 
> 
> Assuming higher-precision operations are allowed to be used to implement lower-precision modes, we could have the higher-precision extensions just imply the lower precision extensions.

Which they do anyway.

Okay so this starts to make a little more sense (a little less mad).

Hum, hum.... so no implementor would ever have all 32 (40?) extensions; they would only have 8 max.

In addition, some implementors may choose full accuracy for some areas and reduced accuracy for others, *and be able to declare that*, via the naming.

Whereas with just the current scheme it is more all or nothing: all ULP1 or all IEEE754.
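
To make the counting concrete, here is a rough sketch in Python. The names (Ztrans0..Ztrans7, lvl0..lvl4) are placeholders invented purely for illustration, not proposed extension names, and it assumes the accuracy levels can be put in a single order from least to most accurate, which may not hold for every pair of specs. The figures of 8 Ztrans extensions and 5 accuracy levels are the ones from the discussion above.

```python
from itertools import product

# Placeholder names, assumed for illustration only (not proposed names).
ZTRANS = [f"Ztrans{i}" for i in range(8)]         # around 8 Ztrans extensions
# Accuracy levels, least accurate first. A strict ordering is an assumption.
LEVELS = ["lvl0", "lvl1", "lvl2", "lvl3", "lvl4"]

def nxm_names():
    """Every possible (extension, accuracy) name: the NxM table."""
    return [f"{z}_{lvl}" for z, lvl in product(ZTRANS, LEVELS)]

def implied(name):
    """A higher-accuracy name implies each lower-accuracy name of the same group."""
    z, lvl = name.rsplit("_", 1)
    return [f"{z}_{l}" for l in LEVELS[:LEVELS.index(lvl) + 1]]

def declared(choices):
    """Names an implementor declares: one per Ztrans group, at its chosen level."""
    return [f"{z}_{lvl}" for z, lvl in choices.items()]

# An implementor mixing accuracy levels still declares only 8 names.
mixed = {z: ("lvl4" if i < 3 else "lvl1") for i, z in enumerate(ZTRANS)}
print(len(nxm_names()), len(ZTRANS) + len(LEVELS), len(declared(mixed)))
```

So the namespace has 40 possible names against 13 for N+M, but any one implementation only ever lists 8 of them, with the lower levels implied.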

> alternatively, we could introduce the concept of extension parameters (like C++ template parameters).

intriguing :)

> I think that Vulkan and OpenCL are similar enough (and usually both implemented) that we could just merge the accuracy requirements into a Vulkan/OpenCL (as well as OpenGL/OpenGL ES) variant, for each operation we would just require the accuracy to meet both specs.

Argh, that means creating a side-by-side table of 30+ functions across 3 different documents, in order to determine the lowest ULP.

blech :)
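
Once such a table exists, the merge itself is mechanical. A minimal sketch in Python, assuming each spec has been transcribed by hand into a dict of operation name to maximum allowed ULP error. The operation names and numbers below are hypothetical placeholders, NOT values taken from Vulkan, OpenCL or any other spec.

```python
def merge_ulp(*specs):
    """For each operation present in every spec, keep the smallest
    (strictest) ULP bound, so the result satisfies all of them.
    Operations missing from at least one spec are listed separately
    so that they can be reviewed by hand."""
    all_ops = set().union(*(set(s) for s in specs))
    common = set(specs[0]).intersection(*(set(s) for s in specs[1:]))
    merged = {op: min(s[op] for s in specs) for op in sorted(common)}
    missing = sorted(all_ops - common)
    return merged, missing

# Hypothetical placeholder data, for illustration only.
spec_a = {"op_x": 4.0, "op_y": 2.0, "op_z": 3.0}
spec_b = {"op_x": 3.0, "op_y": 16.0}

merged, missing = merge_ulp(spec_a, spec_b)
print(merged)   # strictest bound per shared operation
print(missing)  # operations not covered by every spec
```

The hard part remains the manual transcription and checking of the source documents, which is where the review help is needed.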

I am torn between "being lazy (just have separate bits in FCSR for vulkan and opencl)" and "how much sense it makes to merge them".

I *will* need help reviewing the source documents, firstly to make sure the right ones are used.

 https://libre-riscv.org/zfpacc_proposal/

I have added the links that I can find. There is one for opencl spirv 2.2, which is HTML, but there is a pdf for opencl 2.1 with no mention of spirv.  Need help tracking down which is the latest spec.

Then it will need a thorough review, as I have been known to get things wrong.

> Ok so for a hybrid design, where compliance with both IEEE754 and Vulkan or OpenCL is required, you are suggesting to do a pipelined (fast, large area) OpenCL/Vulkan ALU, with reduced accuracy, and for IEEE754 have a blocking Finite State Machine unit which eventually produces the correctly rounded answer?
> 
> Not quite, I'm saying that the base F/D specs should always support the fully accurate fdiv/fsqrt operations (but not necessarily stuff like fsinpi and fatanh) even if they're slow, and suggesting an implementation strategy that doesn't require much hardware, can be shared with the integer divider, and works even on ultra-low-power devices that use an iterative multiplier. The reason is compatibility with RVG software, though that may not be necessary for deeply embedded systems.
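
For illustration only, here is one well-known family of low-area iterative algorithms of the kind that could share hardware between integer division and square root: restoring (shift-and-subtract) division and digit-by-digit square root. This is a sketch in Python on unsigned integers, not the strategy from the thread, which is not spelled out here. Exponent handling, rounding and special cases (NaN, infinity, subnormals) are left out entirely, and the function names are made up.

```python
def restoring_divide(n, d, bits):
    """Unsigned restoring division, producing one quotient bit per step.
    Assumes 0 <= n < 2**bits and d > 0."""
    if d == 0:
        raise ZeroDivisionError("division by zero")
    q, r = 0, 0
    for i in range(bits - 1, -1, -1):
        r = (r << 1) | ((n >> i) & 1)   # bring down the next dividend bit
        q <<= 1
        if r >= d:                      # trial subtraction succeeds
            r -= d
            q |= 1
    return q, r

def sqrt_bitwise(n, bits):
    """Unsigned digit-by-digit square root, one result bit per step.
    Assumes 0 <= n < 2**bits and that bits is even."""
    root, rem = 0, 0
    for i in range(bits - 2, -1, -2):
        rem = (rem << 2) | ((n >> i) & 3)   # bring down the next two bits
        trial = (root << 2) | 1             # 4*root + 1
        root <<= 1
        if rem >= trial:                    # same compare-and-subtract step
            rem -= trial
            root |= 1
    return root, rem

print(restoring_divide(100, 7, 8))  # quotient and remainder
print(sqrt_bitwise(26, 8))          # root and remainder
```

Both loops use only a shift, a compare and a subtract per step, which is why a single small datapath can plausibly serve both at the cost of one result bit per cycle.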

Have to be careful not to embed assumptions, or subtly constrain implementors by accident, in standards.

L.