[libre-riscv-dev] [isa-dev] Re: FP transcendentals (trigonometry, root/exp/log) proposal

Mitchalsup mitchalsup at aol.com
Sun Sep 15 22:21:03 BST 2019



Mitch Alsup
MitchAlsup at aol.com

-----Original Message-----
From: lkcl <luke.leighton at gmail.com>
To: RISC-V ISA Dev <isa-dev at groups.riscv.org>
Cc: luke.leighton <luke.leighton at gmail.com>; libre-riscv-dev <libre-riscv-dev at lists.libre-riscv.org>
Sent: Sun, Sep 15, 2019 9:23 am
Subject: Re: [isa-dev] Re: FP transcendentals (trigonometry, root/exp/log) proposal

On Sunday, September 15, 2019 at 1:41:53 PM UTC+8, Jacob Lifshay wrote:
> Note that Vulkan and RISC-V conformance are at different levels of the software/hardware stack: Vulkan requires API-level conformance whereas RISC-V requires ISA-level conformance. The Vulkan driver is free to translate as needed or even implement some features entirely in software; not everything needs to be implemented in hardware.

Very important point, thank you for raising it.

For the *hardware* instructions - and this includes FDIV which is *already* in RV - an OpenGL Khronos Compliant system would be performance-uncompetitive under normal market forces.

Put in the bluntest and clearest terms: complying with RISCV Standards actually *destroys* the chances of a product's competitive commercial adoption in the OpenCL market!

For software compliance with Khronos, the issue is moot, because (a) software can adapt and (b) nobody expects software to have the same performance as hardware.

It is therefore *only* the instructions that happen to be in IEEE754, *and* happen to be in OpenCL *and* happen to have non IEEE754 accuracy requirements in OpenCL *and* there are market forces that compel vendors to want to compete on hardware acceleration....

The Graphics compilers I know of take x/y and convert it into x×(1/y). This converts FDIV into RCP and FMUL. RCP is not in IEEE 754, and at this point it does not matter what the accuracy of FMUL ends up being (RCP is the accuracy killer).

So, if the application wants accuracy, it uses FDIV, and if it wants speed, it uses RCP and FMUL.

Note: RCP can be fully pipelined (easily in SP, not so easily in DP). FMUL is always pipelined in High Perf implementations.

But this is illustrating the point I made earlier: FDIV is required to have accuracy, RCP+FMUL is not. RCP is a different OpCode than FDIV, and in this way you have avoided shooting yourself in the foot.
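
A minimal sketch of the FDIV versus RCP+FMUL difference, in Python double precision. The helper names fdiv and rcp_fmul are hypothetical, and the reciprocal here is correctly rounded, which is an assumption on the optimistic side: a hardware RCP aimed at graphics may be less accurate than this. Even so, the two-step form does not always reproduce the single correctly rounded division.

```python
def fdiv(x: float, y: float) -> float:
    """Single correctly rounded division (stand-in for FDIV)."""
    return x / y


def rcp_fmul(x: float, y: float) -> float:
    """Reciprocal followed by multiply (stand-in for RCP + FMUL).

    Assumption: the reciprocal is correctly rounded. Each of the two
    steps rounds once, so the result can differ from fdiv(x, y).
    """
    r = 1.0 / y
    return x * r


# Collect small integer operand pairs where the two forms disagree.
mismatches = [
    (x, y)
    for x in range(1, 101)
    for y in range(1, 101)
    if fdiv(float(x), float(y)) != rcp_fmul(float(x), float(y))
]

print(len(mismatches), "of", 100 * 100, "pairs differ")
print("49/49      =", fdiv(49.0, 49.0))
print("49*(1/49)  =", rcp_fmul(49.0, 49.0))
```

Keeping the two forms behind different opcodes means an application chooses the trade-off explicitly instead of having one instruction with two possible accuracies.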

Perhaps all that is needed is to describe LN2 and EXP2 as having graphics-useful accuracy, leaving LN2P1 and EXP2P1 as full accuracy instructions. In this way you are not having to have multiple instructions of the same flavor, each with different accuracies, but you have high speed versions and high accuracy versions.




I would argue that correctly-rounded log2 and exp2 instructions are still necessary, since they do round differently than the hypothetical log2p1 and exp2m1 (which are not currently being proposed, logp1 and expm1 are instead, which are base e log and exp rather than base 2; log2p1 and exp2m1 are not in the C standard but logp1 and expm1 are).
Consider EXP2 in the range [-1..+1]:
a) the input argument range contains 1/2 of all IEEE representable values
b) the output range [0.5..2.0] contains 3 / 2048 of all IEEE representable values.
Thus: EXP2 has already killed the accuracy regardless of how well rounding is performed.

A similar argument can be constructed wrt Ln2.
The accuracy is already gone--and in the classical region of use (small numbers.)
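
The counting argument above can be checked with a short sketch. It assumes IEEE 754 binary64 and counts raw bit patterns (with +0 and -0 counted separately, and all 2^64 patterns, including NaNs and infinities, as the denominator); the helper names are hypothetical. The exact fraction for the output range depends on how endpoints, sign and binades are counted, so the number printed here need not match the 3 / 2048 figure exactly. The imbalance is what matters: roughly a thousand inputs share each representable output.

```python
import struct


def bits(x: float) -> int:
    """Return the binary64 bit pattern of x as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def count_nonneg(a: float, b: float) -> int:
    """Count doubles in [a, b] for 0 <= a <= b.

    For non-negative doubles, value order matches bit-pattern order,
    so the count is a simple difference of bit patterns.
    """
    return bits(b) - bits(a) + 1


TOTAL = 2 ** 64  # every binary64 bit pattern (assumed denominator)

# Inputs in [-1, +1]: the non-negative half, mirrored for negative values.
inputs = 2 * count_nonneg(0.0, 1.0)
# Outputs of EXP2 over that range lie in [0.5, 2.0].
outputs = count_nonneg(0.5, 2.0)

input_fraction = inputs / TOTAL
output_fraction = outputs / TOTAL
inputs_per_output = inputs / outputs

print("input fraction   :", input_fraction)
print("output fraction  :", output_fraction)
print("inputs per output:", inputs_per_output)
```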
Jacob

