[libre-riscv-dev] fp special functions
Jacob Lifshay
programmerjake at gmail.com
Sun Aug 4 19:16:52 BST 2019
On Sat, Aug 3, 2019, 22:25 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:
> got an idea, transcendentals (scalar) proposal, similar to Zfrsqrt,
> need to find the space for sin, cos, atan, exp, pow, log, and so on.
> on-list first, then isa-dev?
>
Sounds good to me.
I think we should have our primitive instructions be correctly rounded,
since, for all but sin/cos/tan/sec/cosec/cotan, that doesn't take much more
precision. I think we should implement sinpi/cospi and friends since they
avoid the need to have an extremely (several hundred bit) accurate version
of pi.
Note for Atif and Grant: I'm currently working on an algebraic numbers
library that can be used to verify the fp implementations for
add/sub/mul/div/sqrt/rsqrt/cbrt/hypot.
https://salsa.debian.org/Kazan-team/algebraics
Note that even though sinpi/cospi theoretically are algebraic numbers for
rational inputs, the degree of the polynomials is prohibitive for large
denominators.
We should avoid the pitfall of Intel's x87 sin/cos implementations:
https://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion/
The functions I think are worth implementing in addition to F/D:
trig-pi functions (range reduction is trivial: x mod 2.0):
* sinpi
* cospi
* sincospi (non-standard; like sincos)
* atan2pi
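To illustrate why range reduction for the *pi functions is trivial, here is a
minimal Python sketch (not a proposed implementation). The name sinpi_sketch
is made up for illustration, and the final math.sin call stands in for
whatever polynomial or table the hardware would use on the reduced argument.
The point is only that every reduction step is exact in binary floating point,
so no high-precision value of pi is needed:

```python
import math

def sinpi_sketch(x: float) -> float:
    """Approximate sin(pi * x) for finite x using exact range reduction."""
    # fmod is exact for floats: r is in (-2, 2) and has the sign of x.
    r = math.fmod(x, 2.0)
    # Fold into [-1, 1] using the period of 2; these subtractions are exact.
    if r > 1.0:
        r -= 2.0
    elif r < -1.0:
        r += 2.0
    # Fold into [-0.5, 0.5] using sin(pi * (1 - r)) == sin(pi * r).
    if r > 0.5:
        r = 1.0 - r
    elif r < -0.5:
        r = -1.0 - r
    # Stand-in for the real kernel; only |pi * r| <= pi / 2 reaches it.
    return math.sin(math.pi * r)
```

For example, sinpi_sketch(1.0) returns exactly 0.0, whereas
math.sin(math.pi * 1.0) returns a small nonzero value because math.pi is
only a 53-bit approximation of pi.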
extended trig-pi functions (separate extension; sincospi/atan2pi is
sufficient for graphics)
* tanpi
* asinpi
* acospi
non-*pi trig functions (in a separate extension since accurate range
reduction is quite difficult, approximating using the *pi functions will
work for graphics):
* sin
* cos
* sincos
* tan
* atan2
* asin
* acos
powers:
* cbrt
* hypot (avoids overflow/underflow with extended exponent range for
intermediates)
* rsqrt (proposed in Zfrsqrt extension)
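As a rough software analogue of the hypot point above: hardware would keep
the intermediates in a wider exponent range, while the sketch below gets a
similar effect by rescaling the inputs by a power of two first. The function
names are made up for illustration, and NaN handling is left out:

```python
import math

def naive_hypot(x: float, y: float) -> float:
    # x * x can overflow to inf or underflow to 0 even when the true
    # result is comfortably representable.
    return math.sqrt(x * x + y * y)

def scaled_hypot(x: float, y: float) -> float:
    """hypot with power-of-two scaling; assumes neither input is NaN."""
    x, y = abs(x), abs(y)
    big = max(x, y)
    if big == 0.0 or math.isinf(big):
        return big
    # big == m * 2**e with 0.5 <= m < 1, so the scaled inputs are <= 1.
    _, e = math.frexp(big)
    xs = math.ldexp(x, -e)
    ys = math.ldexp(y, -e)
    # math.ldexp raises OverflowError if the true result is out of range.
    return math.ldexp(math.sqrt(xs * xs + ys * ys), e)
```

With inputs 3e200 and 4e200 the naive version returns inf, while the scaled
version returns a value close to 5e200. This sketch is not correctly rounded;
it only shows the overflow/underflow issue.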
general powers (as separate extension due to complexity; exp2/log2 plus
checking for odd powers/roots is sufficient for graphics):
* pow
* root
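A sketch of what "exp2/log2 plus checking for odd powers/roots" could look
like in software, assuming finite inputs. The function names are made up for
illustration, 2.0 ** t stands in for an exp2 instruction, and the results are
only as accurate as the double-precision intermediates allow (not correctly
rounded):

```python
import math

def pow_sketch(x: float, y: float) -> float:
    """x ** y built from exp2/log2; assumes finite x and y."""
    if y == 0.0:
        return 1.0
    if x == 0.0:
        return 0.0 if y > 0.0 else math.inf
    y_is_integer = (y == math.floor(y))
    if x < 0.0 and not y_is_integer:
        return math.nan  # no real result
    # 2.0 ** t raises OverflowError in Python if the result is too large.
    magnitude = 2.0 ** (y * math.log2(abs(x)))
    # A negative base with an odd integer power gives a negative result.
    if x < 0.0 and math.fmod(y, 2.0) != 0.0:
        return -magnitude
    return magnitude

def root_sketch(x: float, n: int) -> float:
    """n-th root of x for integer n > 0; odd roots of negatives allowed."""
    if x == 0.0:
        return 0.0
    if x < 0.0 and n % 2 == 0:
        return math.nan  # no real result
    magnitude = 2.0 ** (math.log2(abs(x)) / n)
    return -magnitude if x < 0.0 else magnitude
```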
exp/log:
* exp2
* log2
* expm1 (extra precision around 0)
* logp1 (extra precision around 0)
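A quick demonstration of the "extra precision around 0" point, using Python's
math module (IEEE 754's logp1 is called log1p in C and Python):

```python
import math

x = 1e-18

# 1.0 + x rounds to exactly 1.0, so the small term is lost before the
# subtraction or the logarithm ever sees it.
naive_expm1 = math.exp(x) - 1.0
naive_logp1 = math.log(1.0 + x)

# The dedicated functions never form 1.0 + x, so the result stays accurate.
good_expm1 = math.expm1(x)
good_logp1 = math.log1p(x)
```

Here both naive results come out as 0.0, while expm1 and log1p return values
very close to x.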
extended exp/log as separate extension (not needed for graphics since
exp2/log2 is sufficient):
* exp
* log
* exp10
* log10
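For reference, a sketch of how the other bases reduce to exp2/log2 with one
extra multiply. The function names are made up for illustration and 2.0 ** t
stands in for an exp2 instruction. The multiply by a rounded constant costs
some accuracy, which is fine for graphics but would not give correctly
rounded results:

```python
import math

LOG2_E = math.log2(math.e)    # log2(e)
LOG2_10 = math.log2(10.0)     # log2(10)
LN_2 = math.log(2.0)          # ln(2)
LOG10_2 = math.log10(2.0)     # log10(2)

def exp_sketch(x: float) -> float:
    return 2.0 ** (x * LOG2_E)       # e**x == 2**(x * log2(e))

def exp10_sketch(x: float) -> float:
    return 2.0 ** (x * LOG2_10)      # 10**x == 2**(x * log2(10))

def log_sketch(x: float) -> float:
    return math.log2(x) * LN_2       # ln(x) == log2(x) * ln(2)

def log10_sketch(x: float) -> float:
    return math.log2(x) * LOG10_2    # log10(x) == log2(x) * log10(2)
```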
hyperbolics as separate extension (not needed for graphics since exp2/log2
is sufficient):
* acosh
* asinh
* atanh
* cosh
* sinh
* tanh (may want to split out in separate extension since sometimes used
for machine learning, however fmax(x,x*(1.0/256.0)) is a generally
sufficient replacement transfer function)
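The fmax expression mentioned for tanh is a leaky-ReLU-style function.
Written out as a small Python sketch (the function name is made up, and
Python's max is used in place of fmax, so NaN handling may differ):

```python
def leaky_transfer(x: float) -> float:
    # For x >= 0, x is the larger value; for x < 0, x / 256 is larger
    # (closer to zero), so negative inputs are scaled down by 256.
    return max(x, x * (1.0 / 256.0))
```

So leaky_transfer(2.0) is 2.0 and leaky_transfer(-256.0) is -1.0, using only
a multiply and a max.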
the erf/erfc/gamma/bessel/zeta/etc. functions can be left to software
implementations.
see also:
https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html
https://www.khronos.org/registry/OpenCL/specs/2.2/html/OpenCL_Env.html#relative-error-as-ulps
https://www.khronos.org/registry/spir-v/specs/unified1/GLSL.std.450.html
https://www.khronos.org/registry/vulkan/specs/1.1-extensions/html/chap40.html#spirvenv-precision-operation
Jacob Lifshay