[libre-riscv-dev] [isa-dev] Re: FP transcendentals (trigonometry, root/exp/log) proposal

Thu Aug 8 00:43:06 BST 2019

On Wed, Aug 7, 2019, 15:36 'MitchAlsup' via RISC-V ISA Dev <
isa-dev at groups.riscv.org> wrote:

> Is this proposal going to <eventually> include::
>
> a) statement on required/delivered numeric accuracy per transcendental ?
>
From what I understand, they require correctly rounded results. We should
eventually state that somewhere. Correctly rounded results are required so
that the instructions can replace the corresponding functions in libm
(they're not just for GPUs) and so that results are reproducible across
implementations.

> b) a reserve on the OpCode space for the double precision equivalents ?
>
The 2-bit fmt field, right below the funct5 field, selects from:
00: f32
01: f64
10: f16
11: f128

so f64 is definitely included.

see https://libre-riscv.org/rv_major_opcode_1010011/#index2h1
see table 11.3 in Volume I: RISC-V Unprivileged ISA V20190608-Base-Ratified
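
As an illustration, here is a minimal Python sketch of how a decoder could take the precision from those two bits. It assumes the base ISA's R-type OP-FP layout, with funct5 in bits 31:27, fmt in bits 26:25, then rs2, rs1, rm, rd and the opcode. The helper names `encode_op_fp` and `decode_fmt` are hypothetical. FADD (funct5 = 00000) stands in for a transcendental, because the proposal's funct5 assignments aren't repeated here.

```python
OP_FP = 0b1010011  # major opcode used by the scalar FP arithmetic instructions

# fmt field (bits 26:25), directly below funct5 (bits 31:27)
FMT_NAMES = {0b00: "f32", 0b01: "f64", 0b10: "f16", 0b11: "f128"}


def encode_op_fp(funct5: int, fmt: int, rs2: int, rs1: int, rm: int, rd: int) -> int:
    """Hypothetical helper: pack an R-type OP-FP instruction word."""
    return ((funct5 << 27) | (fmt << 25) | (rs2 << 20) | (rs1 << 15)
            | (rm << 12) | (rd << 7) | OP_FP)


def decode_fmt(insn: int) -> str:
    """Hypothetical helper: return the precision selected by the fmt field."""
    if (insn & 0x7F) != OP_FP:
        raise ValueError("not an OP-FP instruction")
    return FMT_NAMES[(insn >> 25) & 0b11]


# FADD (funct5 = 00000), rd = f1, rs1 = f2, rs2 = f3, rm = 111 (dynamic rounding),
# encoded once for each of the four fmt values
for fmt in range(4):
    insn = encode_op_fp(0b00000, fmt, 3, 2, 0b111, 1)
    print(f"{insn:#010x} -> {decode_fmt(insn)}")
```

The same funct5 with a different fmt gives the f32, f64, f16, and f128 variants. So reserving one funct5 value covers all four precisions.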

It would probably be a good idea to split the transcendental extensions into
separate f32, f64, f16, and f128 extensions, since some implementations may
want to implement them only for f32 while still implementing the D (f64
arithmetic) extension.

> c) a statement on <approximate> execution time ?
>
That would be microarchitecture-specific. Since this is meant to be a
vendor-neutral specification, execution time would be up to the
implementers. I would assume that implementations are at least faster than a
soft-float implementation (since that's usually the whole point of
implementing them in hardware).

For our implementation, I'd imagine something between 8 and 40 clock cycles
for most of the operations. sin, cos, and tan (but not sinpi and friends)
may take much longer for large inputs, because range reduction must
accurately compute x mod 2*pi. That's why we are thinking of implementing
sinpi, cospi, and tanpi instead: they only need x mod 2, which is much
faster and simpler to compute.
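
To show why x mod 2 is the easier reduction: in binary floating point, the remainder of x modulo 2 is always exactly representable and can be computed exactly. Python's `math.fmod` does this. Reducing modulo 2*pi, by contrast, needs many more bits of pi than the format holds. Below is a minimal sketch with a hypothetical `sinpi_sketch`. It uses `math.sin` for the final core evaluation, where real hardware would use a polynomial, a table, or CORDIC.

```python
import math


def sinpi_sketch(x: float) -> float:
    """Hypothetical sin(pi * x) with exact reduction of x modulo 2."""
    # math.fmod is exact for finite floats, however large x is.
    # r lies in (-2, 2) and has the sign of x.
    r = math.fmod(x, 2.0)
    # Fold into [-1, 1] using the period 2 (exact by the Sterbenz lemma).
    if r > 1.0:
        r -= 2.0
    elif r < -1.0:
        r += 2.0
    # Fold into [-0.5, 0.5] using sin(pi*r) == sin(pi*(1 - r)) (also exact).
    if r > 0.5:
        r = 1.0 - r
    elif r < -0.5:
        r = -1.0 - r
    # Rounding only happens in this final evaluation, where |pi*r| <= pi/2.
    return math.sin(math.pi * r)


# 1e15 + 0.5 is exactly representable and reduces to 0.5, i.e. sin(pi/2).
print(sinpi_sketch(1e15 + 0.5))
# The naive product pi * x near 3.14e15 can only land on a multiple of 0.5,
# so most of the phase information is already gone before sin is applied.
print(math.sin(math.pi * (1e15 + 0.5)))
```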

> You may have more transcendentals than necessary::
> 1) for example all of the inverse hyperbolic can be calculated to GRAPHICs
> numeric quality with short sequences of already existing transcendentals
> ..... ASINH( x ) = ln( x + SQRT(x**2+1) )
>
That's why the hyperbolics are split out into a separate extension. Also, a
single instruction may be much faster, since it can compute the whole
function as one operation (CORDIC will work) rather than requiring several
slow operations such as sqrt, div, and log.
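
The short sequence ASINH( x ) = ln( x + SQRT(x**2+1) ) can be sketched in Python as below. `math.asinh` serves as the accurate reference, and the name `asinh_short_sequence` is hypothetical. It stays within a few ulps of the library result for moderate inputs. Near zero, though, it loses most of its relative accuracy, because 1 + |x| rounds away the low bits of x. That is fine for graphics, but it would not meet a correctly rounded requirement.

```python
import math


def asinh_short_sequence(x: float) -> float:
    """Hypothetical asinh via ln(|x| + sqrt(x**2 + 1)), sign reapplied."""
    # Working on |x| avoids cancellation in x + sqrt(x**2 + 1) for large
    # negative x; asinh is odd, so copysign restores the sign at the end.
    ax = abs(x)
    return math.copysign(math.log(ax + math.sqrt(ax * ax + 1.0)), x)


for x in (0.5, 3.0, -40.0, 1e-12):
    reference = math.asinh(x)
    rel_err = abs(asinh_short_sequence(x) - reference) / abs(reference)
    print(f"x = {x:g}: relative error {rel_err:.1e}")
```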

> 2) LOG(x) = LOGP1(x) + 1.0
> ... EXP(x) = EXPM1(x-1.0)
>
> That is:: LOGP1 and EXPM1 provide greater precision (especially when the
> result is near zero) than their sister functions, and the compiler can
> easily add the additional instruction to the instruction stream where
> appropriate.
>
For the implementation techniques I know for log/exp, implementing both
log/exp and logp1/expm1 is only a slight increase in complexity compared to
implementing just one pair (it mostly means changing constants, both for
polynomial/LUT-based implementations and for CORDIC). I think it's worth
keeping log/exp for the common case of implementing pow (which needs log and
exp), and logp1/expm1 are not worth getting rid of, given their small
additional cost and the extra accuracy they provide.
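
For reference, the exact identities are log(x) = logp1(x - 1) and exp(x) = expm1(x) + 1, where logp1(x) = log(1 + x) and expm1(x) = exp(x) - 1. The minimal Python sketch below uses `math.log1p` and `math.expm1` as stand-ins for the proposed instructions, and all helper names are hypothetical. It shows the identities and why deriving log from logp1 alone is lossy: for tiny x, x - 1.0 rounds to exactly -1.0. pow is shown as the naive exp(y * log(x)). A real libm carries extra precision internally, because y magnifies the error in log(x).

```python
import math


def log_via_logp1(x: float) -> float:
    """log(x) = logp1(x - 1); x - 1.0 is exact for x in [0.5, 2] but can round elsewhere."""
    return math.log1p(x - 1.0)


def exp_via_expm1(x: float) -> float:
    """exp(x) = expm1(x) + 1; the final addition adds a rounding step."""
    return math.expm1(x) + 1.0


def pow_naive(x: float, y: float) -> float:
    """pow(x, y) = exp(y * log(x)) for x > 0: the common case needing log and exp."""
    return math.exp(y * math.log(x))


print(log_via_logp1(10.0), math.log(10.0))
print(exp_via_expm1(1.0), math.exp(1.0))
print(pow_naive(2.0, 10.0))
# For tiny x, x - 1.0 rounds to exactly -1.0, so the information in x is lost
# and log1p(-1.0) raises instead of returning log(1e-20), about -46.05.
try:
    log_via_logp1(1e-20)
except ValueError as err:
    print("log_via_logp1(1e-20) failed:", err)
```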

Jacob Lifshay