# [libre-riscv-dev] [isa-dev] Re: FP transcendentals (trigonometry, root/exp/log) proposal

MitchAlsup MitchAlsup at aol.com
Mon Aug 12 18:52:15 BST 2019

```
On Sunday, August 11, 2019 at 10:20:28 PM UTC-5, lkcl wrote:
>
> https://libre-riscv.org/ztrans_proposal/#khronos_equiv
>

I would like to point out that the general implementations of ATAN2 do a
bunch of special case checks and then simply call ATAN.

double ATAN2( double y, double x )
{   // IEEE 754-2008 quality ATAN2

    // deal with NANs
    if( ISNAN( x )             ) return x;
    if( ISNAN( y )             ) return y;

    // deal with infinities
    if( x == +∞    && |y|== +∞  ) return copysign(  π/4, y );
    if( x == +∞                 ) return copysign(  0.0, y );
    if( x == -∞    && |y|== +∞  ) return copysign( 3π/4, y );
    if( x == -∞                 ) return copysign(    π, y );
    if(               |y|== +∞  ) return copysign(  π/2, y );

    // deal with signed zeros
    // (the comparisons on x test its sign, so +0.0 and -0.0 differ)
    if( x == 0.0  &&  y != 0.0 ) return copysign(  π/2, y );
    if( x >=+0.0  &&  y == 0.0 ) return copysign(  0.0, y );
    if( x <=-0.0  &&  y == 0.0 ) return copysign(    π, y );

    // calculate ATAN2 textbook style; the result takes the sign of y
    if( x  > 0.0               ) return copysign(     ATAN( |y / x| ), y );
    if( x  < 0.0               ) return copysign( π - ATAN( |y / x| ), y );
}

Yet the proposed encoding makes ATAN2 the primitive and has ATAN invent a
constant and then call/use ATAN2.

When one considers an implementation of ATAN, one must consider several
ranges of evaluation::

x ∈ [  -∞, -1.0]:: ATAN( x ) = -π/2 - ATAN( 1/x );
x ∈ (-1.0, +1.0):: ATAN( x ) =      + ATAN(   x );
x ∈ [+1.0,   +∞]:: ATAN( x ) = +π/2 - ATAN( 1/x );

I should point out that the add/sub of π/2 cannot lose significance, since
for |x| >= 1 the magnitude of ATAN(1/x) is at most π/4 (well inside 0..π/2).

The bottom line is that I think you are choosing to make too many of these
into OpCodes, making the hardware function/calculation unit (and sequencer)
more complicated than necessary.

----------------------------------------------------------------------------------------------------------------------------------------------------

I might suggest that if there were a way for a calculation to be performed
and the result of that calculation chained to a subsequent calculation such
that the precision of the result-becomes-operand is wider than what will fit
in a register, then you can dramatically reduce the count of instructions in
this category while retaining acceptable accuracy:

z = x / y

can be calculated as::

z = x × (1/y)

Where 1/y has about 26-to-32 bits of fraction. No, it's not IEEE 754-2008
accurate, but GPUs want speed, and 1/y is fully pipelined (F32) while x/y
cannot be (at reasonable area). It is also not "that inaccurate", showing
errors of 0.52-to-0.625 ULP.

Given that one has the ability to carry (and process) more fraction bits,
one can then do high precision multiplies of π or other transcendental
radixes.

And GPUs have been doing this almost since the dawn of 3D.
```
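As a rough illustration of the argument in the message, here is a minimal Python sketch of ATAN2 built on top of ATAN, with ATAN itself reduced to arguments of magnitude at most 1. It is a sketch under stated assumptions, not the proposal's or the author's implementation: `math.atan` stands in for the hardware ATAN kernel, the names `atan_reduced` and `atan2_via_atan` are made up for this example, and the sign of the result is taken from `y` via `copysign`.

```python
import math


def atan_reduced(x: float) -> float:
    """ATAN with range reduction so the kernel only sees |argument| <= 1.

    math.atan is an assumed stand-in for the hardware ATAN kernel.
    """
    if math.isnan(x):
        return x
    if x <= -1.0:
        # 1/x lies in [-1, 0], so atan(1/x) lies in [-pi/4, 0]
        return -math.pi / 2 - math.atan(1.0 / x)
    if x >= 1.0:
        # 1/x lies in [0, 1], so atan(1/x) lies in [0, pi/4]
        return math.pi / 2 - math.atan(1.0 / x)
    return math.atan(x)


def atan2_via_atan(y: float, x: float) -> float:
    """ATAN2 as special-case checks followed by a single ATAN call."""
    # NaNs propagate
    if math.isnan(x):
        return x
    if math.isnan(y):
        return y

    # infinities
    if math.isinf(x):
        if x > 0:
            mag = math.pi / 4 if math.isinf(y) else 0.0
        else:
            mag = 3 * math.pi / 4 if math.isinf(y) else math.pi
        return math.copysign(mag, y)
    if math.isinf(y):
        return math.copysign(math.pi / 2, y)

    # signed zeros: the sign bit of x selects 0 or pi when y is zero
    if y == 0.0:
        x_negative = math.copysign(1.0, x) < 0
        return math.copysign(math.pi if x_negative else 0.0, y)
    if x == 0.0:
        return math.copysign(math.pi / 2, y)

    # textbook case: angle from |y/x|, quadrant from the signs of x and y
    angle = atan_reduced(abs(y / x))
    if x < 0.0:
        angle = math.pi - angle
    return math.copysign(angle, y)


if __name__ == "__main__":
    for y, x in [(1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (0.0, -0.0)]:
        print(y, x, atan2_via_atan(y, x), math.atan2(y, x))
```

In this arrangement ATAN is the only transcendental primitive; ATAN2 adds comparisons, one divide, and at most one subtraction from π, which is the layering the message argues for.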