[libre-riscv-dev] FP transcendentals (trigonometry, root/exp/log) proposal

Thu Aug 8 01:27:07 BST 2019

[some overlap with what jacob wrote, reviewing/removing redundant replies]

On Wednesday, August 7, 2019 at 11:36:17 PM UTC+1, MitchAlsup wrote:
>
> Is this proposal going to <eventually> include:: 
>
> a) statement on required/delivered numeric accuracy per transcendental ?
>

i originally thought it was just this (the OpenCL extended instruction set):
https://www.khronos.org/registry/spir-v/specs/unified1/OpenCL.ExtendedInstructionSet.100.html

jacob makes and emphasises the point that these are intended to be *scalar* 
operations, for direct use in libm.

> b) a reserve on the OpCode space for the double precision equivalents ?
>

there is no IANA-style contact/proposal procedure for opcode reservations, 
even where the case has been made clearly that not having a reservation 
will cause severe, ongoing detriment to the wider RISC-V community.  i've 
repeatedly requested an official reservation, for this and many other 
proposals.

i have not received a response.

Jacob wrote:

> it would probably be a good idea to split the transcendental extensions
> into separate f32, f64, f16, and f128 extensions, since some implementations
> may want to only implement them for f32 while still implementing the D
> (f64 arithmetic) extension.

oh, of course. Ztrans.F/Q/S/H is a really good point.

> c) a statement on <approximate> execution time ?
>

what jacob said.

as a Standard, we can't limit the proposal in ways that would restrict or 
exclude implementors.  accuracy on the other hand *is* important, because 
an implementation that falls short of it could cause catastrophic failures 
in any algorithm written to critically rely on a given accuracy.

> You may have more transcendentals than necessary::
> 1) for example all of the inverse hyperbolic can be calculated to GRAPHICs 
> numeric quality with short sequences of already existing transcendentals
> ..... ASINH( x ) = ln( x + SQRT(x**2+1) )
>
>
ah, excellent - i'll add that recipe to the document.   Zfhyp, separate 
extension.
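
a quick way to sanity-check the ASINH recipe above is to build it from 
sqrt and log only, and compare against a library asinh.  this is only a 
sketch in python: the name asinh_recipe is made up for illustration, and 
it says nothing about how a hardware unit would sequence the operations.

```python
import math


def asinh_recipe(x: float) -> float:
    """ASINH(x) = ln(x + SQRT(x**2 + 1)), using only sqrt and log.

    hypothetical helper name; illustrates the recipe quoted above.
    """
    return math.log(x + math.sqrt(x * x + 1.0))


if __name__ == "__main__":
    # compare the recipe against the library asinh at a few positive inputs
    for x in (0.5, 1.0, 10.0):
        print(x, asinh_recipe(x), math.asinh(x))
```

note that for large negative x the sum x + sqrt(x*x + 1) suffers from 
cancellation, so the recipe as written is best treated as a 
reduced-accuracy fallback rather than a full-precision libm replacement.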

> 2) LOG(x) = LOGP1(x-1.0)
> ... EXP(x) = EXPM1(x) + 1.0
>
> That is:: LOGP1 and EXPM1 provide greater precision (especially when the 
> result is near zero) than their sister functions, and the compiler can 
> easily add the additional instruction to the instruction stream where 
> appropriate.
>

oo that's very interesting.   of course.  i like it.
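
to make the precision point concrete: taking the usual libm definitions 
log1p(y) = ln(1 + y) and expm1(y) = e**y - 1, the sister functions are 
recovered as log(x) = log1p(x - 1) and exp(x) = expm1(x) + 1.  the sketch 
below uses python's math module as a stand-in for the proposed opcodes 
(an assumption: the helper names are hypothetical, and the opcodes 
themselves may round differently).

```python
import math


def log_via_log1p(x: float) -> float:
    """LOG built from LOGP1 plus one extra subtract (hypothetical helper)."""
    return math.log1p(x - 1.0)


def exp_via_expm1(x: float) -> float:
    """EXP built from EXPM1 plus one extra add (hypothetical helper)."""
    return math.expm1(x) + 1.0


# near zero, going the other way round loses most of the significant digits:
tiny = 1e-10
naive_log = math.log(1.0 + tiny)   # 1.0 + tiny is rounded before log sees it
good_log = math.log1p(tiny)        # keeps full precision for small arguments
naive_exp = math.exp(tiny) - 1.0   # subtracting 1.0 cancels the leading digits
good_exp = math.expm1(tiny)

if __name__ == "__main__":
    print("log:", naive_log, good_log)
    print("exp:", naive_exp, good_exp)
    print(log_via_log1p(2.0), math.log(2.0))
    print(exp_via_expm1(1.0), math.exp(1.0))
```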

the only thing: as a Standard, some implementors may find it more efficient 
to implement LOG than LOGP1 (likewise with exp).  in particular, if CORDIC 
is used (which i have just recently found, and am absolutely amazed by 
- https://en.wikipedia.org/wiki/CORDIC) i cannot find a LOGP1/EXPM1 version 
of that.

CORDIC would be the most sensible "efficient" choice of hardware algorithm, 
simply because of the sheer number of transcendentals that it covers.  if 
there isn't a way to implement LOGP1 using CORDIC, and only one of LOG and 
LOGP1 is chosen for the Standard, some implementation options will be 
limited / penalised.
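
for anyone who hasn't met CORDIC before, here is a minimal floating-point 
sketch of the rotation-mode circular variant, which produces sin and cos 
from nothing but adds, shifts (modelled here as multiplies by 2**-i) and a 
small table of arctangents.  assumptions: the function name and the 
iteration count are made up for illustration, real hardware would use 
fixed-point with a precomputed table, and this does not answer the 
LOGP1/EXPM1 question.

```python
import math


def cordic_sin_cos(theta: float, iterations: int = 32):
    """rotation-mode circular CORDIC (hypothetical helper name).

    converges for |theta| up to roughly 1.74 radians, which is the sum of
    the table entries; larger angles need range reduction first.
    """
    # table of elementary rotation angles atan(2**-i); a ROM in hardware
    angles = [math.atan(2.0 ** -i) for i in range(iterations)]

    # each micro-rotation stretches the vector; pre-divide by the total gain
    gain = 1.0
    for i in range(iterations):
        gain *= math.sqrt(1.0 + 2.0 ** (-2 * i))

    x, y, z = 1.0 / gain, 0.0, theta
    for i in range(iterations):
        d = 1.0 if z >= 0.0 else -1.0      # rotate towards z == 0
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * angles[i]
    return y, x                            # (sin(theta), cos(theta))


if __name__ == "__main__":
    s, c = cordic_sin_cos(0.5)
    print(s, math.sin(0.5))
    print(c, math.cos(0.5))
```

the same shift-and-add loop, run with a different table and a different 
update rule (the hyperbolic and linear modes), is what gives CORDIC its 
wide coverage of functions.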

this is one of the really tricky things about Standards.  if we were doing 
a single implementation, not intended in any way to be Standards-compliant, 
we could simply pick the best-optimised option for our own requirements, 
and to hell with everyone else.  take that approach with a Standard, and 
it results in... other teams creating their own Standard.

having two near-identical opcodes where one may be implemented in terms of 
the other is, rather unfortunately, against the principle of RISC.  in 
this particular case, though, the hardware implementation actually 
matters.

does anyone know if CORDIC can be adapted to do LOGP1 as well as LOG?  ha, 
funny, i found this:
http://dns.uls.cl/~ej/daa_08/Algoritmos/books/book10/9010f/jarvis.asc

unfortunately, the original dr dobbs article, which has "example 4(d)" as a 
hyperlink, redirects to a 404 not found.

l.