[libre-riscv-dev] [isa-dev] Re: FP transcendentals (trigonometry, root/exp/log) proposal

Thu Aug 8 19:19:24 BST 2019

>
> maybe a solution would be to add an extra field to the fp control csr (or
> isamux?) to allow selecting one of several accurate or fast modes:

Preface: As Andrew points out, any ISA proposal must be associated with a
quantitative evaluation to consider tradeoffs.

A natural place for a standard reduced accuracy extension "Zfpacc" would be
in the reserved bits of FCSR.  It could be treated very similarly to how
dynamic frm is treated now. Currently, there are 5 bits of fflags, 3 bits
of frm and 24 Reserved bits. The L (decimal floating-point) extension will
presumably use some, but not all of them. I'm unable to find any public
proposals for L bit encodings in FCSR.
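Below is a minimal C sketch of the current FCSR layout described above: fflags in bits 4:0, frm in bits 7:5, and bits 31:8 reserved. The macro and function names are illustrative and do not come from any existing header.

```c
#include <stdint.h>

/* RISC-V FCSR layout (32 bits):
 *   bits [4:0]  fflags: accrued exception flags NV DZ OF UF NX (high to low)
 *   bits [7:5]  frm:    dynamic rounding mode
 *   bits [31:8] reserved (24 bits) */
#define FCSR_FFLAGS_SHIFT 0
#define FCSR_FFLAGS_MASK  0x1Fu
#define FCSR_FRM_SHIFT    5
#define FCSR_FRM_MASK     0x7u
#define FCSR_RSVD_SHIFT   8
#define FCSR_RSVD_MASK    0xFFFFFFu

/* Extract the accrued exception flags. */
static inline uint32_t fcsr_fflags(uint32_t fcsr) {
    return (fcsr >> FCSR_FFLAGS_SHIFT) & FCSR_FFLAGS_MASK;
}

/* Extract the dynamic rounding mode. */
static inline uint32_t fcsr_frm(uint32_t fcsr) {
    return (fcsr >> FCSR_FRM_SHIFT) & FCSR_FRM_MASK;
}

/* Extract the 24 reserved bits, where a "Zfpacc" field could live. */
static inline uint32_t fcsr_reserved(uint32_t fcsr) {
    return (fcsr >> FCSR_RSVD_SHIFT) & FCSR_RSVD_MASK;
}
```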

For reference, frm is treated as follows:

> Floating-point operations use either a static rounding mode encoded in the
> instruction, or a dynamic rounding mode held in frm. Rounding modes are
> encoded as shown in Table 11.1. A value of 111 in the instruction’s rm field
> selects the dynamic rounding mode held in frm. If frm is set to an
> invalid value (101–111), any subsequent attempt to execute a floating-point
> operation with a dynamic rounding mode will raise an illegal instruction
> exception.
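The quoted rule can be sketched as a small C function that resolves the effective rounding mode. Encodings follow Table 11.1 (RNE=000, RTZ=001, RDN=010, RUP=011, RMM=100, and 111 selects the dynamic mode). The function name and the `RM_ILLEGAL` return convention are hypothetical; the sketch also treats the reserved static encodings 101 and 110 as illegal, which is an assumption beyond the quoted text.

```c
#include <stdint.h>

/* Rounding-mode encodings (RISC-V F extension, Table 11.1). */
enum rm {
    RM_RNE = 0, /* round to nearest, ties to even */
    RM_RTZ = 1, /* round towards zero */
    RM_RDN = 2, /* round down (towards -infinity) */
    RM_RUP = 3, /* round up (towards +infinity) */
    RM_RMM = 4, /* round to nearest, ties to max magnitude */
    RM_DYN = 7  /* instruction rm field only: use frm */
};

#define RM_ILLEGAL (-1)  /* hypothetical: "raise illegal-instruction exception" */

/* Returns the effective rounding mode, or RM_ILLEGAL if the instruction
 * must raise an illegal-instruction exception. */
static int effective_rm(uint32_t instr_rm, uint32_t frm) {
    instr_rm &= 0x7u;
    frm &= 0x7u;
    if (instr_rm == RM_DYN)
        /* Dynamic: frm values 101-111 are invalid. */
        return frm <= RM_RMM ? (int)frm : RM_ILLEGAL;
    /* Static: assumption, reserved 101 and 110 are treated as illegal. */
    return instr_rm <= RM_RMM ? (int)instr_rm : RM_ILLEGAL;
}
```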

Let's say that we wish to support up to 4 accuracy modes -- 2 'fam' bits.
Default would be IEEE-compliant, encoded as 00.  This means that all
current hardware would be compliant with the default mode.
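The following is a sketch of how the 2-bit fam field might sit in FCSR. It assumes, hypothetically, that fam occupies bits 9:8 directly above frm; no ratified spec defines this. If, as the F specification requires, implementations without an extension using bits 31:8 ignore writes to them and read them as zero, then existing hardware would always report fam = 00, the IEEE default.

```c
#include <assert.h>
#include <stdint.h>

/* HYPOTHETICAL: 2-bit 'fam' (FP accuracy mode) field in FCSR bits [9:8]. */
#define FCSR_FAM_SHIFT 8
#define FCSR_FAM_MASK  0x3u

enum fam {
    FAM_IEEE = 0, /* default: IEEE-compliant */
    FAM_1 = 1,    /* placeholder reduced-accuracy modes */
    FAM_2 = 2,
    FAM_3 = 3
};

/* Read the accuracy mode; reads as FAM_IEEE when the bits are zero. */
static inline uint32_t fcsr_get_fam(uint32_t fcsr) {
    return (fcsr >> FCSR_FAM_SHIFT) & FCSR_FAM_MASK;
}

/* Return fcsr with the fam field replaced, leaving frm and fflags intact. */
static inline uint32_t fcsr_set_fam(uint32_t fcsr, uint32_t fam) {
    assert(fam <= FCSR_FAM_MASK);
    return (fcsr & ~(FCSR_FAM_MASK << FCSR_FAM_SHIFT)) | (fam << FCSR_FAM_SHIFT);
}
```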

> the unsupported modes would cause a trap to allow emulation where traps are
> supported. emulation of unsupported modes would be required for unix
> platforms.

As with frm, an implementation can choose to support any subset of dynamic
fam-instruction pairs. It will raise an illegal-instruction trap when it
executes an unsupported fam-instruction pair.  The implementation can then
emulate the required accuracy mode.
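The trap-and-emulate behaviour for fam-instruction pairs might look like the following C sketch. The operation list (including an FSIN, in the spirit of this thread's transcendental proposal), the support bitmasks, and the function names are all hypothetical. IEEE (fam 0) is marked supported for every operation, matching the point that current hardware already implements the default mode.

```c
#include <stdint.h>

/* HYPOTHETICAL operation classes for illustration. */
enum fp_op { OP_FADD, OP_FDIV, OP_FSQRT, OP_FSIN, OP_COUNT };

/* Per-implementation support table: bit n of support[op] set means
 * fam value n is implemented in hardware for that operation. */
static const uint8_t support[OP_COUNT] = {
    [OP_FADD]  = 0x1, /* IEEE only */
    [OP_FDIV]  = 0x3, /* IEEE + fam 1 */
    [OP_FSQRT] = 0x3, /* IEEE + fam 1 */
    [OP_FSIN]  = 0x1, /* IEEE only */
};

enum outcome { EXEC_HW, TRAP_ILLEGAL };

/* Decide whether a (fam, instruction) pair executes in hardware or traps.
 * On TRAP_ILLEGAL, a trap handler (not shown) would emulate the requested
 * accuracy mode in software. */
static enum outcome dispatch(enum fp_op op, uint32_t fam) {
    if (op >= OP_COUNT || fam > 3u)
        return TRAP_ILLEGAL;
    return ((support[op] >> fam) & 1u) ? EXEC_HW : TRAP_ILLEGAL;
}
```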

> there would be a mechanism for user mode code to detect which modes are
> emulated (csr? syscall?) (if the supervisor decides to make the emulation
> visible) that would allow user code to switch to faster software
> implementations if it chooses to.

If the bits are in FCSR, then the switch itself would be exposed to user
mode.  User-mode would not be able to detect emulation vs hardware
supported instructions, however (by design).  That would require some
platform-specific code.

Now, which accuracy modes should be included is a question outside of my
expertise and would require a literature review of instruction frequency in
key workloads, PPA analysis of simple and advanced implementations, etc.
(Thanks for the insights, Mitch!)

> emulation of unsupported modes would be required for unix platforms.

I don't see why Unix should be required to emulate some arbitrary reduced
accuracy ML mode.  My guess would be that the Unix Platform Spec requires
support for IEEE, whereas an arbitrary ML platform requires support for Mode
XYZ.  Of course, implementations of either platform would be free to
support any/all modes that they find valuable.  Compiling for a specific
platform means that support for required accuracy modes is guaranteed (and
therefore does not need discovery sequences), while allowing portable code
to execute discovery sequences to detect support for alternative
accuracy modes.
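The split between platform-guaranteed modes and discovered modes can be captured as a simple bitmask check. Everything here is hypothetical: the two platform profiles, and mapping "Mode XYZ" to fam value 2, are arbitrary choices for illustration.

```c
#include <stdint.h>

#define MODE_BIT(m) (1u << (m))

/* HYPOTHETICAL platform profiles: the fam modes each platform guarantees. */
static const uint32_t UNIX_PLATFORM_REQUIRED = MODE_BIT(0); /* IEEE */
static const uint32_t ML_PLATFORM_REQUIRED   = MODE_BIT(2); /* "Mode XYZ" */

/* Nonzero if code compiled for this platform must run a discovery
 * sequence before relying on 'mode'; zero if support is guaranteed. */
static int needs_discovery(uint32_t platform_required, unsigned mode) {
    return (platform_required & MODE_BIT(mode)) == 0u;
}
```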

Best,
Dan Petrisko

On Thu, Aug 8, 2019 at 11:58 AM 'MitchAlsup' via RISC-V ISA Dev <
isa-dev at groups.riscv.org> wrote:

>
> We are talking about all of this without a point of reference.
>
> Here is what I do know about correctly rounded transcendentals::
>
> My technology for performing transcendentals in an FMAC unit performs a
> power series polynomial calculation.
>
> I can achieve 14 cycle LN2, EXP2 and 19 cycle SIN, COS faithfully rounded
> with coefficient tables which are (essentially) the same size as the
> FDIV/FSQRT seed tables for Newton-Raphson (or Goldschmidt) iterations. FDIV
> will end up at 17 cycles and FSQRT at 23 cycles. This is exactly what
> Opteron FDIV/FSQRT performance was (oh so long ago).
>
> If you impose the correctly rounded requirement::
> a) the size of the coefficient tables grows by 3.5× and
> b) the number of cycles to compute grows by 1.8×
> c) the power to compute grows by 2.5×
> For a gain of accuracy of about 0.005 ULP
>
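As a rough illustration of Mitch's figures, applying the quoted 1.8× latency multiplier to the faithfully rounded latencies gives about 25 cycles for LN2/EXP2 (14 × 1.8 = 25.2) and about 34 cycles for SIN/COS (19 × 1.8 = 34.2). The C sketch below only encodes that scaling. The struct and function names are hypothetical, table and power are relative units normalized to the faithfully rounded design, and the multipliers describe his FMAC-based design, not implementations in general.

```c
/* Multipliers quoted for imposing correct rounding on the FMAC-based design. */
#define CR_TABLE_FACTOR 3.5 /* coefficient table size */
#define CR_CYCLE_FACTOR 1.8 /* latency in cycles */
#define CR_POWER_FACTOR 2.5 /* power to compute */

/* HYPOTHETICAL cost record: cycles, plus table size and power in relative units. */
struct fr_cost {
    double cycles;
    double table_units;
    double power_units;
};

/* Scale a faithfully rounded cost to the estimated correctly rounded cost. */
static struct fr_cost to_correctly_rounded(struct fr_cost faithful) {
    struct fr_cost cr = {
        faithful.cycles * CR_CYCLE_FACTOR,
        faithful.table_units * CR_TABLE_FACTOR,
        faithful.power_units * CR_POWER_FACTOR,
    };
    return cr;
}
```

The trade being weighed is these multipliers against a gain of about 0.005 ULP in accuracy.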