[libre-riscv-dev] [isa-dev] Re: FP transcendentals (trigonometry, root/exp/log) proposal

Fri Aug 9 05:16:32 BST 2019

On Friday, August 9, 2019 at 2:19:38 AM UTC+8, Dan Petrisko wrote:

> A natural place for a standard reduced accuracy extension "Zfpacc" would be in the reserved bits of FCSR.

I like it [separate extension]

> Let's say that we wish to support up to 4 accuracy modes -- 2 'fam' bits. 

[From the 3D Embedded world, where between 12 and 18 bits are typically used, it may be necessary to have more than 2 bits.  We will see what happens as more input/feedback arrives from stakeholders]

Also, there are some specific reduced-accuracy requirements ("fast_*") in the OpenCL SPIRV opcode spec; these would need to be included too.

Otherwise, separate opcodes would need to be added, just to support those SPIRV operations, which is against the principle of RISC.

> Default would be IEEE-compliant, encoded as 00.  This means that all current hardware would be compliant with the default mode.

This would be important!
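
To make the encoding concrete, here is a rough sketch of how a 2-bit 'fam' field could sit alongside the existing FCSR fields. The fflags (bits 4:0) and frm (bits 7:5) positions are the existing F-extension layout; the position of fam at bits 9:8, and the names FAM_SHIFT, FAM_IEEE, get_fam and set_fam, are purely my assumptions for illustration, nothing has been allocated.

```python
# Existing FCSR layout (RISC-V F extension):
#   bits 4:0 = fflags, bits 7:5 = frm, bits 31:8 = reserved.
FRM_SHIFT = 5
FRM_MASK = 0b111 << FRM_SHIFT

# ASSUMPTION: a 2-bit 'fam' field at bits 9:8 (hypothetical position).
FAM_SHIFT = 8
FAM_WIDTH = 2
FAM_MASK = ((1 << FAM_WIDTH) - 1) << FAM_SHIFT

FAM_IEEE = 0b00  # default per the proposal: IEEE-compliant


def get_fam(fcsr: int) -> int:
    """Extract the (hypothetical) accuracy-mode field from an FCSR value."""
    return (fcsr & FAM_MASK) >> FAM_SHIFT


def set_fam(fcsr: int, fam: int) -> int:
    """Return fcsr with the accuracy-mode field replaced, other bits untouched."""
    if not 0 <= fam < (1 << FAM_WIDTH):
        raise ValueError("fam out of range")
    return (fcsr & ~FAM_MASK) | (fam << FAM_SHIFT)
```

Software that only ever writes fflags and frm leaves the reserved bits at zero, so it reads back as fam = 00 (IEEE), which is why the 00 default keeps current hardware and software compliant.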

> 
> the unsupported modes would cause a trap to allow emulation where traps are supported. emulation of unsupported modes would be required for unix platforms.

Yes, agreed, this is very important. Embedded platforms are on their own (as normal).

> 
> As with frm, an implementation can choose to support any permutation of dynamic fam-instruction pairs. It will illegal-instruction trap upon executing an unsupported fam-instruction pair.  The implementation can then emulate the accuracy mode required.

I like it.
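
A toy model of that trap-and-emulate flow, as I understand it. Everything here is hypothetical: the mode FAM_FAST, the op names "fsin" and "fsqrt", and the set of pairs the imaginary hardware implements are all made up for illustration, and the Python math library stands in for both the hardware unit and the software emulation.

```python
import math

FAM_IEEE = 0b00
FAM_FAST = 0b01  # hypothetical reduced-accuracy mode

OPS = {"fsin": math.sin, "fsqrt": math.sqrt}

# ASSUMPTION: this imaginary core implements fsqrt in both modes,
# but fsin only in the reduced-accuracy mode.
HW_SUPPORTED = {
    (FAM_IEEE, "fsqrt"),
    (FAM_FAST, "fsqrt"),
    (FAM_FAST, "fsin"),
}


class IllegalInstruction(Exception):
    """Models the illegal-instruction trap for an unsupported fam-instruction pair."""

    def __init__(self, fam, op):
        super().__init__(f"fam={fam:02b} op={op}")
        self.fam = fam
        self.op = op


def hw_execute(fam, op, x):
    """Execute in 'hardware', trapping on any unsupported pair."""
    if (fam, op) not in HW_SUPPORTED:
        raise IllegalInstruction(fam, op)
    return OPS[op](x)


def emulate(fam, op, x):
    """Trap handler: software stand-in for the requested accuracy mode."""
    return OPS[op](x)


def run(fam, op, x):
    """Return (result, path) so the caller can see which path was taken."""
    try:
        return hw_execute(fam, op, x), "hardware"
    except IllegalInstruction as trap:
        return emulate(trap.fam, trap.op, x), "emulated"
```

With this table, run(FAM_FAST, "fsin", 0.0) stays on the hardware path, whereas run(FAM_IEEE, "fsin", 0.0) traps and is emulated. The program sees the same result either way, only slower.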

> there would be a mechanism for user mode code to detect which modes are emulated (csr? syscall?) (if the supervisor decides to make the emulation visible)

Hmmmm, if speed or power consumption of an implementation is compromised by that, it would be bad (and also Khronos nonconformant, see below).

> that would allow user code to switch to faster software implementations if it chooses to.
>  
> If the bits are in FCSR, then the switch itself would be exposed to user mode.  User-mode would not be able to detect emulation vs hardware supported instructions, however (by design).  That would require some platform-specific code.

Hmmm. 3D is quite different.

Look at software unaccelerated MesaGL. High end games are literally unplayable in software rendering, and the Games Studios will in some cases not even permit the game to run if certain hardware characteristics are not met, because it would bring the game into disrepute if it was permitted to run and looked substandard.

Bottom line is: Security be damned - the usermode software *has* to know *everything* about the actual hardware, and there are Standard APIs to list the hardware characteristics.

If those APIs "lie" about those characteristics, not only will the end users bitch like mad (justifiably), the ASIC will *FAIL* Khronos conformance and compliance and will not be permitted to be sold with the Vulkan and OpenGL badge on it (they're Trademarks).

There will be some designs where even the temperature sensors are fed back to userspace and the 3D rendering demands dialed back to not overheat the ASIC and still keep user response time expectations to acceptable levels.

(Gamers HATE lag. It can result in loss of a tournament).

> 
> Now, which accuracy modes should be included is a question outside of my expertise and would require a literature review of instruction frequency in key workloads, PPA analysis of simple and advanced implementations, etc. 

Yes. It's a lot of work (that offline message had some links already), and my hope is that the stakeholders in the (yet to be formed/announced) 3D Open Graphics Alliance will have a vested interest in doing exactly that.

The point of raising the Ztrans and Zftrig* proposal at this early phase is to have some underpinnings so that the Alliance members can hit the ground running.

> emulation of unsupported modes would be required for unix platforms.

Yes.

> 
> I don't see why Unix should be required to emulate some arbitrary reduced accuracy ML mode.

It's completely outside of my area of expertise to say, one way or the other. It will need a thorough review and some input from experienced 3D software developers.

>  My guess would be that Unix Platform Spec requires support for IEEE, 

concur.

> whereas arbitrary ML platform requires support for Mode XYZ.  Of course, implementations of either platform would be free to support any/all modes that they find valuable. 

concur.

> Compiling for a specific platform means that support for required accuracy modes is guaranteed (and therefore does not need discovery sequences), while allowing portable code to execute discovery sequences to detect support for alternative accuracy modes.

The latter will be essential for detecting the "fast_*" capability.
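
As a rough idea of what such a discovery sequence might look like from portable code: probe each fam-instruction pair once at startup and build a capability table. This is only a sketch under the assumption that an unsupported pair is reported back to the caller (here as an exception standing in for the trap); the probe function, the op names and the set of native pairs are all hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple


class Unsupported(Exception):
    """Stands in for an illegal-instruction trap surfaced to the caller."""


def discover(
    probe: Callable[[int, str], None],
    fams: Iterable[int],
    ops: Iterable[str],
) -> Dict[Tuple[int, str], bool]:
    """Probe every (fam, op) pair once and record whether it is supported."""
    ops = list(ops)  # allow a one-shot iterable to be reused per fam
    table = {}
    for fam in fams:
        for op in ops:
            try:
                probe(fam, op)
                table[(fam, op)] = True
            except Unsupported:
                table[(fam, op)] = False
    return table


# ASSUMPTION: a made-up platform that implements these pairs natively.
_NATIVE = {(0b00, "fsin"), (0b01, "fsin"), (0b00, "fcos")}


def fake_probe(fam: int, op: str) -> None:
    """Hypothetical probe: raises if the pair is not natively supported."""
    if (fam, op) not in _NATIVE:
        raise Unsupported(f"fam={fam:02b} op={op}")


caps = discover(fake_probe, fams=[0b00, 0b01], ops=["fsin", "fcos"])
```

A 3D driver could then consult caps to decide whether to advertise a "fast_*" path for a given operation, rather than guessing.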

Main point: I cannot emphasise enough how critical it is that userspace software can get at the underlying hardware characteristics. This is for Khronos Standards Compliance.

Sensible proposal, Dan. Will write it up shortly.

L.