[libre-riscv-dev] [isa-dev] Re: FP transcendentals (trigonometry, root/exp/log) proposal

Thu Aug 8 07:30:22 BST 2019

Hi folks,

We would seem to be putting the cart before the horse.  ISA-level support
for correctly rounded transcendentals is speciously attractive, but its
utility is not clearly evident and is possibly negative.  It does not make
sense to allocate opcode space under these circumstances.

Andrew

On Wed, Aug 7, 2019 at 6:17 PM 'MitchAlsup' via RISC-V ISA Dev <
isa-dev at groups.riscv.org> wrote:

> An old guy at IBM (a Fellow) made a long and impassioned plea in a paper
> from the late 1970s or early 1980s that whenever something is put "into the
> instruction set" that the result be as accurate as possible. Look it up,
> it's a good read.
>
> At the time I was working for a mini-computer company where a new
> implementation was not giving binary accurate results compared to an older
> generation. This was traced to an "enhancement" in the F32 and F64 accuracy
> from the new implementation. Without exception, the customers wanted
> binary equivalence, even if the math was worse.
>
> On the other hand, back when I started doing this (CPU design) the guys
> using floating point just wanted speed and they were willing to put up with
> not only IBM floating point (Hex normalization, and guard digit) but even
> CRAY floating point (CDC 6600, CDC 7600, CRAY 1) which was demonstrably
> WORSE in the numerics department.
>
> In any event, to all but 5 floating point guys in the world, a rounding
> error (compared to the correctly rounded result) occurring less often than
> 3% of the time and no more than 1 ULP, is as accurate as they need (caveat:
> so long as the arithmetic is repeatable.) As witness, the FDIV <lack of>
> instruction in ITANIC had a 0.502 ULP accuracy (Markstein) and nobody
> complained.
>
> My gut feeling tells me that the numericalists are perfectly willing to
> accept an error of 0.51 ULP RMS on transcendental calculations.
> My gut feeling tells me that the numericalists are not willing to accept an
> error of 0.75 ULP RMS on transcendental calculations.
> I have no feeling at all on where to draw the line.
>
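The ULP figures quoted above can be made concrete with a small measurement
helper. This is only a sketch: the names ulp_error and classify are
hypothetical, the exact value is supplied as a Fraction rather than computed,
and it assumes Python 3.9+ for math.ulp and math.nextafter. It treats an
error of at most 0.5 ULP as "correctly rounded" (round-to-nearest) and an
error below 1 ULP as "faithfully rounded".

```python
import math
from fractions import Fraction

def ulp_error(result: float, exact: Fraction) -> Fraction:
    """Error of `result`, in ULPs of the double nearest to `exact`."""
    ulp = Fraction(math.ulp(float(exact)))
    return abs(Fraction(result) - exact) / ulp

def classify(result: float, exact: Fraction) -> str:
    """Label a result by its error, assuming round-to-nearest as the target."""
    err = ulp_error(result, exact)
    if err <= Fraction(1, 2):
        return "correctly rounded"
    if err < 1:
        return "faithfully rounded"
    return "worse than faithful"

# Illustration with exact = 1/3, which no double represents exactly.
third = Fraction(1, 3)
nearest = float(third)                 # the double closest to 1/3
above = math.nextafter(nearest, 1.0)   # its neighbour on the other side of 1/3

print(classify(nearest, third), float(ulp_error(nearest, third)))
print(classify(above, third), float(ulp_error(above, third)))
```

Both candidates bracket the exact value, so both count as faithful, but only
the nearer one is correctly rounded. A hardware unit specified to 0.502 ULP
would return the second kind of answer on a small fraction of inputs.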
> On Wednesday, August 7, 2019 at 7:57:38 PM UTC-5, lkcl wrote:
>>
>>
>>
>> On Thursday, August 8, 2019 at 1:29:29 AM UTC+1, MitchAlsup wrote:
>>>
>>>
>>>
>>> On Wednesday, August 7, 2019 at 6:43:21 PM UTC-5, Jacob Lifshay wrote:
>>>>
>>>> On Wed, Aug 7, 2019, 15:36 'MitchAlsup' via RISC-V ISA Dev <
>>>> isa... at groups.riscv.org> wrote:
>>>>
>>>>> Is this proposal going to <eventually> include::
>>>>>
>>>>> a) statement on required/delivered numeric accuracy per transcendental
>>>>> ?
>>>>>
>>>> From what I understand, they require correctly rounded results. We
>>>> should eventually state that somewhere. The requirement for correctly
>>>> rounded results is so the instructions can replace the corresponding
>>>> functions in libm (they're not just for GPUs) and for reproducibility
>>>> across implementations.
>>>>
>>>
>>> Correctly rounded results will require a lot more difficult hardware and
>>> more cycles of execution.
>>> Standard GPUs today use 1-2 bits ULP for simple transcendentals and 3-4
>>> bits for some of the harder functions.
>>> Standard GPUs today are producing fully pipelined results with 5 cycle
>>> latency for F32 (with 1-4 bits of imprecision)
>>> Based on my knowledge of the situation, requiring IEEE 754 correct
>>> rounding will double the area of the transcendental unit, triple the area
>>> used for coefficients, and come close to doubling the latency.
>>>
>>
>> hmmm... i don't know what to suggest / recommend here.  there's two
>> separate requirements: accuracy (OpenCL, numerical scenarios), and 3D GPUs,
>> where better accuracy is not essential.
>>
>> i would be tempted to say that it was reasonable to suggest that if
>> you're going to use FP32, expectations are lower so "what the heck".
>> however i have absolutely *no* idea what the industry consensus is, here.
>>
>> i do know that you've an enormous amount of expertise and experience in
>> the 3D GPU area, Mitch.
>>
>>> I can point you at (and have) the technology to perform most of these to
>>> the accuracy stated above in 5 cycles F32.
>>>
>>> I have the technology to perform LN2P1, EXP1M in 14 cycles, SIN, COS
>>> including argument reduction in 19 cycles, POW in 34 cycles while achieving
>>> "faithful rounding" of the result in any of the IEEE 754-2008 rounding
>>> modes and using a floating point unit essentially the same size as an FMAC
>>> unit that can also do FDIV and FSQRT. SIN and COS have full Payne and Hanek
>>> argument reduction, which costs 4 cycles and allows for "silly" arguments
>>> to be properly processed:: COS( 6381956970095103×2^797) =
>>> -4.68716592425462761112×10^-19
>>>
>>
>> yes please.
>>
>> there will be other implementors of this Standard that will want to make
>> a different call on which direction to go.
>>
>> l.
>>
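The COS( 6381956970095103×2^797) figure quoted above can be checked in
software with exact integer arithmetic. The sketch below is not Payne and
Hanek's algorithm (which looks only at the window of bits of 2/pi that
matters); it is a brute-force stand-in that computes pi to about 1400 bits
with Machin's formula and reduces the whole integer exactly. The function
names and the bits parameter are assumptions made for illustration.

```python
import math
from fractions import Fraction

def arctan_inv(n, scale):
    """Approximate scale * arctan(1/n) with the alternating Taylor series, in integers."""
    term = scale // n
    total = term
    k, sign = 1, -1
    while term:
        term //= n * n
        k += 2
        total += sign * (term // k)
        sign = -sign
    return total

def pi_fixed(bits, guard=64):
    """pi * 2**bits as an integer (Machin's formula, guard bits absorb truncation)."""
    scale = 1 << (bits + guard)
    return (16 * arctan_inv(5, scale) - 4 * arctan_inv(239, scale)) >> guard

def cos_huge(mantissa, exp2, bits=1400):
    """cos(mantissa * 2**exp2) for an integer mantissa and exp2 >= 0.

    bits must comfortably exceed the bit length of the argument plus the
    precision wanted in the reduced value; 1400 is ample for this example.
    """
    x = mantissa << exp2                          # exact integer value of the double
    scale = 1 << bits
    pi_fix = pi_fixed(bits)
    k = (2 * x * scale + pi_fix // 2) // pi_fix   # nearest integer to x / (pi/2)
    r = float(Fraction(2 * x * scale - k * pi_fix, 2 * scale))  # r = x - k*pi/2
    q = k % 4                                     # quadrant selects the identity
    if q == 0:
        return math.cos(r)
    if q == 1:
        return -math.sin(r)
    if q == 2:
        return -math.cos(r)
    return math.sin(r)

print(cos_huge(6381956970095103, 797))
```

Because the reduced argument r is tiny here, the final sin/cos call is
trivial; all of the difficulty is in the reduction, which is why hardware
needs several extra cycles for it.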