[libre-riscv-dev] [isa-dev] FP reciprocal sqrt extension proposal

Bill Huffman huffman at cadence.com
Fri Jul 12 16:07:04 BST 2019

The rounding isn't difficult in an N-bits-at-a-time algorithm that doesn't use a redundant result representation.  For a Newton-Raphson implementation, or one with a redundant result representation, rounding is more difficult.


On 7/12/19 1:27 AM, Jacob Lifshay wrote:

On Fri, Jul 12, 2019, 00:16 lkcl <luke.leighton at gmail.com> wrote:
On Friday, July 12, 2019 at 4:42:30 AM UTC+8, glemieux wrote:

> might there be more performance value in making it dual-operand to make better use of available read ports, eg:
> a/sqrt(b)
>   or
> 1/sqrt(a+b)

I have not seen any hardware out there that does the hybrid combination of divide and isqrt (or multiply and isqrt). I would be concerned about the increase in gate count: it is 2 complex special-purpose blocks, back to back.

It barely increases complexity over what we already have:
for the DivPipeCore* classes I've been writing for libre-riscv's GPU (they handle the mantissa for fdiv, fsqrt, and frsqrt, as well as unsigned integer div/rem), supporting a/sqrt(b) is as simple as assigning divisor to compare_lhs instead of 1.0.
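
To illustrate why changing the left-hand side of the comparison costs so little, here is a minimal sketch of a one-bit-at-a-time search. It is not the DivPipeCore code: the function name, the fixed-point scaling, and the `int_bits` parameter are assumptions made for this example; only the name compare_lhs is borrowed from the message above. The loop keeps each trial bit whenever q*q*radicand <= compare_lhs still holds.

```python
def bitwise_root(compare_lhs, radicand, frac_bits, int_bits=2):
    """Return the largest fixed-point q with q*q*radicand <= compare_lhs.

    All values are unsigned fixed-point integers with `frac_bits`
    fractional bits. Hypothetical helper for illustration only.
    """
    q = 0
    # Try result bits from most significant to least significant.
    for bit in reversed(range(int_bits + frac_bits)):
        trial = q | (1 << bit)
        # trial*trial*radicand carries 3*frac_bits fractional bits,
        # so scale compare_lhs up by 2*frac_bits to match.
        if trial * trial * radicand <= compare_lhs << (2 * frac_bits):
            q = trial  # keep the bit: the comparison still holds
    return q


FRAC_BITS = 8
ONE = 1 << FRAC_BITS

# compare_lhs = 1.0 gives 1/sqrt(radicand): 1/sqrt(4.0) = 0.5
rsqrt_of_4 = bitwise_root(ONE, 4 * ONE, FRAC_BITS)

# Any other compare_lhs reuses the same loop unchanged.
# In this simplified model a value L yields sqrt(L/radicand),
# so L = 3.0*3.0 = 9.0 gives 3.0/sqrt(4.0) = 1.5.
three_over_sqrt_4 = bitwise_root(9 * ONE, 4 * ONE, FRAC_BITS)
```

In this model the only difference between the two calls is the value fed to the comparison; the iteration itself is identical. How the actual classes scale the operand placed in compare_lhs is not shown in this thread.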

Also I would be concerned about the rounding, just working it out (let alone implementing it).

Rounding uses the exact same algorithm: generate 2 more bits of quotient/root for guard and round, then compare the remainder to zero to generate the sticky bit.
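
The guard/round/sticky scheme described above can be sketched as follows for the division case, using round-to-nearest-even. This is an illustrative sketch only: the function names are hypothetical, Python's divmod stands in for the bit-serial core, and a real FP unit would also have to renormalise if the increment carries out of the mantissa.

```python
def round_nearest_even(result, remainder):
    """Round a result that carries 2 extra low bits (guard, round).

    `remainder` is the final remainder of the digit recurrence; any
    nonzero remainder sets the sticky bit.
    """
    guard = (result >> 1) & 1
    round_bit = result & 1
    sticky = 1 if remainder != 0 else 0
    mantissa = result >> 2
    lsb = mantissa & 1
    # Round up when above the halfway point, or exactly halfway
    # with an odd mantissa (ties go to even).
    if guard and (round_bit or sticky or lsb):
        mantissa += 1
    return mantissa


def divide_rounded(dividend, divisor, frac_bits):
    """Quotient with `frac_bits` fractional bits, rounded to nearest even."""
    # Generate 2 extra quotient bits for guard and round.
    quotient, remainder = divmod(dividend << (frac_bits + 2), divisor)
    return round_nearest_even(quotient, remainder)


one_third = divide_rounded(1, 3, 4)     # 0.333... rounds down to 5/16
two_thirds = divide_rounded(2, 3, 4)    # 0.666... rounds up to 11/16
tie_to_even = divide_rounded(3, 32, 4)  # exactly 1.5/16, ties to 2/16
```

The same rounding function applies to a square root or reciprocal square root result, provided the core supplies the two extra result bits and the final remainder.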

