[libre-riscv-dev] [isa-dev] FP reciprocal sqrt extension proposal
Bill Huffman
huffman at cadence.com
Fri Jul 12 16:07:04 BST 2019
The rounding isn't difficult in an N-bit at a time algorithm that doesn't have a redundant result representation. For a Newton-Raphson implementation or a redundant result implementation, rounding is more difficult.
Bill
On 7/12/19 1:27 AM, Jacob Lifshay wrote:
On Fri, Jul 12, 2019, 00:16 lkcl <luke.leighton at gmail.com> wrote:
On Friday, July 12, 2019 at 4:42:30 AM UTC+8, glemieux wrote:
> might there be more performance value in making it dual-operand to make better use of available read ports, eg:
>
>
> a/sqrt(b)
> or
> 1/sqrt(a+b)
I have not seen any hardware out there that does the hybrid combination of divide and isqrt (or multiply and isqrt). I would be concerned about the increase in gate count: it is 2 complex special-purpose blocks, back to back.
it barely increases complexity over what we already have:
for the DivPipeCore* classes I've been writing for libre-riscv's gpu (they handle the mantissa for fdiv, fsqrt, and frsqrt, as well as unsigned integer div/rem), supporting a/sqrt(b) is as simple as assigning divisor to compare_lhs instead of 1.0:
https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/div_rem_sqrt_rsqrt/core.py;h=e6a0b9b9d848d93cbef2b35b371befb2d946d7a2;hb=HEAD#l253
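As a rough illustration of why one datapath can serve several of these operations, here is a minimal Python sketch of a bit-at-a-time recurrence in which div, sqrt and rsqrt differ only in the comparison being tested. It is not the code from core.py: the names `digit_recurrence` and `holds`, the fixed-point scaling, and the one-bit-per-step radix are all assumptions made for the example, and it does not model compare_lhs or the a/sqrt(b) case.

```python
def digit_recurrence(op, a, frac_bits, b=None):
    """Return the largest fixed-point q (frac_bits fractional bits) passing op's compare.

    Hypothetical sketch: operands a and b are integers scaled by 2**frac_bits.
    """
    def holds(q):
        if op == "div":      # q * b <= a
            return q * b <= (a << frac_bits)
        if op == "sqrt":     # q * q <= a
            return q * q <= (a << frac_bits)
        if op == "rsqrt":    # q * q * a <= 1.0
            return q * q * a <= (1 << (3 * frac_bits))
        raise ValueError(op)

    q = 0
    # Two integer bits are enough for results below 4.0.
    for bit in range(frac_bits + 1, -1, -1):
        trial = q | (1 << bit)
        if holds(trial):     # keep the bit only if the compare still passes
            q = trial
    return q


# With 8 fractional bits, 1.0 is 256:
print(digit_recurrence("div", 768, 8, b=512))   # 3.0 / 2.0
print(digit_recurrence("sqrt", 1024, 8))        # sqrt(4.0)
print(digit_recurrence("rsqrt", 1024, 8))       # 1 / sqrt(4.0)
```

Only the body of `holds` changes between operations; the trial-and-keep loop is shared.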
Also I would be concerned about the rounding, just working it out (let alone implementing it).
rounding uses the exact same algorithm: generate 2 more bits of quotient/root for guard and round, then compare the remainder to zero to generate the sticky bit.
Jacob
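The guard/round/sticky scheme described above can be sketched for the rsqrt case as follows. This is a minimal illustration under stated assumptions, not the libre-riscv implementation: the function name `rsqrt_rne`, the fixed-point operand format, and the choice of round-to-nearest-even are hypothetical, and the operand is assumed to lie in [1.0, 4.0).

```python
def rsqrt_rne(a_fixed, in_frac_bits, out_frac_bits):
    """Round-to-nearest-even 1/sqrt(a) in fixed point (hypothetical sketch).

    a_fixed is a scaled by 2**in_frac_bits, with a assumed in [1.0, 4.0).
    The result is scaled by 2**out_frac_bits.
    """
    f = out_frac_bits + 2                  # two extra bits: guard and round
    target = 1 << (2 * f + in_frac_bits)   # 1.0 at the scale of q*q*a
    q = 0
    for bit in range(f, -1, -1):           # result is at most 1.0
        trial = q | (1 << bit)
        if trial * trial * a_fixed <= target:
            q = trial
    # Sticky bit: is the remainder (target - q*q*a) nonzero?
    sticky = (q * q * a_fixed) != target
    guard = (q >> 1) & 1
    rnd = q & 1
    result = q >> 2
    # Round up when above the halfway point, or exactly halfway with an odd LSB.
    if guard and (rnd or sticky or (result & 1)):
        result += 1
    return result


# a = 3.0 with 4 fractional bits in, 8 fractional bits out:
r = rsqrt_rne(48, 4, 8)
print(r, r / 256)
```

Because the loop keeps no redundant representation of q, the truncated result plus the zero/nonzero remainder test is all the rounding step needs.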