[libre-riscv-dev] [isa-dev] FP reciprocal sqrt extension proposal
lkcl
luke.leighton at gmail.com
Fri Jul 12 09:52:41 BST 2019
On Friday, July 12, 2019 at 4:28:09 PM UTC+8, Jacob Lifshay wrote:
> On Fri, Jul 12, 2019, 00:16 lkcl <luke.l... at gmail.com> wrote:
>
> > a/sqrt(b)
> I have not seen any hardware out there that implements the hybrid combination of divide and isqrt (or multiply and isqrt). I would be concerned about the increase in gate count: it is two complex special-purpose blocks, back to back.
>
>
> it barely increases complexity over what we already have:
>
> for the DivPipeCore* classes I've been writing for libre-riscv's gpu (they handle the mantissa for fdiv, fsqrt, and frsqrt, as well as unsigned integer div/rem), supporting a/sqrt(b) is as simple as assigning divisor to compare_lhs instead of 1.0:
> https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/div_rem_sqrt_rsqrt/core.py;h=e6a0b9b9d848d93cbef2b35b371befb2d946d7a2;hb=HEAD#l253
Indeed, but that holds only for that particular implementation. An alternative implementation, such as the one in the Tanako paper, works only if the divisor has been arranged to lie within a fixed range (0.25 to 0.5, or something like that).
Because the Tanako implementation is so efficient, thanks to its use of redundant CSA, it would be... unwise, shall we say, to make even potential adoption of that algorithm less attractive, not just for our own ASIC but for other implementors as well.
(the Tanako implementation cannot use the same trick, so a separate multiplier would be needed).
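
For readers following along, the compare-based approach Jacob describes above can be modelled in a few lines of Python. This is only an illustrative sketch, not the DivPipeCore code: the function name, the fixed-point scaling and the one-bit-per-step loop are assumptions made for the example. It shows where the constant 1.0 enters the comparison for plain rsqrt; how compare_lhs would be scaled for a/sqrt(b) is not worked out here.

```python
def rsqrt_fixed(radicand, fract_width, compare_lhs=None):
    """Hypothetical model: largest q with q*q*radicand <= compare_lhs.

    radicand and the returned q are unsigned fixed-point values with
    fract_width fractional bits.
    """
    if compare_lhs is None:
        # 1.0 scaled by 2**(3*fract_width) so that it lines up with the
        # product q*q*radicand (three fixed-point factors multiplied).
        compare_lhs = 1 << (3 * fract_width)
    q = 0
    # Try each result bit from most to least significant and keep it
    # only if the comparison still holds.
    for bit in reversed(range(fract_width + 1)):
        trial = q | (1 << bit)
        if trial * trial * radicand <= compare_lhs:
            q = trial
    return q


# 1/sqrt(4.0) with 8 fractional bits: 0.5 is represented as 128.
print(rsqrt_fixed(4 << 8, 8))
```

The point of the sketch is only that the left-hand side of the comparison is a single value that the search loop never modifies, which is why swapping it is cheap in an implementation of this style.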
>
> Also I would be concerned about the rounding, just working it out (let alone implementing it).
>
> Rounding uses the exact same algorithm: generate 2 more bits of quotient/root for guard and round, then compare the remainder to zero to generate the sticky bit.
What I mean is the implications for rounding of interactions between the a and b operands in a/sqrt(b).
This is beyond my ability to assess with any degree of confidence, and I would only be happy if it were assessed and found to be correct (no complications) by consensus of several peers with significant long-term expertise in IEEE754 FP.
Andrew has already raised one concern (+/- zero) and I raised another (exception interaction); there may be others. My concern is that it adds months, if not years, to our schedule, whereas FP-RSQRT on its own is very easy to justify, as the use case is extremely clear.
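
As a reference point for the single-operation case, the guard/round/sticky scheme Jacob describes above can be sketched as follows for a plain fixed-point divide with round-to-nearest-even. It is a minimal sketch under stated assumptions: the function name is made up for illustration, operands are positive integers, and it says nothing about whether the same steps remain correct for the fused a/sqrt(b) case, which is exactly the open question.

```python
def div_round_nearest_even(dividend, divisor, fract_width):
    """Hypothetical model: dividend/divisor as fixed point, rounded to
    nearest, ties to even, with fract_width fractional bits."""
    # Generate two more quotient bits than the target width.
    q, rem = divmod(dividend << (fract_width + 2), divisor)
    guard = (q >> 1) & 1       # first bit below the result LSB
    round_bit = q & 1          # second bit below the result LSB
    sticky = int(rem != 0)     # remainder compared to zero
    result = q >> 2
    lsb = result & 1
    # Round up when above the halfway point, or exactly halfway with an
    # odd result (ties to even).
    if guard and (round_bit or sticky or lsb):
        result += 1
    return result


# 2/3 with 4 fractional bits is 10.67 sixteenths and rounds up.
print(div_round_nearest_even(2, 3, 4))
```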
Standards are quite challenging, so many factors to take into account!
L.