[libre-riscv-dev] [Bug 44] IEEE754 FPU inverse-sqrt
bugzilla-daemon at libre-riscv.org
Wed May 1 06:40:57 BST 2019
http://bugs.libre-riscv.org/show_bug.cgi?id=44
--- Comment #7 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #6)
> (In reply to Luke Kenneth Casson Leighton from comment #4)
> > different approach:
> >
> > https://rei.iteso.mx/bitstream/handle/11117/5844/Design+and+Implementation+of+Reciprocal+Square+Root+Units+on+Digital+ASIC+Technology+For+Low+Power+Embedded+Applications.pdf
> there was a server error when I tried to access this
>
> > https://github.com/tomverbeure/math/blob/master/src/main/scala/math/FpxxRSqrt.scala
>
> as far as I can tell, this is only a lookup table.
>
> > both these, as best i can tell, use a polynomial lookup table which requires
> > only 1 iteration of newton-raphson to give full accuracy.
>
> I think it would be wise to avoid newton-raphson since it needs several
> multiplications and will occupy the multiplier for quite a few cycles.
The fixed rsqrt algorithm I saw was able to do the multiply in a single cycle;
however, it may have been using a smaller mantissa.
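For reference, the multiplier cost being discussed can be sketched as below.
This is only an illustration in floating-point Python, not either of the linked
designs: rsqrt_seed is a hypothetical stand-in for a lookup table (a plain
64-entry midpoint table, not the polynomial table those designs appear to use),
and the input is assumed to be already normalised into [1, 4). Each
Newton-Raphson step costs three multiplications; the final halving would be a
shift in hardware.

```python
import math

def rsqrt_seed(x, table_bits=6):
    """Hypothetical seed table: one entry per bucket of x in [1, 4).

    Each entry holds 1/sqrt of the bucket midpoint. A real design would
    store fixed-point constants rather than call math.sqrt.
    """
    steps = 1 << table_bits
    idx = min(int((x - 1.0) / 3.0 * steps), steps - 1)
    midpoint = 1.0 + (idx + 0.5) * 3.0 / steps
    return 1.0 / math.sqrt(midpoint)

def newton_rsqrt_step(x, y):
    """One Newton-Raphson refinement of y ~ 1/sqrt(x).

    y' = y * (3 - x*y*y) / 2, i.e. three multiplications, one subtract
    and a halving.
    """
    return y * (3.0 - x * y * y) * 0.5

def rsqrt(x, iterations=1):
    """Seed from the table, then refine; x is assumed to lie in [1, 4)."""
    y = rsqrt_seed(x)
    for _ in range(iterations):
        y = newton_rsqrt_step(x, y)
    return y
```

With this crude seed a single step is nowhere near full FP32 accuracy; a better
seed (such as a polynomial table) is what would let one iteration suffice.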
I really like the algorithm you found, only the description is really obtuse.
Assuming we can do a 4 radix version, 2 of those can be combinatorially
chained to give 8 bits per cycle, and for FP32 that gives only around a 5-stage
pipeline with basic single-shift, add and MUX blocks.
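A rough check of that stage count, as a sketch under stated assumptions: it
reads "4 radix" as a block that retires 4 result bits, takes the FP32
significand as 24 bits (23 stored plus the hidden bit), and assumes one extra
stage each for unpack/normalise and for round/pack. The function name and the
two-extra-stage figure are illustrative, not taken from the algorithm itself.

```python
import math

def digit_stages(mantissa_bits, bits_per_block, blocks_per_stage):
    """Pipeline stages needed to retire every mantissa bit.

    bits_per_block: result bits produced by one combinatorial block
    blocks_per_stage: blocks chained combinatorially within one stage
    """
    bits_per_stage = bits_per_block * blocks_per_stage
    return math.ceil(mantissa_bits / bits_per_stage)

FP32_MANTISSA_BITS = 24   # 23 stored bits plus the hidden bit
EXTRA_STAGES = 2          # assumption: unpack/normalise + round/pack

core = digit_stages(FP32_MANTISSA_BITS, bits_per_block=4, blocks_per_stage=2)
total = core + EXTRA_STAGES
print(core, total)
```

Guard and rounding bits would add to the bit count, so the real figure could
come out a stage higher.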
What sort of performance is actually needed? Is this critical in certain
circumstances?