[libre-riscv-dev] [Bug 44] IEEE754 FPU inverse-sqrt

bugzilla-daemon at libre-riscv.org bugzilla-daemon at libre-riscv.org
Wed May 1 06:40:57 BST 2019


http://bugs.libre-riscv.org/show_bug.cgi?id=44

--- Comment #7 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #6)
> (In reply to Luke Kenneth Casson Leighton from comment #4)
> > different approach:
> > 
> > https://rei.iteso.mx/bitstream/handle/11117/5844/Design+and+Implementation+of+Reciprocal+Square+Root+Units+on+Digital+ASIC+Technology+For+Low+Power+Embedded+Applications.pdf
> there was a server error when I tried to access this
> 
> > https://github.com/tomverbeure/math/blob/master/src/main/scala/math/FpxxRSqrt.scala
> 
> as far as I can tell, this is only a lookup table.
> 
> > both these, as best i can tell, use a polynomial lookup table which requires
> > only 1 iteration of newton-raphson to give full accuracy.
> 
> I think it would be wise to avoid newton-raphson since it needs several
> multiplications and will occupy the multiplier for quite a few cycles.

The fixed rsqrt algorithm I saw was able to do the multiplication in a single
cycle, although it may have used a smaller mantissa.
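
For reference, here is a minimal sketch of the refinement step being discussed,
in floating-point Python rather than hardware. The 64-entry table, the names
rsqrt_seed and newton_raphson_step, and the [1, 2) input range are assumptions
made for illustration and are not taken from either linked design. Each step
costs three full multiplications (the 0.5 factor is only an exponent
adjustment), which is the multiplier occupancy concern raised above.

```python
import math

TABLE_BITS = 6  # hypothetical table size: 64 entries covering [1, 2)

# Seed table: 1/sqrt evaluated at the midpoint of each interval.
SEED_TABLE = [
    1.0 / math.sqrt(1.0 + (i + 0.5) / (1 << TABLE_BITS))
    for i in range(1 << TABLE_BITS)
]


def rsqrt_seed(x):
    """Look up an initial estimate of 1/sqrt(x) for x in [1, 2)."""
    index = int((x - 1.0) * (1 << TABLE_BITS))
    return SEED_TABLE[index]


def newton_raphson_step(x, y):
    """One Newton-Raphson refinement of y ~ 1/sqrt(x)."""
    return y * (1.5 - 0.5 * x * y * y)
```

With a plain table this small, one step does not reach FP32 accuracy, since the
relative error only roughly squares per step. That is presumably why the
designs above are described as using a polynomial table for the seed.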

I really like the algorithm you found, although its description is really
obtuse. Assuming we can do a 4 radix version, two of those can be
combinatorially chained to give 8 bits per cycle, and for FP32 that gives a
pipeline of only around 5 stages built from basic single-shift, add and MUX
blocks.
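
To illustrate what a stage built only from shifts, adds and a MUX could look
like, here is a rough bit-at-a-time (radix-2) sketch in Python. It is not the
algorithm from comment #6, which I have not reproduced here. The function name
rsqrt_shift_add and the fixed-point layout are assumptions. It keeps running
values of x*y and x*y*y so that trying the next result bit needs no multiplier.

```python
def rsqrt_shift_add(x_fixed, frac_bits=23, out_bits=24):
    """Return floor(2**out_bits / sqrt(x)) using shifts, adds and compares.

    x_fixed is x scaled by 2**frac_bits, with x assumed to lie in [1, 4].
    """
    one = 1 << (frac_bits + 2 * out_bits)  # 1.0 at the scale of x*y*y
    y = 0    # result so far, scaled by 2**out_bits
    xy = 0   # running x*y,   scaled by 2**(frac_bits + out_bits)
    xyy = 0  # running x*y*y, scaled by 2**(frac_bits + 2*out_bits)
    for k in range(out_bits + 1):
        s = out_bits - k  # trial bit has weight 2**-k
        # x*(y + d)**2 = x*y*y + 2*d*(x*y) + x*d*d, with d a power of two
        trial = xyy + (xy << (s + 1)) + (x_fixed << (2 * s))
        if trial <= one:  # compare selects the MUX input
            y += 1 << s
            xy += x_fixed << s
            xyy = trial
    return y
```

Each loop iteration corresponds to one combinatorial slice. Chaining several
slices per clock trades a longer critical path for fewer pipeline stages.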

What sort of performance is actually needed? Is this critical in certain
circumstances?
