[libre-riscv-dev] [isa-dev] FP reciprocal sqrt extension proposal

Jacob Lifshay programmerjake at gmail.com
Thu Jul 11 22:50:13 BST 2019

On Thu, Jul 11, 2019, 14:20 Guy Lemieux <glemieux at vectorblox.com> wrote:

> Just because "that's the way it's always been done" is not a good
> reason to justify its continuance.

That's true, but conversely, we should take into account the additional HW
investment needed.

> 1/sqrt(a) has been done as single-operand because it's an easy,
> independent table-lookup operation, followed by iteration to get the
> desired precision. it converges nicely.

We (libre-riscv.org) are currently planning on using a different algorithm:
a binary search that solves 1 == x * x * a for x, which gives x = 1/sqrt(a).
The remainder (1 - x * x * a) is 0 exactly when the result is exact, which
allows exact rounding: generate 2 additional result bits and check whether
the remainder is zero to obtain the guard, round, and sticky bits. We have
modified the binary search into a radix-8 search, giving 3 result bits per
stage.
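
As a rough illustration of the search described above (a software sketch
only, not our actual hardware design): the function name rsqrt_search, its
parameters, and the use of exact rationals are all assumptions made for this
example, and it assumes a > 1 so that the result is a pure fraction below 1.

```python
from fractions import Fraction

def rsqrt_search(a, stages, bits_per_stage=1):
    """Digit-by-digit search for the largest x with x*x*a <= 1.

    Hypothetical helper for illustration; assumes a > 1 so that 0 < x < 1.
    bits_per_stage=1 is a plain binary search, bits_per_stage=3 is radix-8.
    """
    a = Fraction(a)
    radix = 1 << bits_per_stage
    x = 0      # integer holding the digits chosen so far
    scale = 1  # x / scale is the current approximation
    for _ in range(stages):
        x *= radix
        scale *= radix
        # Pick the largest digit that keeps x*x*a <= 1. Hardware could test
        # all candidate digits in parallel; here they are tried in order.
        for digit in range(radix - 1, -1, -1):
            cand = x + digit
            if cand * cand * a <= scale * scale:
                x = cand
                break
    approx = Fraction(x, scale)
    remainder = 1 - approx * approx * a  # zero only if approx is exact
    return approx, remainder

# 1/sqrt(4) is exactly representable, so the remainder is zero.
exact_x, exact_rem = rsqrt_search(4, stages=2, bits_per_stage=3)

# 1/sqrt(2) is not, so the remainder is nonzero (the sticky bit would be set).
inexact_x, inexact_rem = rsqrt_search(2, stages=3, bits_per_stage=3)
```

Three radix-8 stages produce the same 9 result bits as nine binary stages;
only the number of stages changes.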

This algorithm could be simply modified to support the a/sqrt(b) case if
needed, by replacing the lhs of the equation being solved with the dividend.

> however, in real software, the function 1/sqrt(a) almost never stands
> alone. it is used for normalization, so it is almost always followed
> by a multiplication, ie a/sqrt(b), or preceded by an addition, ie
> 1/sqrt(a+b).

One point to note: the a/sqrt(b) form may not help when more than one
component of the normalized vector is needed. For 3D vectors, all 3
components are usually needed, so a/sqrt(b) would have to be used with a
== 1, followed by 3 separate multiplications by the rsqrt result to get the
3 components.
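
To make the 3D case concrete, here is a minimal sketch of vector
normalization. The function name normalize3 is made up for this example, and
1.0 / math.sqrt(...) merely stands in for a single rsqrt operation.

```python
import math

def normalize3(v):
    """Normalize a 3D vector with one reciprocal sqrt and three multiplies."""
    x, y, z = v
    # Stand-in for a single rsqrt instruction (a == 1 in the a/sqrt(b) form).
    inv_len = 1.0 / math.sqrt(x * x + y * y + z * z)
    # All three components reuse the same reciprocal length.
    return (x * inv_len, y * inv_len, z * inv_len)

unit = normalize3((3.0, 0.0, 4.0))
```

Using a/sqrt(b) with a set to each component in turn would instead need
three rsqrt-style operations for the same vector.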

> saying it is "subjected to rounding twice" isn't really fair. if done
> as separate operations, it is subjected to rounding twice.  when done
> as an atomic operation, you can arrange extended precision and round
> only once.

That was my reasoning for why a separate frsqrt instruction is needed rather
than fusing two separate instructions (fsqrt followed by fdiv) into a
single operation: instruction fusion is required to have the same effects
as executing the instructions one at a time, including both roundings. This
doesn't affect defining frsqrt to compute a/sqrt(b), since frsqrt is still
a single instruction.
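
A small sketch of why the two roundings matter. It uses Python's decimal
module at 3 significant digits purely to make the effect visible; this is
not binary floating point, and the helper names are invented for the
example. The "one rounding" version approximates the exact value by
computing with 30 extra digits before rounding.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

def rsqrt_two_roundings(a, prec):
    """sqrt then divide, each rounded to prec digits (like fsqrt + fdiv)."""
    ctx = Context(prec=prec, rounding=ROUND_HALF_EVEN)
    root = ctx.sqrt(Decimal(a))           # first rounding
    return ctx.divide(Decimal(1), root)   # second rounding

def rsqrt_one_rounding(a, prec):
    """1/sqrt(a) computed with extra precision, then rounded once."""
    wide = Context(prec=prec + 30)
    nearly_exact = wide.divide(Decimal(1), wide.sqrt(Decimal(a)))
    narrow = Context(prec=prec, rounding=ROUND_HALF_EVEN)
    return narrow.plus(nearly_exact)      # the single final rounding

two_step = rsqrt_two_roundings(2, 3)   # 1/1.41 rounds to 0.709
one_step = rsqrt_one_rounding(2, 3)    # 0.70710... rounds to 0.707
```

For a = 2 the two-step result differs from the singly rounded one in the
last digit, which is the kind of difference a fused fsqrt + fdiv pair would
be obliged to reproduce.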

