[libre-riscv-dev] FP unit testing (was Re: [isa-dev] FP reciprocal sqrt extension proposal)

lkcl luke.leighton at gmail.com
Sun Jul 14 08:26:00 BST 2019

On Sunday, July 14, 2019 at 7:39:03 AM UTC+1, Jacob Lifshay wrote:
> On Sat, Jul 13, 2019 at 10:30 AM Aneesh Raveendran <anees... at gmail.com> wrote:
> >
> > Hi all,
> >     I am Aneesh Raveendran. I have worked on a RISC-V floating-point 
> > co-processor, and I have a few questions regarding floating-point reciprocal 
> > square root.
> >
> > 1. Which applications or benchmarking suites use floating-point 
> > reciprocal square-root operations?
> reciprocal sqrt is used a lot in 3D graphics for normalizing vectors -- 
> the pseudocode for normalizing a 3D vector is:
> fn rsqrt(v: f32) -> f32 {
>     1.0 / v.sqrt() // software stand-in for a hardware frsqrt
> }
>
> fn normalize(x: f32, y: f32, z: f32) -> (f32, f32, f32) {
>     let sum_of_squares = x * x + y * y + z * z;
>     let factor = rsqrt(sum_of_squares);
>     (factor * x, factor * y, factor * z)
> }
> It can also be used in machine learning to normalize 1-hot output vectors, 
> though it would not be particularly performance-critical for that 
> use case.

whereas for 3D it definitely is performance-critical (even FDIV has to be pipelined).

> > 2. If this instruction is being proposed, what could the possible 
> > instruction formats be (opcode, f7 and f5 field values)?
> The proposed instructions are:
> +----------+---------+-------+-----+--------+----+---------+
> | Mnemonic | funct7  | rs2   | rs1 | funct3 | rd | opcode  |
> +==========+=========+=======+=====+========+====+=========+
> | frsqrt.s | 0111100 | 00000 | rs1 | rm     | rd | 1010011 |
> +----------+---------+-------+-----+--------+----+---------+

i.e. *exactly* the same format as FSQRT... just with a new funct7.
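To make the "same format as FSQRT, new funct7" point concrete, here is a small Python sketch that packs the R-type fields into a 32-bit word. The funct7 value 0111100 is the one proposed in this thread (not a ratified encoding), the helper names are hypothetical, and rm=111 selects the dynamic rounding mode from fcsr.

```python
def encode_r_type(funct7: int, rs2: int, rs1: int, funct3: int, rd: int, opcode: int) -> int:
    """Pack the fields of a RISC-V R-type instruction into a 32-bit word."""
    assert 0 <= funct7 < (1 << 7) and 0 <= opcode < (1 << 7)
    assert all(0 <= r < (1 << 5) for r in (rs2, rs1, rd)) and 0 <= funct3 < (1 << 3)
    return (funct7 << 25) | (rs2 << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | opcode


OP_FP = 0b1010011             # major opcode shared by fsqrt.s and the proposed frsqrt.s
FUNCT7_FSQRT_S = 0b0101100    # ratified F extension encoding
FUNCT7_FRSQRT_S = 0b0111100   # value proposed in this thread (not ratified)
RM_DYN = 0b111                # rm (funct3) field: use the dynamic rounding mode


def fsqrt_s(rd: int, rs1: int, rm: int = RM_DYN) -> int:
    # rs2 is fixed at 00000 for the single-operand square root
    return encode_r_type(FUNCT7_FSQRT_S, 0, rs1, rm, rd, OP_FP)


def frsqrt_s(rd: int, rs1: int, rm: int = RM_DYN) -> int:
    # identical layout to fsqrt.s: only funct7 differs
    return encode_r_type(FUNCT7_FRSQRT_S, 0, rs1, rm, rd, OP_FP)


print(hex(fsqrt_s(1, 2)))   # fsqrt.s  f1, f2
print(hex(frsqrt_s(1, 2)))  # frsqrt.s f1, f2 (proposed)
```

The two words differ only in bits 31:25, which is why an existing FSQRT decoder needs just one extra funct7 match.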

> > 3. Are any test suites available to verify the functional correctness of 
> > the module?
> mpfr implements reciprocal sqrt; however, it doesn't support all of 
> RISC-V's rounding modes and may be missing support for other features 
> needed for testing.
> Softfloat doesn't currently implement rsqrt.
> I have not researched other softfloat libraries yet.

the key one that we're using is softfloat-3 (custom-compiled to enable 
RISC-V mode), via manually-compiled python bindings (sfpy). if you 
install the debian package instead, it of course uses the *INTEL*-compiled 
softfloat-3 library, which is precisely what you absolutely do not want. 
instructions to do this are here:


for the nmigen IEEE754 FPU we started from this code:

the majority of that code is the unit testing, except for multiplier.v 
itself, which is an FSM (extremely compact: it fits really well into a very 
small FPGA as long as the FPGA has a decent on-board DSP). performance, 
however, is absolutely dreadful, because the normalisation phase shifts by 
only one bit per clock cycle. replacing that with a single-cycle shift was 
interesting.
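The difference between the two normalisation styles can be sketched in Python as follows. This is an illustration, not a translation of multiplier.v: the function names are hypothetical, and the sketch ignores the minimum-exponent limit that a real normaliser needs for subnormals.

```python
def normalise_one_bit_per_cycle(mant: int, exp: int, width: int):
    """FSM style: shift left by one bit per clock until the MSB is set.
    Returns (mantissa, exponent, cycles_taken)."""
    msb = 1 << (width - 1)
    cycles = 0
    while mant and not (mant & msb):
        mant <<= 1
        exp -= 1
        cycles += 1
    return mant, exp, cycles


def normalise_single_cycle(mant: int, exp: int, width: int):
    """Whole shift at once: count leading zeros, then shift by that amount.
    In hardware this corresponds to a leading-zero counter feeding a barrel shifter."""
    if mant == 0:
        return 0, exp
    lz = width - mant.bit_length()
    return mant << lz, exp - lz


# worst case for a 24-bit mantissa: only the LSB is set
print(normalise_one_bit_per_cycle(1, 0, 24))  # 23 cycles of shifting
print(normalise_single_cycle(1, 0, 24))       # same result, one step
```

Both produce the same mantissa and exponent; the FSM version simply spends up to width-1 clock cycles getting there.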

you can see in the c_test directory that jon checked in a binary executable 
(do NOT run it, it is clearly unsafe to do so), and next to it is the 
source, test.cpp. clearly this code uses the STANDARD C FP LIBRARY of 
whatever platform it is compiled on. this is just as clearly NOT WHAT YOU 
WANT: if compiled on an intel x86 system, the unit tests will only pass 
RTL that reproduces intel x86 FP behaviour.

this is precisely why we use sfpy [compiled specifically for RISC-V].

jon's unit test code has "morphed" and become extremely generic:

examples of how it is used are here - see test_fpmul_pipe_16.py:

they're dead-simple, at this level. note that sfpy.Float16 and 
operator.mul are passed in to the test function: those are the two key 
parameters (along with "width") that are needed to verify the 
RTL. we're testing FPMUL, therefore we pass in operator.mul. we're 
testing FP16, therefore we pass in sfpy.Float16. duh :)

the key function run_pipe_fp yields unit test cases that first cover a 
random range of specialist combinations (corner cases) that are highly 
likely to fail, and only after covering those does it generate fully 
arbitrary random numbers.
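A sketch of that corner-cases-first ordering, written from the description above rather than from run_pipe_fp itself: the list of corner values, the exhaustive pairing (instead of random combinations) and the function names are all assumptions for illustration.

```python
import itertools
import random


def fp_corner_values(exp_bits: int, mant_bits: int):
    """Bit patterns of 'interesting' values for an IEEE754 binary format.
    This particular list of corners is illustrative only."""
    width = 1 + exp_bits + mant_bits
    exp_max = (1 << exp_bits) - 1
    bias = (1 << (exp_bits - 1)) - 1
    mant_max = (1 << mant_bits) - 1

    def pack(sign, exp, mant):
        return (sign << (width - 1)) | (exp << mant_bits) | mant

    corners = []
    for sign in (0, 1):
        corners += [
            pack(sign, 0, 0),                           # zero
            pack(sign, 0, 1),                           # smallest subnormal
            pack(sign, 0, mant_max),                    # largest subnormal
            pack(sign, 1, 0),                           # smallest normal
            pack(sign, bias, 0),                        # 1.0
            pack(sign, exp_max - 1, mant_max),          # largest finite
            pack(sign, exp_max, 0),                     # infinity
            pack(sign, exp_max, 1 << (mant_bits - 1)),  # quiet NaN
        ]
    return corners


def binary_op_cases(exp_bits: int, mant_bits: int, n_random: int, seed: int = 0):
    """Yield (a, b) operand bit patterns: every pair of corner values first,
    then n_random pairs of fully random bit patterns."""
    width = 1 + exp_bits + mant_bits
    corners = fp_corner_values(exp_bits, mant_bits)
    yield from itertools.product(corners, repeat=2)
    rng = random.Random(seed)  # seeded so failures are reproducible
    for _ in range(n_random):
        yield rng.getrandbits(width), rng.getrandbits(width)


cases = list(binary_op_cases(5, 10, n_random=1000))  # FP16: 5-bit exponent, 10-bit mantissa
print(len(cases))
```

Each (a, b) pair would then be fed both to the reference (e.g. sfpy) and to the RTL simulation, and the result bit patterns compared.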

i can strongly recommend developing *generic* RTL that is *FULLY 
PARAMETERISABLE*, and testing FP16 *FIRST*. the reason is really simple: 
FP16, by virtue of its much smaller exponent and mantissa bitwidths, gives 
far better coverage of corner cases, which in FP32 and FP64 occur with far 
too low a probability to be hit by pseudo-random monte-carlo testing.
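A quick back-of-the-envelope calculation shows the size of the effect. For a uniformly random bit pattern, the chance of landing on an all-zeros exponent (zero/subnormal) or an all-ones exponent (infinity/NaN) is 2/2^e for an e-bit exponent field: 1 in 16 for FP16, 1 in 128 for FP32 and 1 in 1024 for FP64. For both operands of a binary op at once it is the square of that. The helper name below is hypothetical.

```python
from fractions import Fraction

# (exponent bits, mantissa bits) for the IEEE754 binary formats
FORMATS = {"FP16": (5, 10), "FP32": (8, 23), "FP64": (11, 52)}


def p_special_exponent(exp_bits: int) -> Fraction:
    """Probability that a uniformly random bit pattern has an all-zeros
    or all-ones exponent field (2 of the 2**exp_bits possible values)."""
    return Fraction(2, 2 ** exp_bits)


for name, (e, m) in FORMATS.items():
    p = p_special_exponent(e)
    print(f"{name}: one operand 1 in {1 / p}, both operands 1 in {1 / (p * p)}")
```

FP16 also has only 2**16 = 65536 bit patterns, so single-operand operations (such as FRSQRT) can be tested exhaustively at that width.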

however, if the RTL is fully parameterisable, guess what? when it comes to 
FP32 and FP64, you have already tested the corner cases of the exact same 
code that generated FP16, so the probability of correctness may be deemed 
much higher.

later, we will add in formal mathematical proofs, using SymbiYosys. this 
will be an entirely separate project. we do not really trust random testing, 
not even on corner cases.

if anyone has verilog FP RTL that they need tested, and would like to use 
the above unit test infrastructure, that's dead easy: investigate cocotb. 
cocotb is a python co-simulation framework that drives verilog simulators 
such as icarus verilog, and is extremely nifty. cocotb has the simulator 
compile the VERILOG and hooks into it at the dut level, which allows it to 
set and read VERILOG signals from python.

the cocotb unit test at which i went "holy cow that's awesome" was one 
which used python's PIL (imaging library) to decode a JPEG... and then 
compared the result directly against the output from a libre-licensed 
verilog JPEG decoder. all from python.

jacob points out, however, that because sfpy does not have an FRSQRT 
function, we cannot use it for this. therefore we will need to write our own 
(python-based) FRSQRT soft-emulation routine. once written, it gets called 
exactly like this:

everything else that we need, softfloat-3 has it, and therefore (with the 
exception of RISC-V tininess bindings) the sfpy python bindings also have 
everything we need.

by testing against a WELL TESTED floating-point emulation library, we can 
have similar confidence in the correctness of the libre risc-v CPU/GPU 
IEEE754 FPU.

using bigfloat to perform the reciprocal square root at a much higher 
precision will cover the requirement to provide an accurate FRSQRT. however, 
the corner cases (at the extreme limits of the exponent, and when the 
mantissa's MSB is zero) are going to be a bundle of fun.
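As a rough idea of what such a reference might look like for FP32, here is a pure-Python sketch that swaps bigfloat for exact integer arithmetic (math.isqrt, Python 3.8+), so the result is correctly rounded without any intermediate floating-point rounding. It is a hypothetical helper, not the routine the project will write: it implements round-to-nearest-even only, returns the RISC-V canonical NaN, follows IEEE 754-2008 rSqrt for the special cases (rsqrt(±0) = ±inf, rsqrt(+inf) = +0, negative inputs are invalid), and does not model exception flags.

```python
import math

CANONICAL_NAN = 0x7FC00000  # RISC-V canonical single-precision NaN


def frsqrt_f32(x_bits: int) -> int:
    """Hypothetical reference 1/sqrt(x) on float32 bit patterns.
    Round-to-nearest-even only; exception flags are not modelled."""
    sign = x_bits >> 31
    exp = (x_bits >> 23) & 0xFF
    frac = x_bits & 0x7FFFFF

    if exp == 0xFF and frac:        # NaN in -> canonical NaN out
        return CANONICAL_NAN
    if exp == 0 and frac == 0:      # rsqrt(+-0) = +-inf (divide-by-zero)
        return (sign << 31) | 0x7F800000
    if sign:                        # negative non-zero, including -inf: invalid
        return CANONICAL_NAN
    if exp == 0xFF:                 # rsqrt(+inf) = +0
        return 0

    # x = M * 2**E exactly, with M a positive integer
    if exp == 0:                    # subnormal: no implicit leading 1
        M, E = frac, -149
    else:
        M, E = frac | (1 << 23), exp - 150
    if E % 2:                       # make E even so that 2**(-E/2) is exact
        M, E = M * 2, E - 1

    # q = floor(2**P / sqrt(M)); 'exact' records whether anything was discarded
    P = 64
    q = math.isqrt((1 << (2 * P)) // M)
    exact = q * q * M == 1 << (2 * P)

    # round q to 24 significant bits, round-to-nearest-even
    shift = q.bit_length() - 24
    keep = q >> shift
    rem = q & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    if rem > half or (rem == half and (not exact or keep & 1)):
        keep += 1
        if keep == 1 << 24:         # rounding carried out: renormalise
            keep >>= 1
            shift += 1

    # result = keep * 2**(shift - P - E/2); for finite positive float32
    # inputs this always lands in the normal range
    biased_exp = shift - P - E // 2 + 23 + 127
    return (biased_exp << 23) | (keep - (1 << 23))


print(hex(frsqrt_f32(0x40000000)))  # 1/sqrt(2.0)
print(hex(frsqrt_f32(0x00000001)))  # smallest subnormal input
```

The same structure extends to FP16 and FP64 by changing the field widths and bias; other rounding modes would change only the "round q" step.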

the issue is: as we will be running an UNTESTED (unproven) soft-emulation 
against an UNTESTED (unproven) hardware simulation, we have zero confidence 
in either.  exactly how to deal with this will be the subject of intensive 
further investigation.
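One way to avoid trusting either implementation is to check each individual result against the mathematical definition, using exact rational arithmetic. In the sketch below (hypothetical helper names, FP32 only), y is accepted as the round-to-nearest value of 1/sqrt(x) when 1/sqrt(x) lies between the midpoints lo and hi to y's neighbours. Since everything is positive, lo <= 1/sqrt(x) <= hi is equivalent to lo*lo*x <= 1 <= hi*hi*x, so no square root is ever computed. It covers only finite positive x and finite non-extreme results, and does not model tie-breaking at an exact midpoint.

```python
from fractions import Fraction


def f32_value(bits: int) -> Fraction:
    """Exact value of a finite float32 bit pattern."""
    sign = bits >> 31
    exp = (bits >> 23) & 0xFF
    frac = bits & 0x7FFFFF
    assert exp != 0xFF, "inf/NaN are not handled in this sketch"
    if exp == 0:
        mag = Fraction(frac, 1 << 149)                              # subnormal
    else:
        mag = Fraction(frac | (1 << 23)) * Fraction(2) ** (exp - 150)
    return -mag if sign else mag


def is_rne_rsqrt(x_bits: int, y_bits: int) -> bool:
    """True if y is within half an ulp (either side) of 1/sqrt(x).
    Assumes x is positive and finite, and y is positive, finite, and
    neither the smallest nor the largest float32 value."""
    x = f32_value(x_bits)
    y = f32_value(y_bits)
    # for positive floats, adjacent bit patterns are adjacent values
    lo = (f32_value(y_bits - 1) + y) / 2  # midpoint to the next value down
    hi = (f32_value(y_bits + 1) + y) / 2  # midpoint to the next value up
    return lo * lo * x <= 1 <= hi * hi * x


print(is_rne_rsqrt(0x40000000, 0x3F3504F3))  # candidate for 1/sqrt(2.0)
print(is_rne_rsqrt(0x40000000, 0x3F3504F4))  # one ulp too high
```

Because this check shares no code with either the soft-emulation or the RTL, it can be applied to the output of both.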


