[libre-riscv-dev] FP unit testing (was Re: [isa-dev] FP reciprocal sqrt extension proposal)
lkcl
luke.leighton at gmail.com
Sun Jul 14 08:26:00 BST 2019
On Sunday, July 14, 2019 at 7:39:03 AM UTC+1, Jacob Lifshay wrote:
>
> On Sat, Jul 13, 2019 at 10:30 AM Aneesh Raveendran <anees... at gmail.com>
> wrote:
> >
> > Hi all,
> > Myself Aneesh Raveendran. I worked on RISC-V floating point
> > co-processor. I have few doubts regarding floating point reciprocal
> > square-root.
> >
> > 1. In which application/bench marking suites will infer floating point
> > reciprocal square-root operations?
> reciprocal sqrt is used a lot in 3D graphics for normalizing vectors --
> the pseudocode for normalizing a 3D vector is:
> fn normalize(x: float, y: float, z: float) -> (float, float, float) {
> let sum_of_squares = x * x + y * y + z * z;
> let factor = rsqrt(sum_of_squares);
> return (factor * x, factor * y, factor * z);
> }
>
> It can also be used in machine learning to normalize 1-hot output vectors,
> though would not be particularly performance critical for that particular
> usecase.
>
where for 3D it definitely is (even FDIV has to be pipelined)
> > 2. If this instruction is proposing, what could be the possible
> > instruction formats? (opcodes, f7, f5 field values )
> The proposed instructions are:
> +----------+---------+-------+-----+--------+----+---------+
> | Mnemonic | funct7 | rs2 | rs1 | funct3 | rd | opcode |
> +==========+=========+=======+=====+========+====+=========+
> | frsqrt.s | 0111100 | 00000 | rs1 | rm | rd | 1010011 |
> +----------+---------+-------+-----+--------+----+---------+
>
...
...
i.e. *exactly* the same format as FSQRT... just with a new funct7.
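
as a quick sanity check of the table above, here is a minimal python sketch that packs the fields into a 32-bit word. it assumes the standard RISC-V R-type bit positions (funct7 at 31:25, rs2 at 24:20, rs1 at 19:15, funct3 at 14:12, rd at 11:7, opcode at 6:0); the function name encode_frsqrt_s is made up for illustration, and the funct7 value is only the *proposed* one from the table, not a ratified encoding.

```python
# field values taken from the proposal table above (proposed, not ratified)
FUNCT7_FRSQRT_S = 0b0111100
RS2_FRSQRT_S = 0b00000
OPCODE_OP_FP = 0b1010011


def encode_frsqrt_s(rd, rs1, rm):
    """Pack frsqrt.s into a 32-bit word (hypothetical helper).

    rd, rs1: FP register numbers 0..31; rm: 3-bit rounding-mode field.
    """
    if not 0 <= rd < 32 or not 0 <= rs1 < 32:
        raise ValueError("register numbers must be in 0..31")
    if not 0 <= rm < 8:
        raise ValueError("rm must fit in 3 bits")
    return ((FUNCT7_FRSQRT_S << 25) | (RS2_FRSQRT_S << 20) | (rs1 << 15)
            | (rm << 12) | (rd << 7) | OPCODE_OP_FP)


if __name__ == "__main__":
    # frsqrt.s f1, f2 with rm=000
    print(hex(encode_frsqrt_s(rd=1, rs1=2, rm=0b000)))
```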
> > 3. Any testsuites are available to verify the functional correctness of
> > the module?
> mpfr implements reciprocal sqrt, however it doesn't support all of
> RISC-V's rounding modes and may be missing support for other features
> needed for testing.
> Softfloat doesn't currently implement rsqrt.
> I have not researched other softfloat libraries yet.
>
the key one that we're using is softfloat-3 (custom-compiled to enable
RISC-V mode), via manually-compiled python bindings (sfpy) because if you
install the debian package, of course it uses the *INTEL*-compiled
softfloat-3 library, which is precisely what you absolutely do not want.
instructions to do this are here:
https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=README.md;h=d219864a341e4b656680de476e385b6a7f70fb9b;hb=07c771aa522785a492dfcaf4dcb33b35635528f8
for the nmigen IEEE754 FPU we started from this code:
https://github.com/dawsonjon/fpu/tree/master/multiplier
the majority of that code is the unit testing, except for multiplier.v
itself which is an FSM (extremely compact, fits really well into a very
small FPGA as long as it has a decent on-board DSP; performance is
absolutely dreadful due to one-bit-per-cycle shifting in the normalisation
phase. replacing that with single-cycle shifting was interesting).
you can see in the c_test directory that jon checked in a binary executable
(do NOT run it, it is clearly unsafe to do so), and next to it is the
source test.cpp. clearly this code uses the STANDARD C FP LIBRARY on
whatever platform it is compiled on. this is just as clearly NOT WHAT YOU
WANT, because if compiled on an intel x86 system, the unit tests will pass
only intel x86 FP RTL.
this is precisely why we use sfpy [compiled specifically for RISC-V].
jon's unit test code has "morphed" and become extremely generic:
https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/fpcommon/test/case_gen.py;h=1e81f341eb6d45312e2de5da19b727842d70fc12;hb=ff5df25d28e88a15edee6d72e29c54fe105672e3
examples of how it is used are here - see test_fpmul_pipe_16.py:
https://git.libre-riscv.org/?p=ieee754fpu.git;a=tree;f=src/ieee754/fpmul/test;h=971428ac53a53196f68e7fd9f5a4f36ae6d32e9d;hb=ff5df25d28e88a15edee6d72e29c54fe105672e3
they're dead-simple, at this level. note that sfpy.Float16 and
operator.mul are passed in to the test function: that's the two key
parameters that are all that is needed (that and "width") to verify the
RTL. we're testing FPMUL, therefore we pass in operator.mul. we're
testing FP16, therefore we pass in sfpy.Float16. duh :)
the key function run_pipe_fp yields the unit test cases to cover a full
random range of specialist combinations (corner cases) that are highly
likely to fail, and, only after covering those, full arbitrary random
numbers are generated.
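
to illustrate the "corner cases first, random afterwards" ordering, here is a minimal sketch. it is NOT the actual run_pipe_fp code: the names corner_cases and operand_pairs are hypothetical, the particular list of corner cases is my own assumption of what is likely to break RTL, and it works purely on raw bit patterns so that it needs nothing beyond the python standard library.

```python
import random


def corner_cases(e_width, m_width):
    """Bit patterns for one operand that tend to expose bugs (assumed list)."""
    e_max = (1 << e_width) - 1   # all-ones exponent: inf / NaN
    m_max = (1 << m_width) - 1   # all-ones mantissa
    bias_exp = e_max >> 1        # biased exponent of 1.0

    def pack(s, e, m):
        return (s << (e_width + m_width)) | (e << m_width) | m

    cases = []
    for s in (0, 1):
        cases += [
            pack(s, 0, 0),                       # zero
            pack(s, 0, 1),                       # smallest subnormal
            pack(s, 0, m_max),                   # largest subnormal
            pack(s, 1, 0),                       # smallest normal
            pack(s, bias_exp, 0),                # 1.0
            pack(s, e_max - 1, m_max),           # largest finite
            pack(s, e_max, 0),                   # infinity
            pack(s, e_max, 1 << (m_width - 1)),  # quiet NaN
        ]
    return cases


def operand_pairs(e_width, m_width, n_random, seed=0):
    """Yield (a, b) bit patterns: every corner-case pair, then random ones."""
    corners = corner_cases(e_width, m_width)
    for a in corners:
        for b in corners:
            yield a, b
    rng = random.Random(seed)    # seeded, so failures are reproducible
    width = 1 + e_width + m_width
    for _ in range(n_random):
        yield rng.getrandbits(width), rng.getrandbits(width)


if __name__ == "__main__":
    pairs = list(operand_pairs(5, 10, n_random=10))  # FP16: 5 exp, 10 mantissa
    print(len(pairs), [hex(x) for x in pairs[0]])
```

each yielded pair would then be fed both to the RTL simulation and to the reference (e.g. sfpy.Float16 with operator.mul), and the two results compared.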
i can strongly recommend developing *generic* RTL that is *FULLY
PARAMETERISEABLE*, and testing FP16 *FIRST*. the reason is really simple:
FP16, by way of having much smaller bitwidths for both the exponent and
mantissa, results in far better coverage of corner cases, which in FP32
and FP64 simply have too low a probability of occurring through
pseudo-random monte-carlo testing.
however if the RTL is fully parameterisable, guess what? when it comes to
FP32 and FP64, you already tested the corner-cases of the exact same code
that generated FP16, so the probability of correctness may be deemed much
higher.
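
to put rough numbers on that, here is a small sketch that describes a format purely by its exponent and mantissa widths, and works out how likely a uniformly random bit pattern is to be an infinity, or to have an all-zeros / all-ones exponent field (zero, subnormal, inf or NaN). the FPFormat class and its method names are made up for this illustration; a uniform distribution over bit patterns is an assumption, and a real random test generator may well be biased differently.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class FPFormat:
    """An IEEE754-style binary format, parameterised by field widths."""
    e_width: int  # exponent bits
    m_width: int  # stored mantissa bits

    @property
    def width(self):
        return 1 + self.e_width + self.m_width

    def p_inf(self):
        """Chance that one uniformly random pattern is +inf or -inf."""
        return Fraction(2, 1 << self.width)

    def p_special_exponent(self):
        """Chance the exponent field is all-zeros or all-ones."""
        return Fraction(2, 1 << self.e_width)


FP16 = FPFormat(5, 10)
FP32 = FPFormat(8, 23)
FP64 = FPFormat(11, 52)

if __name__ == "__main__":
    for name, fmt in (("FP16", FP16), ("FP32", FP32), ("FP64", FP64)):
        both_inf = fmt.p_inf() ** 2  # both operands of a 2-input op are inf
        print(name, float(fmt.p_special_exponent()),
              float(fmt.p_inf()), float(both_inf))
```

for FP16 a random operand is an infinity once in 32768 patterns, so a few hundred thousand random tests will hit it; for FP32 it is once in 2^31, and for a pair of operands the odds are squared.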
later, we will add in formal mathematical proofs, using symbiyosys. this
is an entirely separate project. we do not really trust random testing,
not even on corner-cases.
if anyone has verilog FP RTL that they need testing, and would like to use
the above unit test infrastructure, that's dead easy: investigate cocotb.
cocotb is a python cosimulation framework that drives simulators such as
icarus verilog, and is extremely nifty. cocotb compiles up the VERILOG and
hooks into the running simulation (at the dut level), which allows it to
set and read VERILOG signals from PYTHON.
the cocotb unit test at which i went "holy cow that's awesome" was one
which used python's PIL (imaging library) to decode a JPEG... and then
compared it directly against the output from a libre licensed verilog JPEG
decoder. all from python.
jacob points out however that because sfpy does not have an FRSQRT
function, we cannot use it. therefore we will need to write our own
(python-based) FRSQRT soft-emulation routine. once written, it gets called
exactly like this:
https://git.libre-riscv.org/?p=ieee754fpu.git;a=blob;f=src/ieee754/fcvt/test/test_fcvt_pipe.py;h=ee5ed3a5b25f38e9dc5005a440f5db7b5a72373f;hb=ff5df25d28e88a15edee6d72e29c54fe105672e3#l9
everything else that we need, softfloat-3 has it, and therefore (with the
exception of RISC-V tininess bindings) the sfpy python bindings also have
everything we need.
by testing against a WELL TESTED floating-point emulation library, we gain
similar confidence in the correctness of the libre risc-v CPU/GPU IEEE754
FPU.
using bigfloat to perform the reciprocal-square-root in a much higher
precision will cover the requirement to provide an accurate FRSQRT. however
the corner-cases (at the extreme limits of the exponent, and when the
mantissa's MSB is zero) are going to be a bundle of fun.
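
to give an idea of what such a python soft-emulation routine might look like, here is a minimal sketch for FP16 only. note the swaps: instead of bigfloat it uses exact rational arithmetic (fractions.Fraction from the standard library) and a binary search over the FP16 bit patterns, so there is no intermediate rounding at all. it only does round-to-nearest-even, raises no exception flags, and the special-case results (rsqrt of +/-0 gives +/-inf, of +inf gives +0, of a negative number gives NaN) follow my reading of the IEEE754-2008 recommended rSqrt operation. the function names are hypothetical, and 0x7E00 is used as the canonical quiet NaN.

```python
import struct
from fractions import Fraction


def fp16_from_bits(bits):
    """Interpret a 16-bit pattern as an IEEE754 half, returned as a float."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def rsqrt_fp16_bits(a):
    """Reference 1/sqrt(a) on FP16 bit patterns, round-to-nearest-even."""
    sign, mag = a >> 15, a & 0x7FFF
    if mag > 0x7C00:
        return 0x7E00                    # NaN in, canonical NaN out
    if mag == 0:
        return (sign << 15) | 0x7C00     # +/-0 gives +/-inf
    if sign:
        return 0x7E00                    # negative input is invalid
    if mag == 0x7C00:
        return 0x0000                    # +inf gives +0
    x = Fraction(fp16_from_bits(a))      # exact value of the input
    # positive finite patterns 0x0001..0x7BFF increase with their value,
    # so find the largest pattern y with y*y*x <= 1, i.e. y <= 1/sqrt(x)
    lo, hi = 0x0001, 0x7BFF
    while lo < hi:
        mid = (lo + hi + 1) // 2
        y = Fraction(fp16_from_bits(mid))
        if y * y * x <= 1:
            lo = mid
        else:
            hi = mid - 1
    # the exact result lies in [lo, lo+1): compare against the halfway point
    below = Fraction(fp16_from_bits(lo))
    above = Fraction(fp16_from_bits(lo + 1))
    half = (below + above) / 2
    t = half * half * x
    if t < 1:
        return lo + 1                    # exact result is above halfway
    if t > 1:
        return lo                        # exact result is below halfway
    return lo if lo % 2 == 0 else lo + 1  # tie: pick the even mantissa


if __name__ == "__main__":
    for bits in (0x3C00, 0x4400, 0x4000, 0x0001):
        print(hex(bits), hex(rsqrt_fp16_bits(bits)))
```

being a brute-force search it is slow, but for FP16 all 65536 inputs can be enumerated, and the same comparison-of-squares idea generalises to wider formats if the search is done over exponent and mantissa rather than over every pattern.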
the issue is: as we will be running an UNTESTED (unproven) soft-emulation
against an UNTESTED (unproven) hardware simulation, we have zero confidence
in either. exactly how to deal with this will be the subject of intensive
further investigation.
l.