[libre-riscv-dev] [isa-dev] FP reciprocal sqrt extension proposal

Fri Jul 12 07:45:37 BST 2019

http://www.acsel-lab.com/arithmetic/arith15/papers/ARITH15_Takagi.pdf

http://bugs.libre-riscv.org/show_bug.cgi?id=44

Some context above. Apologies: I am typing on a phone, which makes it quite awkward to keep replies threaded.

Vulkan's accuracy requirements are extreme. Error is only allowed in the last 2 bits of mantissa.
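To make "error in the last bits of the mantissa" concrete, here is a minimal sketch of measuring the distance between two single-precision results in units in the last place (ULPs). The helper names and the `max_ulps` parameter are hypothetical. The threshold is left to the caller, because the exact ULP figure for each operation should come from the Vulkan specification rather than from this post.

```python
import struct

def f32_bits(x: float) -> int:
    """Round x to IEEE 754 binary32 and return its 32-bit pattern."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def ulp_distance(a: float, b: float) -> int:
    """Count the binary32 steps separating a and b (finite inputs only)."""
    def ordered(bits: int) -> int:
        # Map sign-magnitude bit patterns onto a monotonic integer line,
        # so that +0.0 and -0.0 coincide and negatives sort below positives.
        return -(bits & 0x7FFFFFFF) if bits & 0x80000000 else bits
    return abs(ordered(f32_bits(a)) - ordered(f32_bits(b)))

def within_ulps(result: float, reference: float, max_ulps: int) -> bool:
    """True if result is at most max_ulps binary32 steps from reference."""
    return ulp_distance(result, reference) <= max_ulps
```

For example, `ulp_distance(1.0, 1.0 + 2**-23)` is 1, since `1.0 + 2**-23` is the next binary32 value after 1.0.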

3D GPU requirements are also extreme: one DIV and one ISQRT per pixel, no compromises allowed. This is for normalisation, typically 1/sqrt(x^2 + y^2 + z^2).
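As a sketch of why ISQRT matters for normalisation: scaling a vector to unit length needs the reciprocal square root of the sum of squares, followed by three multiplies. `isqrt_ref` below is a hypothetical software stand-in for a hardware ISQRT instruction, written in double precision purely for illustration.

```python
import math

def isqrt_ref(x: float) -> float:
    """Hypothetical stand-in for a hardware ISQRT: returns 1 / sqrt(x)."""
    return 1.0 / math.sqrt(x)

def normalise(x: float, y: float, z: float):
    """Scale (x, y, z) to unit length using a single ISQRT."""
    scale = isqrt_ref(x * x + y * y + z * z)
    # One ISQRT, then three multiplies: no separate SQRT followed by divides.
    return (x * scale, y * scale, z * scale)

nx, ny, nz = normalise(3.0, 0.0, 4.0)  # approximately (0.6, 0.0, 0.8)
```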

Also there are power requirements to meet.

This eliminates Newton-Raphson and other iterative methods, as there is no guaranteed completion time. Also, if we provide enough engines to finish in a reasonable timeframe (higher radix), the number of multipliers, and in particular their increased size, will kill all chance of meeting the power budget.
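One way to read the completion-time concern, as a rough software sketch only: the textbook Newton-Raphson step for 1/sqrt(x) is y' = y * (1.5 - 0.5 * x * y * y), and the number of steps needed to converge depends on how good the starting guess is. The function name, tolerance and seeds below are illustrative assumptions, not anything taken from a real design.

```python
def nr_isqrt(x: float, seed: float, tol: float = 1e-12, max_iter: int = 50):
    """Newton-Raphson for 1/sqrt(x); returns (estimate, steps taken)."""
    y = seed
    for step in range(1, max_iter + 1):
        # Each step costs several multiplies, which is where the
        # multiplier count and size become a power concern in hardware.
        y_next = y * (1.5 - 0.5 * x * y * y)
        if abs(y_next - y) <= tol * abs(y_next):
            return y_next, step
        y = y_next
    raise ArithmeticError("did not converge within max_iter steps")

near, steps_near = nr_isqrt(4.0, seed=0.4)  # good initial guess
far, steps_far = nr_isqrt(4.0, seed=0.1)    # poor initial guess
```

Both calls converge to 0.5, but the poorer seed takes more steps, so the latency is data-dependent unless the seed quality and step count are fixed in advance.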

We therefore had to research pipelined designs ONLY, and Jacob found the above paper. It uses On-the-Fly Conversion (OTFC) as well as a redundant carry-save format between pipeline stages, which saves hugely on gate count.
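For contrast with iterative refinement, here is a minimal sketch of a digit-recurrence square root, where the number of stages is fixed by the operand width. This is the plain radix-2 restoring integer algorithm, NOT the algorithm from the paper: it has no on-the-fly conversion, no carry-save format, and does not produce ISQRT. Each loop iteration stands for what one pipeline stage would do.

```python
def sqrt_fixed_stages(n: int, bits: int = 16):
    """Integer sqrt of a (2*bits)-wide input in exactly `bits` stages.

    Returns (root, remainder) with root*root + remainder == n.
    """
    assert 0 <= n < 1 << (2 * bits)
    root = 0
    rem = 0
    for stage in range(bits - 1, -1, -1):
        # Bring down the next two input bits, most significant pair first.
        rem = (rem << 2) | ((n >> (2 * stage)) & 3)
        # Trial subtrahend if the next root bit were 1: (2*root)*2 + 1.
        trial = (root << 2) | 1
        root <<= 1
        if rem >= trial:
            rem -= trial
            root |= 1  # this stage's result bit is 1
    return root, rem
```

The loop always runs `bits` times regardless of the input value, which is the property that makes this style of algorithm map onto a fixed-depth pipeline.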

The fascinating bit is that the OTFC outputs BOTH sqrt AND isqrt from the SAME hardware. This is because it needs the partial results from each to make decisions on what to do within each stage.

Unfortunately the paper is extremely abstruse, like many academic papers, and there is no Verilog source. Sigh.

So in the meantime we are going with a simpler design, so that at least we have something. Jacob has worked out that there are adjustable magic constants that let DIV, SQRT and ISQRT be covered by at least the same algorithm, if not the actual same hardware, with very little extra gate count.

Summary:

1. For 3D we absolutely need isqrt, this is going to go ahead.

2. Lookup tables and Newton-Raphson are off the table for us.

3. There exist algorithms that give ISQRT "for free".

4. Love the idea of the add, Guy. However, we may need more than 2 operands: 3 adds would be more useful. Perhaps a separate opcode?

5. We have no problem with a spec requiring less accuracy. However, it is something that other implementors may come to regret, particularly when it comes to testing. We use softfloat Python bindings on DIV, SQRT, MUL and ADD, perform direct comparisons, and it works extremely well.

L.