[libre-riscv-dev] FP transcendentals (trigonometry, root/exp/log) proposal
luke.leighton at gmail.com
Fri Aug 9 07:45:05 BST 2019
I must apologise: I saw a post somewhere (Mitch's, I think) saying that CORDIC latency is quite high (a single bit per iteration), so it is less likely to be used in high-performance designs.
Couple of things about that:
* In the [new] 3D Embedded Platform, speed and performance are nonessential: even accuracy is nonessential. Cost savings on SDKs, power, etc are the higher priority.
CORDIC is perfect for these profiles because of the huge number of operations it covers.
* For the Libre RISCV SoC, chances are high that we will use it (at least for a first revision), and will do so by treating each iteration as a combinatorial block.
Several of those blocks will be chained together *per pipeline stage*, which will increase the gate latency of each stage; that is perfectly acceptable, as the clock rate target is only 800 MHz (not 4 GHz).
This trick is one that we have deployed in the FPDIV/SQRT/RSQRT pipeline, using high radix stages as well, to get the pipeline length down even further in that instance.
Are there CORDIC algorithm enhancements that would allow us to do more than one bit at a time? I haven't looked yet.
Given that CORDIC is at heart just a simple add and compare, I really do not expect chaining multiple iterations as combinatorial blocks to have that big an adverse effect (not on an 800 MHz target).
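To make the "add and compare" point concrete, here is a minimal software sketch of rotation-mode CORDIC (sin/cos only), with a configurable number of iterations chained back to back to model one combinatorial pipeline stage. It is an illustration, not the actual Libre RISC-V design: the names (`cordic_iteration`, `cordic_stage`, `cordic_sincos`), the 30-bit fixed-point format and the choice of 24 iterations are all assumptions made for this example.

```python
import math

FRAC_BITS = 30  # fixed-point fraction bits (assumption for this sketch)
N_ITER = 24     # one iteration per result bit (assumed: 23-bit mantissa + hidden bit)

# Precomputed arctan(2^-i) constants in fixed point.
ATAN_TABLE = [round(math.atan(2.0 ** -i) * (1 << FRAC_BITS)) for i in range(N_ITER)]

# Reciprocal of the CORDIC gain, so the result needs no final multiply.
K_INV = 1.0
for _i in range(N_ITER):
    K_INV /= math.sqrt(1.0 + 2.0 ** (-2 * _i))


def cordic_iteration(x, y, z, i):
    """One iteration: a compare (sign of z), then shifts and adds."""
    if z >= 0:
        return x - (y >> i), y + (x >> i), z - ATAN_TABLE[i]
    return x + (y >> i), y - (x >> i), z + ATAN_TABLE[i]


def cordic_stage(x, y, z, first, count):
    """`count` iterations chained together, modelling one pipeline stage."""
    for i in range(first, min(first + count, N_ITER)):
        x, y, z = cordic_iteration(x, y, z, i)
    return x, y, z


def cordic_sincos(angle, per_stage=3):
    """Return (sin, cos) of `angle` in radians, for |angle| up to about 1.74."""
    x = round(K_INV * (1 << FRAC_BITS))
    y = 0
    z = round(angle * (1 << FRAC_BITS))
    for first in range(0, N_ITER, per_stage):
        x, y, z = cordic_stage(x, y, z, first, per_stage)
    scale = float(1 << FRAC_BITS)
    return y / scale, x / scale
```

Changing `per_stage` only changes how the iterations are grouped into stages, not the result: the same sequence of shift, add and compare steps runs either way, which is why the only cost of chaining is a longer combinatorial path per stage.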
With the mantissa being 23 bits, chaining three iterations per stage would easily get us to a 9-10 stage pipeline for FP32. Chaining four per stage would give a 7-8 stage pipeline.
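The arithmetic behind those figures can be sketched as follows. This assumes one CORDIC iteration per result bit and 24 bits for FP32 (23 stored mantissa bits plus the hidden bit); the helper name `cordic_stage_count` is made up for this example. The gap between these raw counts and the pipeline lengths quoted above would presumably be the extra stages needed around the CORDIC core (setup, normalisation, rounding), but that split is a guess on my part.

```python
import math


def cordic_stage_count(result_bits, iterations_per_stage):
    """Stages needed when each stage chains `iterations_per_stage` iterations."""
    return math.ceil(result_bits / iterations_per_stage)


for per_stage in (3, 4):
    # 24 = 23-bit FP32 mantissa plus hidden bit (assumption)
    print(per_stage, "per stage ->", cordic_stage_count(24, per_stage), "CORDIC stages")
# 3 per stage -> 8 CORDIC stages
# 4 per stage -> 6 CORDIC stages
```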
This is a really good return on implementation time investment for such a huge number of operations being covered by such a ridiculously simple and elegant algorithm.