[libre-riscv-dev] gflops

Jacob Lifshay programmerjake at gmail.com
Sun Jul 28 13:13:10 BST 2019

I was calculating how many fp32 gflops our SoC would get if the div pipe
supported SIMD and was 64 bits wide (needed to process fp64/i64/u64, which
I think we should support using the pass-through-the-pipeline-twice scheme).

if an frsqrt is counted as 2 flops (div + sqrt, like fma is mul + add), then
each core would get 12 flops/clock (2*2 for div pipe, 4*2 for mul add
pipe), giving 60 gflops(!) at an overclock of 1.25GHz and 38.4gflops at
800MHz.

fp16 would give 24 flops/clock/core (76.8gflops at 800MHz; 120gflops at
1.25GHz) and fp64 would give 5 flops/clock/core (16gflops at 800MHz;
25gflops at 1.25GHz).

I think the gpu may end up with higher performance than initially planned
(assuming the memory system keeps up), which is good in my book.

if we want to save area (which I think will probably not be necessary), we
could shrink the div pipe stage count by doubling the number of passes fp32
and fp64 need through the pipeline, to 2 and 4 passes respectively:
fp16: 24flops/clock/core -- 76.8gflops at 800MHz
fp32: 10flops/clock/core -- 32gflops at 800MHz
fp64: 4.5flops/clock/core -- 14.4gflops at 800MHz
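
A minimal sketch of the reduced-area numbers above, assuming 4 cores
and a 128-bit mul-add pipe (both inferred from the gflops figures, not
stated in this mail). All names are hypothetical.

```python
# Sketch of the reduced-area variant; core count and mul-add width are assumptions.
CORES = 4               # assumption: implied by the gflops figures
DIV_PIPE_BITS = 64      # div pipe width from the text
MULADD_PIPE_BITS = 128  # assumption: implied by 4 fp32 lanes
CLOCK_GHZ = 0.8

# Passes through the shortened div pipe per element width (from the text:
# fp32 and fp64 double to 2 and 4 passes; fp16 stays at 1).
SHORT_DIV_PASSES = {16: 1, 32: 2, 64: 4}


def short_pipe_flops_per_clock(elem_bits):
    """Flops/clock/core with the shortened div pipe (frsqrt and fma = 2 flops)."""
    div_lanes = DIV_PIPE_BITS // elem_bits
    muladd_lanes = MULADD_PIPE_BITS // elem_bits
    return 2 * div_lanes / SHORT_DIV_PASSES[elem_bits] + 2 * muladd_lanes


for bits in (16, 32, 64):
    per_clock = short_pipe_flops_per_clock(bits)
    print(f"fp{bits}: {per_clock:g} flops/clock/core, "
          f"{per_clock * CORES * CLOCK_GHZ:g} gflops at 800MHz")
```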

Jacob Lifshay
