[libre-riscv-dev] gflops

Luke Kenneth Casson Leighton lkcl at lkcl.net
Sun Jul 28 14:29:01 BST 2019


---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

On Sun, Jul 28, 2019 at 1:55 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> On Sun, Jul 28, 2019, 05:24 Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>
> > On Sun, Jul 28, 2019 at 1:13 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
> > >
> > > I was calculating how many fp32 gflops our SoC would get if the div pipe
> > > supported simd and was 64-bits wide (needed to process fp64/i64/u64, which
> > > I think we should with the pass-through-the-pipeline-twice scheme).
> >
> >  what's the pipeline length, there?  (in the FPU, not anywhere else).
> >
> depends on what we pick, I think a reasonable value for the pipeline where
> fp32 is once-through is 6 or 7 pipeline stages (ceil(32/3/2); extra stage
> for normalizing/denormalizing wiggle room) where we have 2 radix-8 stages
> per pipeline stage -- any more than that and I doubt we'd hit 800MHz due to
> gate delay.

 yehyeh that sounds sane.
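
 the stage-count arithmetic quoted above can be sketched as below.  this
 is only an illustration: the function name and parameters are made up
 here, and it assumes each radix-2^k compute stage retires k result bits
 and that the normalize/denormalize "wiggle room" costs exactly one
 extra pipeline stage.

```python
def pipeline_stages(operand_bits, radix, compute_per_stage, extra_stages=1):
    """Rough pipeline depth for a digit-recurrence divider (hypothetical helper).

    operand_bits      -- bits to retire in one pass (32 in the quoted estimate)
    radix             -- radix of each compute stage; must be a power of two
    compute_per_stage -- compute stages chained inside one pipeline stage
    extra_stages      -- assumed allowance for normalize/denormalize
    """
    bits_per_compute = radix.bit_length() - 1      # radix-8 -> 3, radix-16 -> 4
    if radix != 1 << bits_per_compute or bits_per_compute == 0:
        raise ValueError("radix must be a power of two greater than 1")
    bits_per_stage = bits_per_compute * compute_per_stage
    core = -(-operand_bits // bits_per_stage)      # integer ceiling division
    return core + extra_stages


# two radix-8 stages per pipeline stage: ceil(32/3/2) = 6, or 7 with the extra
print(pipeline_stages(32, 8, 2, extra_stages=0), pipeline_stages(32, 8, 2))
# one radix-16 stage per pipeline stage: 8, or 9 with the extra
print(pipeline_stages(32, 16, 1, extra_stages=0), pipeline_stages(32, 16, 1))
```

 this reproduces the "6 or 7" and "8 or 9" figures from the thread, under
 the assumptions stated above.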

> > bear in mind that the number of reservation stations *has* to be equal
> > to or greater than the number of pipeline stages.
> >
> not actually, the pipeline would just never be fully utilized with fewer
> reservation stations.

 ... yes: which in turn means that throughput is capped by the number
of RSs rather than by the *pipeline* length: with fewer RSs than stages,
some stages always sit empty.  it's one of the oddities of using
dependency matrices.
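
 a minimal sketch of that limit, under a deliberately simplified model
 (an assumption, not a description of the actual design): each RS holds
 exactly one in-flight operation and is only freed when its result
 leaves the pipeline, and at most one operation is issued per cycle.
 the function name is hypothetical.

```python
def pipeline_utilization(num_rs, pipeline_depth):
    """Steady-state fraction of pipeline stages doing useful work.

    Simplified model: one operation per reservation station, and an RS
    is busy for the full pipeline latency, so at most `num_rs`
    operations can be in flight at once.
    """
    if num_rs < 1 or pipeline_depth < 1:
        raise ValueError("need at least one RS and one pipeline stage")
    in_flight = min(num_rs, pipeline_depth)   # extra RSs beyond the depth add nothing
    return in_flight / pipeline_depth


for rs in (4, 7, 10):
    print(rs, "RSs on a 7-stage pipe:", pipeline_utilization(rs, 7))
```

 in this model 4 RSs on a 7-stage pipe keep only 4 of the 7 stages busy,
 while 7 or more RSs reach full utilization.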

> > > if we want to save area (which I think will probably not be necessary),
>
> if the 2-stages per pipeline stage ends up killing our clock frequency, we
> could go with 1 radix-16 stage per pipeline stage (8 or 9 stages)

eek! :)  it's doable.  i'd be concerned about the power consumption of
16 ">=" comparators though.

> one potential option is to have the div pipe normally use 2 stages per
> pipeline stage but to have (boot-time configured or at least requires a
> pipeline flush to switch) muxes to insert pipeline registers between
> compute stages to allow much higher frequencies (maybe 2GHz? -- not low
> power mode).

 oo i like it.

> we would still have the same number of reservation stations,
> so the pipeline utilization wouldn't ever reach 100%, but it seems like a
> very simple addition that would eliminate the main culprit for clock rate
> limitations.

yehyeh no i totally get it.  love the idea.  can you raise an issue
about it so it's not forgotten?
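
 for the issue, a rough sketch of why the muxed-register mode could
 still be a win even though utilization drops.  everything here is an
 assumption for illustration: the 800MHz and 2GHz clocks are the
 tentative figures from the thread, 7 RSs and a 7-stage normal pipe are
 taken from the estimate above, and the inserted registers are assumed
 to double the depth to 14.  the function name is hypothetical.

```python
def div_throughput_mops(clock_mhz, pipeline_depth, num_rs):
    """Peak div-pipe throughput in millions of operations per second.

    Same simplified model as before: in-flight operations are capped by
    the number of reservation stations, so only min(num_rs, depth) of
    every `depth` issue slots can be filled.
    """
    in_flight = min(num_rs, pipeline_depth)
    return clock_mhz * in_flight / pipeline_depth


normal = div_throughput_mops(800, 7, 7)     # 2 compute stages per pipeline stage
fast = div_throughput_mops(2000, 14, 7)     # assumed: registers inserted, depth doubled
print("normal mode:", normal, "Mops/s")
print("fast mode:  ", fast, "Mops/s")
```

 with these assumed numbers the fast mode is only half utilized but
 still comes out ahead (1000 vs 800 Mops/s per lane); real figures
 depend on the final stage count and on what clock is achievable.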


