[libre-riscv-dev] gflops

Jacob Lifshay programmerjake at gmail.com
Sun Jul 28 23:33:16 BST 2019


On Sun, Jul 28, 2019, 06:29 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> ---
> crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
>
> On Sun, Jul 28, 2019 at 1:55 PM Jacob Lifshay <programmerjake at gmail.com>
> wrote:
> >
> > On Sun, Jul 28, 2019, 05:24 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> > wrote:
> >
> > > ---
> > > crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
> > >
> > > On Sun, Jul 28, 2019 at 1:13 PM Jacob Lifshay <programmerjake at gmail.com>
> > > wrote:
> > > >
> > > > I was calculating how many fp32 gflops our SoC would get if the div pipe
> > > > supported simd and was 64-bits wide (needed to process fp64/i64/u64, which
> > > > I think we should with the pass-through-the-pipeline-twice scheme).
> > >
> > >  what's the pipeline length, there?  (in the FPU, not anywhere else).
> > >
> > depends on what we pick, I think a reasonable value for the pipeline where
> > fp32 is once-through is 6 or 7 pipeline stages (ceil(32/3/2); extra stage
> > for normalizing/denormalizing wiggle room) where we have 2 radix-8 stages
> > per pipeline stage -- any more than that and I doubt we'd hit 800MHz due to
> > gate delay.
>
>  yehyeh that sounds sane.
>
> > > bear in mind that the number of reservation stations *has* to be equal
> > > to or greater than the number of pipeline stages.
> > >
> > not actually, the pipeline would just never be fully utilized with less
> > reservation stations.
>
>  ... yes: which in turn means that the performance drops down to the
> limit of the number of RS's, not the *pipeline* length.  it's one of
> the oddities of using dependency matrices.
>
> > > > if we want to save area (which I think will probably not be necessary),
> >
> > if the 2-stages per pipeline stage ends up killing our clock frequency, we
> > could go with 1 radix-16 stage per pipeline stage (8 or 9 stages)
>
> eek! :)  it's doable.  i'd be concerned about the power consumption of
> 16 ">=" comparators though.
>
the adders would use several times the power of the comparators: there are
several adders per comparator, and a comparator costs about as much as an
adder since it can be implemented using one.

notably, the area of 1 radix-16 stage is similar to that of 2 radix-8 stages,
but the radix-16 stage's gate delay is similar to that of 1 radix-8 stage.
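
for reference, the stage counts quoted above (6 or 7 for 2 radix-8 stages per
pipeline stage, 8 or 9 for 1 radix-16 stage) can be reproduced with a small
sketch. the function name and parameters are made up for illustration, and it
assumes a radix-8 compute stage retires 3 bits and a radix-16 one retires 4:

```python
import math

def div_pipe_stages(operand_bits, bits_per_compute_stage,
                    compute_stages_per_pipe_stage, extra_stages=0):
    """Hypothetical helper: pipeline stages needed for one pass.

    Assumes each compute stage retires a fixed number of result bits
    (3 for radix-8, 4 for radix-16); extra_stages models the optional
    normalizing/denormalizing stage.
    """
    bits_per_pipe_stage = bits_per_compute_stage * compute_stages_per_pipe_stage
    return math.ceil(operand_bits / bits_per_pipe_stage) + extra_stages

# 2 radix-8 compute stages per pipeline stage: ceil(32/3/2), plus optional extra
radix8_x2 = (div_pipe_stages(32, 3, 2), div_pipe_stages(32, 3, 2, extra_stages=1))
# 1 radix-16 compute stage per pipeline stage: ceil(32/4), plus optional extra
radix16_x1 = (div_pipe_stages(32, 4, 1), div_pipe_stages(32, 4, 1, extra_stages=1))
print(radix8_x2, radix16_x1)
```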

>
> > one potential option is to have the div pipe normally use 2 stages per
> > pipeline stage but to have (boot-time configured or at least requires a
> > pipeline flush to switch) muxes to insert pipeline registers between
> > compute stages to allow much higher frequencies (maybe 2GHz? -- not low
> > power mode).
>
>  oo i like it.
>
> > we would still have the same number of reservation stations,
> > so the pipeline utilization wouldn't ever reach 100%, but it seems like a
> > very simple addition that would eliminate the main culprit for clock rate
> > limitations.
>
> yehyeh no i totally get it.  love the idea.  can you raise an issue
> about it so it's not forgotten?
>
bug report submitted
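
a rough sketch of why the high-frequency mode could still pay off even though
utilization stays below 100%. this is a simplified model, not a measurement:
it assumes each reservation station holds one in-flight op for the whole
pipeline latency, that the muxes double the stage count, and that there are 7
reservation stations. the 800MHz and 2GHz figures are the tentative ones from
this thread; all names are hypothetical:

```python
def utilization_bound(num_reservation_stations, pipeline_stages):
    """Upper bound on pipeline utilization when the number of in-flight
    ops is capped by the reservation stations (simplified model)."""
    return min(1.0, num_reservation_stations / pipeline_stages)

def peak_ops_per_second(num_reservation_stations, pipeline_stages, clock_hz):
    """Peak issue rate: at most one op per clock, scaled by the bound."""
    return utilization_bound(num_reservation_stations, pipeline_stages) * clock_hz

NUM_RS = 7          # assumption: matches the 7-stage normal mode
NORMAL_STAGES = 7   # 2 radix-8 compute stages per pipeline stage
FAST_STAGES = 14    # assumption: registers inserted between every compute stage

normal = peak_ops_per_second(NUM_RS, NORMAL_STAGES, 800e6)
fast = peak_ops_per_second(NUM_RS, FAST_STAGES, 2e9)
print(normal, fast)
```

under these assumptions the fast mode is only 50% utilized but still issues
more ops per second than the fully utilized normal mode.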

>
> _______________________________________________
> libre-riscv-dev mailing list
> libre-riscv-dev at lists.libre-riscv.org
> http://lists.libre-riscv.org/mailman/listinfo/libre-riscv-dev
>

