[libre-riscv-dev] GPU design

Fri Dec 7 12:52:15 GMT 2018

On Fri, Dec 7, 2018, 04:48 lkcl <lkcl at libre-riscv.org> wrote:

> On Fri, Dec 7, 2018 at 12:33 PM Jacob Lifshay <programmerjake at gmail.com>
> wrote:
> >
> > On Fri, Dec 7, 2018, 03:37 lkcl <lkcl at libre-riscv.org> wrote:
>
> > I think sharing between pairs of cores will still work, since with a
> > pipelined divider you can do 1 divide per clock. For perspective, a
> > quad-core Haswell using AVX instructions can do 2.29 (4 cores * 8 lanes /
> > 14 cycles) fp32 divisions per clock, and our quad-core GPU with a
> > pipelined divider per pair of cores can do 2 divisions per clock.
>
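The throughput comparison quoted above can be checked with a few lines of
Python. This is only a sketch of the arithmetic: the figures are the ones
given in the email, while the function name and its parameters are
hypothetical, introduced here for illustration.

```python
# Sketch of the divide-throughput arithmetic from the quoted email.
# divides_per_clock() is a hypothetical helper, not part of any design.

def divides_per_clock(units: int, lanes_per_unit: int, cycles_per_issue: int) -> float:
    """Sustained divisions per clock for `units` dividers, each producing
    `lanes_per_unit` results every `cycles_per_issue` cycles."""
    return units * lanes_per_unit / cycles_per_issue

# Quad-core Haswell with AVX: 4 cores * 8 fp32 lanes / 14 cycles.
haswell_fp32_div = divides_per_clock(units=4, lanes_per_unit=8, cycles_per_issue=14)

# Quad-core GPU with one pipelined divider per pair of cores:
# 2 dividers, each accepting 1 divide per clock.
gpu_fp32_div = divides_per_clock(units=2, lanes_per_unit=1, cycles_per_issue=1)

print(f"Haswell: {haswell_fp32_div:.2f} div/clock, GPU: {gpu_fp32_div:.2f} div/clock")
```

The two numbers come out close to each other, which is the point of the
comparison: sharing a pipelined divider between pairs of cores does not
obviously leave division throughput behind a desktop CPU.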
>  haswell avx isn't targeted at GPU workloads (but does pretty well at
> video decoding).  appreciate the insight.
>
> > Note that having the RV base integer and fp registers be part of the same
> > register file, like I had suggested before, allows us to save 2 clock
> > cycles with the fast sqrt algorithm, since you can use the SV rename table
> > to have an integer register and an fp register renamed to the same
> > underlying register, removing the need to move between int and fp
> > registers.
>
>  i think, with ROB#s, MV could hypothetically be implemented as
> just... changing the dest target register number (and type, from
> int/float).  maybe.  will need to be thought through properly.
>
Yeah, but not needing the mv instruction at all is better.
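To illustrate where the int/fp moves come from, here is a minimal Python
sketch. It assumes that "the fast sqrt algorithm" means the familiar
bit-trick reciprocal square root (the thread does not spell out which
algorithm is meant), and the helper names are hypothetical. The two
reinterpretation steps stand in for the two register moves; mapping them onto
the "2 clock cycles" mentioned above is likewise an assumption.

```python
import struct

# Assumption: "fast sqrt" is the classic bit-trick reciprocal square root.
# The helpers below are hypothetical stand-ins for fp<->int register moves.

def fp_to_int_bits(x: float) -> int:
    """Reinterpret an fp32 value as its 32-bit integer pattern (fp -> int move)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def int_bits_to_fp(i: int) -> float:
    """Reinterpret a 32-bit integer pattern as an fp32 value (int -> fp move)."""
    return struct.unpack("<f", struct.pack("<I", i))[0]

def fast_rsqrt(x: float) -> float:
    """Approximate 1/sqrt(x) for positive, normal fp32 x."""
    i = fp_to_int_bits(x)            # move 1: fp register -> int register
    i = 0x5F3759DF - (i >> 1)        # integer shift and subtract on the bits
    y = int_bits_to_fp(i)            # move 2: int register -> fp register
    return y * (1.5 - 0.5 * x * y * y)  # one Newton-Raphson refinement step

print(fast_rsqrt(4.0))  # close to 0.5
```

If an integer register and an fp register are renamed to the same underlying
register, both reinterpretation steps become no-ops, so no mv instruction is
needed at all.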
