[libre-riscv-dev] GPU design
programmerjake at gmail.com
Fri Dec 7 12:52:15 GMT 2018
On Fri, Dec 7, 2018, 04:48 lkcl <lkcl at libre-riscv.org> wrote:
> On Fri, Dec 7, 2018 at 12:33 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
> > On Fri, Dec 7, 2018, 03:37 lkcl <lkcl at libre-riscv.org> wrote:
> > I think sharing between pairs of cores will still work since, with a
> > pipelined divider, you can do 1 divide per clock. For perspective, a
> > quad-core Haswell using AVX instructions can do 2.29 (4 cores * 8 lanes /
> > 14 cycles) fp32 divisions per clock, and our quad-core GPU with one
> > divider per pair of cores can do 2 divisions per clock.
> Haswell AVX isn't targeted at GPU workloads (but does pretty well at
> video decoding). appreciate the insight.
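For reference, the throughput comparison quoted above works out as in the
rough sketch below. The function and variable names are made up for
illustration, and the 14-cycle figure is simply the number from the message,
read here as the interval between 8-lane divides on each core.

```python
from fractions import Fraction


def divides_per_clock(units, lanes_per_unit, cycles_per_issue):
    """Aggregate fp32 divide throughput, in results per clock (hypothetical helper)."""
    return Fraction(units * lanes_per_unit, cycles_per_issue)


# Quad-core Haswell with AVX: 4 cores, 8 fp32 lanes each, one vector divide
# every 14 cycles (figure taken from the message, not measured here).
haswell = divides_per_clock(units=4, lanes_per_unit=8, cycles_per_issue=14)

# Proposed quad-core GPU: one pipelined divider shared per pair of cores,
# so 2 dividers in total, each producing 1 result per clock.
gpu = divides_per_clock(units=2, lanes_per_unit=1, cycles_per_issue=1)

print(f"Haswell: {float(haswell):.2f} div/clock, GPU: {float(gpu):.2f} div/clock")
```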
> > Note that having the RV base integer and fp registers be part of the same
> > register file, like I had suggested before, allows us to save 2 clock
> > cycles with the fast sqrt algorithm, since you can use the SV rename table to
> > have an integer register and a fp register renamed to the same underlying
> > register, removing the need to move between int and fp registers.
> i think, with ROB#s, MV could hypothetically be implemented as
> just... changing the dest target register number (and type, from
> int/float). maybe. will need to be thought through properly.
Yeah, but not needing the mv instruction at all is better.
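To illustrate where the int/fp moves come from, here is a minimal sketch. It
assumes that "fast sqrt" refers to the well-known bit-manipulation estimate
for the reciprocal square root followed by a Newton-Raphson step, which the
thread does not spell out. The helper names are made up, and Python's struct
module stands in for the register moves.

```python
import struct


def float_bits(x):
    """Reinterpret an fp32 value as a 32-bit integer (the fp -> int move)."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_float(i):
    """Reinterpret a 32-bit integer as an fp32 value (the int -> fp move)."""
    return struct.unpack("<f", struct.pack("<I", i & 0xFFFFFFFF))[0]


def fast_rsqrt(x):
    """Approximate 1/sqrt(x) for positive x (assumed algorithm, see above)."""
    i = float_bits(x)                   # move 1: fp register -> int register
    i = 0x5F3759DF - (i >> 1)           # integer arithmetic on the bit pattern
    y = bits_float(i)                   # move 2: int register -> fp register
    return y * (1.5 - 0.5 * x * y * y)  # one Newton-Raphson refinement step


print(fast_rsqrt(4.0))  # close to 0.5
```

On this reading, the two reinterpretation steps are the two moves: if the
integer and fp register names are renamed to the same underlying register,
both become no-ops, which would account for the 2 cycles saved.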