Luke Kenneth Casson Leighton
lkcl at lkcl.net
Wed Feb 19 06:57:46 GMT 2020
On Wednesday, February 19, 2020, Immanuel, Yehowshua U <
yimmanuel3 at gatech.edu> wrote:
> > The current plan is to have the main (mul-add) SIMD ALU be 128-bits wide
> > per-core. That is 8 flops/core/cycle of fp32, 16 for fp16, and, depending
> > on how we implement it, either 2 or 4 flops/core/cycle of fp64. There are
> > also other ALUs for div (int and fp), sqrt, rsqrt, and other special
> > functions, so those will help increase performance too.
> Yes this makes sense.
> OK, I’ve got a rough idea of what we’re doing now.
> We have 4 CPU cores right?
in the quad core target yes.
in the 180nm one no.
> So that’s 4 128-bit
8 separate 64 bit ALUs. 2 per core. 1 on odd regs, the other on even.
> wide ALU.
> Gosh, that’s a lot of gates,
hell yes. 30% of GPUs is the FP ALUs
> and dynamic partitioning - cool - but a little tricky…
actually very simple. it's SIMD at the ALUs. the decode phase will be a
Vector Frontend, converting to a SIMD backend with predicate masks.
elements at the end of a loop are masked out.
however because the Dependency Matrices are on *32 bit* not 64 bit
boundaries, we do not waste an entire 32 bit part of an ALU using "masks".
however for 16 and 8 bit, yes: at the end of a non-power-of-2 vector some
parts of the SIMD ALU will run empty some of the time.
i can live with that.
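
The masking described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual decode logic: the function names (`predicate_masks`, `idle_lanes`), the `chunk_bits` parameter, and the reading that work is issued in 32-bit chunks (taken from the remark about 32-bit Dependency Matrix boundaries) are all hypothetical.

```python
# Hypothetical sketch: how a vector of `vl` elements might be split into
# SIMD chunks with predicate masks. Names and parameters are assumptions
# made for illustration; they are not taken from the project's code.

def predicate_masks(vl, elwidth, chunk_bits=32):
    """Return one bitmask per chunk; bit i set means lane i is active."""
    lanes = max(1, chunk_bits // elwidth)  # elements that fit in one chunk
    masks = []
    for start in range(0, vl, lanes):
        active = min(lanes, vl - start)    # the last chunk may be partly full
        masks.append((1 << active) - 1)
    return masks

def idle_lanes(vl, elwidth, chunk_bits):
    """Number of lanes left empty in the final chunk."""
    lanes = max(1, chunk_bits // elwidth)
    return (-vl) % lanes

if __name__ == "__main__":
    for elwidth in (32, 16, 8):
        print(elwidth, [bin(m) for m in predicate_masks(5, elwidth)],
              "idle at 32-bit granularity:", idle_lanes(5, elwidth, 32),
              "idle at 64-bit granularity:", idle_lanes(5, elwidth, 64))
```

With a vector length of 5, 32-bit elements leave no idle lane when chunks are 32 bits wide, whereas a 64-bit chunk would leave one empty. For 16-bit and 8-bit elements the final chunk is partly empty either way, which matches the trade-off accepted above.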
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68