[libre-riscv-dev] Vulkanizing

Wed Feb 19 06:57:46 GMT 2020

On Wednesday, February 19, 2020, Immanuel, Yehowshua U <
yimmanuel3 at gatech.edu> wrote:

>
> > The current plan is to have the main (mul-add) SIMD ALU be 128 bits wide
> > per core. That is 8 flops/core/cycle of fp32, 16 for fp16, and, depending
> > on how we implement it, either 2 or 4 flops/core/cycle of fp64. There are
> > also other ALUs for div (int and fp), sqrt, rsqrt, and other special
> > functions, so those will help increase performance too.
>
> Yes, this makes sense.
>
> OK, I’ve got a rough idea of what we’re doing now.
> We have 4 CPU cores, right?

In the quad-core target, yes.

In the 180nm one, no.
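As a rough check of the per-core figures quoted above: a 128-bit ALU holds 128/32 = 4 fp32 lanes, and if a fused multiply-add is counted as 2 flops that gives 8 flops/core/cycle (16 for fp16, 4 for full-rate fp64; the lower figure of 2 for fp64 would correspond to a half-rate implementation). A minimal sketch, assuming one FMA per lane per cycle and, for the quad-core target only, 4 identical cores:

```python
# Peak flops per cycle for a SIMD mul-add ALU of a given width.
# Assumptions: an FMA counts as 2 flops, and every lane issues one
# FMA per cycle (the "full-rate" case).

def flops_per_cycle(alu_bits: int, elem_bits: int, fma: bool = True) -> int:
    lanes = alu_bits // elem_bits          # elements processed side by side
    return lanes * (2 if fma else 1)       # 2 flops per lane if mul+add fused


PER_CORE_BITS = 128   # main (mul-add) SIMD ALU width per core
CORES = 4             # quad-core target only; the 180nm chip differs

for name, width in [("fp32", 32), ("fp16", 16), ("fp64", 64)]:
    per_core = flops_per_cycle(PER_CORE_BITS, width)
    print(f"{name}: {per_core} flops/core/cycle, {CORES * per_core} for {CORES} cores")
```

Calling `flops_per_cycle(128, 64, fma=False)` gives 2, which matches the lower fp64 figure if one counts the half-rate case that way instead.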

>
> So that’s 4 128-bit

8 separate 64-bit ALUs: 2 per core, 1 on the odd regs, the other on the even ones.

> wide ALUs.

Yes.
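One way to picture the split described above (two 64-bit ALUs per core, one serving odd-numbered registers and the other even-numbered ones) is the toy model below. It assumes the register number used for the odd/even choice is the destination register; the function and constant names are hypothetical and not taken from the libre-riscv source.

```python
# Hypothetical model: 2 x 64-bit ALUs per core, one for even-numbered
# registers, one for odd-numbered registers.

ALUS_PER_CORE = 2
ALU_BITS = 64


def alu_for_reg(reg: int) -> str:
    """Pick the ALU serving a given (assumed destination) register number."""
    return "even" if reg % 2 == 0 else "odd"


def total_alus(cores: int) -> int:
    """Number of 64-bit ALUs across all cores."""
    return cores * ALUS_PER_CORE


# Quad-core target: 4 cores x 2 ALUs = 8 ALUs of 64 bits each,
# i.e. 128 bits of ALU width per core.
assert total_alus(4) == 8
assert ALUS_PER_CORE * ALU_BITS == 128

print([alu_for_reg(r) for r in range(4)])  # ['even', 'odd', 'even', 'odd']
```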

> Gosh, that’s a lot of gates,

Hell yes. 30% of a GPU is the FP ALUs.

> and dynamic partitioning - cool - but a little tricky…

Actually it's very simple: it's SIMD at the ALUs. The decode phase will be a
Vector frontend, which converts to a SIMD backend with predicate masks.

Elements at the end of a loop are masked out.
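The frontend-to-backend conversion can be pictured as chopping a vector of VL elements into SIMD operations, each carrying a predicate mask whose low bits mark the live lanes, so the tail of a loop simply has its upper lanes masked out. This is an illustrative model only; `simd_ops`, the 64-bit ALU width default and the mask layout are assumptions, not the project's actual decoder.

```python
# Illustrative model: turn one vector instruction (vl elements of
# elem_bits each) into a sequence of SIMD ops with predicate masks.

def simd_ops(vl: int, elem_bits: int, alu_bits: int = 64):
    """Return a list of (first_element_index, lane_mask) pairs.

    Bit i of lane_mask is 1 if SIMD lane i holds a live element;
    lanes past the end of the vector are masked out (bit = 0).
    """
    lanes = alu_bits // elem_bits
    ops = []
    for start in range(0, vl, lanes):
        live = min(lanes, vl - start)      # fewer live lanes on the tail op
        mask = (1 << live) - 1             # low `live` lanes active
        ops.append((start, mask))
    return ops


for start, mask in simd_ops(7, 16):
    print(f"elements from {start}: mask {mask:04b}")
```

With VL = 7 and 16-bit elements, a 64-bit ALU holds 4 lanes, so this produces one full op (mask `1111`) followed by a tail op with its top lane masked out (mask `0111`).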

However, because the Dependency Matrices work on *32-bit* rather than 64-bit
boundaries, we do not waste an entire 32-bit part of an ALU on masked-out
elements.

For 16- and 8-bit elements, however, yes: at the end of a vector whose length
is not a power of 2, some parts of the SIMD ALU will run empty some of the time.

I can live with that.
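The effect of tracking dependencies on 32-bit rather than 64-bit boundaries can be estimated with a small model: if the ALU can be handed out in 32-bit pieces, only the unused part of the last partially filled piece sits idle. Under that assumption 32-bit elements never waste anything, while odd-length 16-bit vectors, or 8-bit vectors whose length is not a multiple of 4, leave some lanes empty. `wasted_bits` and its parameters are hypothetical, a sketch of the reasoning rather than the real scheduler.

```python
# Rough model of idle ALU bits on the final op of a vector, assuming
# the ALU can be allocated in granule_bits-sized pieces (the assumed
# Dependency Matrix granularity).

def wasted_bits(vl: int, elem_bits: int, granule_bits: int = 32) -> int:
    """Bits left idle in the last partially filled granule."""
    used = vl * elem_bits
    remainder = used % granule_bits
    return 0 if remainder == 0 else granule_bits - remainder


# Compare 32-bit granularity against a hypothetical 64-bit granularity.
for vl, eb in [(3, 32), (7, 16), (5, 8)]:
    print(f"VL={vl}, {eb}-bit: {wasted_bits(vl, eb)} idle bits at 32-bit "
          f"granularity, {wasted_bits(vl, eb, granule_bits=64)} at 64-bit")
```

For VL = 3 of 32-bit elements the 32-bit model wastes nothing, whereas a 64-bit granularity would leave a whole 32-bit half idle, which is the saving described above.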

l.

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68