[libre-riscv-dev] GPU design

Mon Dec 3 23:02:41 GMT 2018

I created a simple diagram of what I think would work for the ALUs and
register file for the GPU design. The diagram doesn't include forwarding or
pipeline registers.

https://salsa.debian.org/Kazan-team/kazan/blob/e4b516e29469e26146e717e0ef4b552efdac694b/docs/ALU%20lanes.svg

I noticed that if we use register renaming, we can allocate the output
registers of each of the 4 lanes in such a way that the register file can
be split into 4 parts with each part only being written by its associated
lane, meaning that we can get away with only a few write ports, 1 for each
supported instruction latency. I'm planning on supporting single-cycle
instructions (integer add, sub, xor, etc.) and 3-4 cycle instructions (fadd,
fmul, fmadd, load, etc.). For longer instructions (fdiv, integer div, etc.),
the plan is to just stall the rest of the processor when they finish in
order to create a free slot to write, though we could add another write
port if long instructions are too slow.
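
A rough sketch of the allocation idea, in Python rather than hardware, might
look like the following. It is only an illustration: the class name
LaneRenamer, the equal-sized contiguous banks, and the immediate freeing of
old mappings are all assumptions made for the example, not part of the
actual design.

```python
NUM_LANES = 4
NUM_PHYS_REGS = 0xC0                    # total hardware registers, from the post
BANK_SIZE = NUM_PHYS_REGS // NUM_LANES  # assumption: 4 equal-sized banks


class LaneRenamer:
    """Toy renamer (hypothetical): lane i only writes registers in bank i."""

    def __init__(self):
        # Assumed layout: bank i owns physical registers
        # [i * BANK_SIZE, (i + 1) * BANK_SIZE).
        self.free = [
            list(range(i * BANK_SIZE, (i + 1) * BANK_SIZE))
            for i in range(NUM_LANES)
        ]
        self.rename_table = {}  # architectural register -> physical register

    @staticmethod
    def bank_of(phys_reg):
        """Return which lane's bank a physical register lives in."""
        return phys_reg // BANK_SIZE

    def allocate_dest(self, lane, arch_reg):
        """Rename arch_reg to a free physical register in the lane's own bank."""
        if not self.free[lane]:
            raise RuntimeError(f"lane {lane} has no free registers; must stall")
        phys = self.free[lane].pop(0)
        old = self.rename_table.get(arch_reg)
        self.rename_table[arch_reg] = phys
        if old is not None:
            # Simplification: free the old mapping immediately. Real hardware
            # would wait until no in-flight instruction can still read it.
            self.free[self.bank_of(old)].append(old)
        return phys

    def lookup_src(self, arch_reg):
        """Sources may be read from any bank; only writes are restricted."""
        return self.rename_table[arch_reg]


# One vectorized operation split into 4 per-element operations, one per lane.
renamer = LaneRenamer()
dests = [renamer.allocate_dest(lane, 0x10 + lane) for lane in range(NUM_LANES)]
banks = [renamer.bank_of(p) for p in dests]
```

Because every destination lands in the bank owned by the lane that produces
it, no bank ever sees writes from two lanes, which is what lets the write
port count stay small.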

Note that there are 0xC0 hardware registers because we need 0x80 for the
architecturally visible registers, and the other 0x40 are used for
renaming. 0x40 spare registers should be enough because that covers 4
instructions issued per clock, each taking up to 16 cycles (4 * 16 = 0x40).
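
The register count can be checked with a few lines of Python. This assumes
each in-flight instruction holds exactly one spare register until it
completes; the constant names are made up for the example.

```python
ARCH_REGS = 0x80          # architecturally visible registers
SPARE_REGS = 0x40         # extra registers used for renaming
LANES = 4                 # per-element operations issued per clock
MAX_LATENCY_CYCLES = 16   # longest instruction covered without stalling

total_regs = ARCH_REGS + SPARE_REGS
# Assumption: one spare register per in-flight instruction.
max_in_flight = LANES * MAX_LATENCY_CYCLES
spare_per_lane = SPARE_REGS // LANES

print(hex(total_regs), hex(max_in_flight), hex(spare_per_lane))
```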

I'm planning on adding additional forwarding to skip the extra cycle needed
to read/write the register file.

Note that the GPU probably won't be a 4-wide-issue processor; those are
just the per-element operations generated from single vectorized operations.

Jacob Lifshay