[libre-riscv-dev] Minerva L1 Cache

Mon Jun 15 05:30:37 BST 2020

On Monday, June 15, 2020, Yehowshua <yimmanuel3 at gatech.edu> wrote:

> And more generally, if I understand things correctly:
> Our memory architecture is the following?
>
> cpu -> MMU -> L1 Cache -> main memory?

the layout is in the diagram i drew, which took several weeks to design
and review both here and on comp.arch.  i have referred you to it a couple
of times but, puzzlingly, each time this did not result in any engagement
or questions.

i wondered if perhaps this is because this is a high traffic list and you
have been so busy on other important things.

here is the page again:

https://libre-soc.org/3d_gpu/architecture/memory_and_cache/

there are other locations and discussions, however this diagram is key.

the multiple LDST CompUnits can produce up to *SIXTEEN* 64 bit LDST
requests.

these are "funneled", grouped, and directed to *separate* odd/even L1
Caches, each being 128 bit wide.  the split decision uses bit 4 of the
address.
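
as a rough illustration of that split (a minimal sketch, not the actual
HDL: the function name and the "even"/"odd" labels are made up here, and
it assumes byte addressing so that a 128 bit line is 16 bytes and bit 4 is
the first bit above the line offset):

```python
LINE_BYTES = 16  # one 128 bit L1 line, per the description above


def select_l1(addr: int) -> str:
    """Pick the L1 cache for a byte address using bit 4.

    Hypothetical helper: consecutive 16-byte lines alternate between
    the two caches because bit 4 toggles every LINE_BYTES bytes.
    """
    return "odd" if (addr >> 4) & 1 else "even"


# consecutive 16-byte lines alternate between the two caches
for a in (0x00, 0x10, 0x20, 0x30):
    print(hex(a), select_l1(a))
```

the point being that sequential accesses spread evenly over both caches.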

i have not yet been able to get to the TLB and MMU because there is so
much else to do first.

without that funneling (merging) we simply could not meet a GPU workload.

All LDST operations (all 16 of them) have the potential to be merged in a
single cycle.  in reality only 8 are likely to be merged.
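
to give a feel for what merging means, here is a software sketch under
stated assumptions, not the real hardware: requests that fall in the same
16-byte line are combined into one line access with a byte-enable mask.
the names (merge_requests, the mask format) are hypothetical, and it
assumes no single request crosses a line boundary.

```python
from collections import defaultdict

LINE_BYTES = 16  # one 128 bit L1 line


def merge_requests(requests):
    """Group (addr, nbytes) requests by the 16-byte line they fall in.

    Returns a dict mapping line base address to a 16-bit byte-enable
    mask, one bit per byte of the line.  Illustrative only.
    """
    lines = defaultdict(int)
    for addr, nbytes in requests:
        base = addr & ~(LINE_BYTES - 1)    # line-aligned address
        offset = addr & (LINE_BYTES - 1)   # byte offset within the line
        if offset + nbytes > LINE_BYTES:
            raise ValueError("request crosses a line boundary")
        lines[base] |= ((1 << nbytes) - 1) << offset
    return dict(lines)


# four sequential FP32 (4-byte) accesses collapse into one line access
fp32 = merge_requests([(0x100, 4), (0x104, 4), (0x108, 4), (0x10C, 4)])
# two 64 bit accesses to different lines stay as two line accesses
scalar = merge_requests([(0x100, 8), (0x110, 8)])
```

in this sketch the four FP32 requests end up as a single fully-enabled
line, whereas the two 64 bit requests still need two line accesses.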

it is important therefore to appreciate that this architectural strategy
has been designed and evaluated over a 2 year period to give "reasonable"
64 bit scalar performance, yet 2 to 4x that performance if certain criteria
are met, such as when FP32 sequential (element-strided) vector workloads
are issued.

this eliminates many of the N to N crossbars that would obliterate any
chance of us being able to claim low power consumption as well as make the
routing and layout absolute hell.

it really is a hybrid VPU/GPU/CPU, not a "CPU pretending to be a GPU" or
the other way round.

l.

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68