[libre-riscv-dev] Minerva L1 Cache
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Mon Jun 15 05:30:37 BST 2020
On Monday, June 15, 2020, Yehowshua <yimmanuel3 at gatech.edu> wrote:
> And more generally, if I understand things correctly:
> Our memory architecture is the following?
> cpu -> MMU -> L1 Cache -> main memory?
the diagram i drew, which took several weeks to design and review on here
and comp.arch, which i have referred you to a couple of times but have
noted each time, puzzlingly, did not result in any engagement or questions,
contains the layout.
i wondered if perhaps this is because this is a high traffic list and you
have been so busy on other important things.
here is the page again:
there are other locations and discussions however this diagram is key.
the multiple LDST CompUnits can produce up to *SIXTEEN* 64 bit LDST
these are "funneled", grouped, and directed to *separate* odd/even L1
Caches, each being 128 bit wide. the split decision uses bit 4 of the
the TLB and MMU i have not been able to get to because there is so much
else to do, first.
without that funneling (merging) we simply could not meet a GPU workload.
All LDST operations (all 16 of them) have the potential to be merged in a
single cycle. in reality only 8 are likely to be merged.
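as a rough illustration of the funnelling described above, here is a minimal software sketch. it is not the actual hardware design: the function name `funnel`, the byte-enable mask representation and the assumption that "bit 4" means bit 4 of the byte address are all mine, chosen only to show how 64 bit requests that fall in the same 128 bit (16 byte) line could be merged and steered to an odd or even bank.

```python
from collections import defaultdict

LINE_BYTES = 16  # 128-bit wide L1 cache line, as stated in the text


def funnel(requests):
    """Merge (addr, size) LDST requests per 16-byte line and split odd/even.

    Hypothetical helper for illustration only. Assumes bit 4 of the byte
    address picks the bank, so consecutive 16-byte lines alternate banks.
    Returns {"even": {line_addr: byte_mask}, "odd": {line_addr: byte_mask}}.
    """
    banks = {"even": defaultdict(int), "odd": defaultdict(int)}
    for addr, size in requests:
        offset = addr % LINE_BYTES
        if offset + size > LINE_BYTES:
            # requests crossing a line boundary are out of scope for this sketch
            raise ValueError("request crosses a 16-byte line boundary")
        line_addr = addr - offset
        bank = "odd" if (addr >> 4) & 1 else "even"
        # OR the byte-enable bits together: requests to the same line merge
        banks[bank][line_addr] |= ((1 << size) - 1) << offset
    return {name: dict(lines) for name, lines in banks.items()}


# two 8-byte accesses to the same line merge into one 16-byte access
result = funnel([(0x00, 8), (0x08, 8), (0x10, 8), (0x24, 4)])
```

in this sketch the first two requests collapse into a single full-width access on the even bank, while the request at 0x10 goes to the odd bank on its own.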
it is important therefore to appreciate that this architectural strategy
has been designed and evaluated over a 2 year period to give "reasonable"
64 bit scalar performance, yet 2 to 4x that performance if certain criteria
are met such as if FP32 sequential (element strided) vector workloads are
this eliminates many of the N to N crossbars that would obliterate any
chance of us being able to claim low power consumption as well as make the
routing and layout absolute hell.
it really is a hybrid VPU/GPU/CPU not a "CPU pretending to be a GPU" or the
other way round.
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68