[libre-riscv-dev] [Bug 257] Implement demo Load/Store queueing algorithm

Sat Mar 21 04:12:53 GMT 2020

http://bugs.libre-riscv.org/show_bug.cgi?id=257

--- Comment #11 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #10)
> I've been thinking that we would have a cache with a cache line size of 64
> bytes (or 32 if we really have to).

eek.  ok.  right.  64 / 4 = 16, which gives 4 stripes of 16 bytes to 1R1W
Memory inside the cache. that way we do not need multi-ported SRAM in the cache.

if we have byte enable lines the 16 bit mask can be passed to each byte.
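
as a rough illustration of what such a 16 bit mask looks like, here is a minimal
python sketch. the function name and the rule that an access may not cross a
16-byte boundary are assumptions made for the example, not part of the design:

```python
def byte_enable_mask(addr, nbytes, width=16):
    """Return a byte-enable mask for nbytes starting at addr.

    Assumption (hypothetical): the access stays inside one
    16-byte stripe, so address bits 0-3 give the byte offset.
    """
    offset = addr & (width - 1)  # low 4 bits: byte position in the stripe
    if nbytes < 1 or offset + nbytes > width:
        raise ValueError("access does not fit inside one stripe")
    # nbytes consecutive 1s, shifted up to the starting byte
    return ((1 << nbytes) - 1) << offset
```

a 4-byte access at offset 4 sets bits 4 to 7 of the mask, and a full 16-byte
access sets all 16 bits.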

> we want the cache line size to be the same for all our caches since it
> greatly simplifies cache to cache transfers.

ok

> So, for a 32kB L1 cache, it would have 512 lines.

ye gods.

> If we had it be 8-way set
> associative, then there would be 64 sets and each set would have 8 lines.

4 way makes more sense (a little easier) as the byte enable mask is 16 bits.

however 8 way works as well, because each 16 bit byte-enable mask is split in two.
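
the figures quoted above (32kB, 64-byte lines, 8-way) can be checked with a few
lines of python. this is only the arithmetic, with a hypothetical helper name,
and says nothing about how the cache is actually built:

```python
def cache_geometry(size_bytes, line_bytes, ways):
    """Return (number of lines, number of sets) for a set-associative cache."""
    lines = size_bytes // line_bytes  # total cache lines
    sets = lines // ways              # each set holds `ways` lines
    return lines, sets


# 32kB L1 with 64-byte lines, as in the quoted comment
lines_8way, sets_8way = cache_geometry(32 * 1024, 64, 8)
# same cache organised as 4-way instead
lines_4way, sets_4way = cache_geometry(32 * 1024, 64, 4)
```

with 8 ways this gives 512 lines in 64 sets; with 4 ways the same 512 lines are
arranged as 128 sets.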

> Having a L1 cache much smaller than 16kB would be quite detrimental due to
> the excessive cache latency and lack of bandwidth to the L2.

yes.

> I'm going to be unavailable till at least Sat night, then will work on
> publishing v0.2 of algebraics (updating transitive dependencies for
> simple-soft-float due to PyO3 v0.9 being released) then work on this.

ok.

the earlier idea of a tiny pre L1 cache may on reflection be unnecessary.

i drew out a diagram a couple days ago, it basically involved:

* a PriorityPicker to find the first entry in the AddressMatrix that is "live"

* that then says which entry is to be used to broadcast the high address bits
(14 to 48) to all other live LD/STs

* bear in mind that all "live" addresses have already had a comparison calculated
on bits 4 thru 13 of the address: we also include *these* results (ANDed) in the
comparison

* anything that both matches on bits 4 to 13 in the AddressMatcher matrix *and*
gives a hit on the Priority-Picked bits 14 to 48 has its 16-bit "mask" permitted
to go through an ORer to the L1 cache byte-enable on the Memory.
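
the steps listed above can be sketched in plain python (a behavioural model,
not nmigen and not the real PriorityPicker or AddressMatcher code). the LdSt
class, the function names and the reading of "14 to 48" as an inclusive bit
range are all assumptions made for the example:

```python
from dataclasses import dataclass


@dataclass
class LdSt:
    live: bool  # entry holds an outstanding LD/ST
    addr: int   # full address
    mask: int   # 16-bit byte-enable mask


def bits(value, lo, hi):
    """Extract bits lo..hi of value (inclusive range, an assumption)."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)


def priority_pick(entries):
    """Index of the first live entry, or None if nothing is live."""
    for i, entry in enumerate(entries):
        if entry.live:
            return i
    return None


def merged_byte_enable(entries):
    """Return (picked index, ORed masks of all entries matching it)."""
    idx = priority_pick(entries)
    if idx is None:
        return None, 0
    picked = entries[idx]
    merged = 0
    for entry in entries:
        if not entry.live:
            continue
        # stands in for the AddressMatcher result on bits 4 thru 13
        lo_match = bits(entry.addr, 4, 13) == bits(picked.addr, 4, 13)
        # stands in for the broadcast comparison of the high bits
        hi_match = bits(entry.addr, 14, 48) == bits(picked.addr, 14, 48)
        if lo_match and hi_match:
            merged |= entry.mask  # the "ORer" feeding the byte-enable
    return idx, merged
```

entries whose low bits match but whose high bits differ are left out of the
merged mask and would wait for a later pick.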

a refinement of that is to allow up to *four* of those to be picked, one to be
routed to each "way".

that is one of the things that i wrote the MultiPriorityPicker for.
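
picking up to four can be modelled very simply in python. this is a hypothetical
stand-in to show the idea only: it is not the MultiPriorityPicker itself, and
how each pick gets routed to a "way" is left out:

```python
def multi_priority_pick(live, n=4):
    """Return the indices of the first n live entries, lowest index first."""
    picked = []
    for i, is_live in enumerate(live):
        if is_live:
            picked.append(i)
            if len(picked) == n:
                break  # all n slots have been filled
    return picked
```

if fewer than four entries are live the returned list is simply shorter, so
some ways would get nothing on that cycle.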
