[libre-riscv-dev] [Bug 216] LOAD STORE buffer needed

bugzilla-daemon at libre-riscv.org
Thu Mar 12 17:29:04 GMT 2020


http://bugs.libre-riscv.org/show_bug.cgi?id=216

--- Comment #6 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
ok i think i have a data structure that will help.  it's a (very small) CAM,
with extra "things".

also it critically depends upon a pre-analysis of the LD/ST operation, breaking
it down from this form, where unary is defined as (1<<N):

FunctionUnit# Addr Length

into:

unary(FU#){0}  Addr[4:]   unary(Addr[0:3]     )  (start=0, mask=000bbbb)
unary(FU#){1}  Addr[4:]+1 unary(Addr[0:3]&mask)  (start=M, mask=bbb0000)

when a LD/ST is misaligned across a cache line boundary, the *second*
operation (unary(FU#){1}) is activated as well, so that there are two separate
and distinct LOADs / STOREs coming from *one* FunctionUnit.

the (start,mask) is there so that the data can be shifted and masked as it
comes in (or out) of the register.  this is because when it's split into two,
obviously you have to know which portion of each split covers which part of
the data register.
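
as a rough sketch of that pre-analysis (plain python, not HDL): it assumes a
16-byte line, so that Addr[0:3] picks the byte within the line, and it assumes
rows are numbered 2*FU+part.  the name split_ldst and the dict fields are made
up purely for illustration.

```python
LINE_BYTES = 16  # assumption: Addr[0:3] selects a byte within a 16-byte line


def split_ldst(fu, addr, length):
    """Split one LD/ST (FU#, Addr, Length) into one or two per-line operations.

    Each operation carries a unary row select, the address MSBs, a per-line
    byte mask, and (start, nbytes) describing which part of the data
    register it covers.
    """
    assert 1 <= length <= LINE_BYTES
    offset = addr % LINE_BYTES            # Addr[0:3]
    line = addr // LINE_BYTES             # Addr[4:]
    wide = ((1 << length) - 1) << offset  # byte mask spanning up to two lines
    lo = wide & ((1 << LINE_BYTES) - 1)   # bytes landing in the first line
    hi = wide >> LINE_BYTES               # bytes spilling into the next line
    first_len = bin(lo).count("1")

    ops = [dict(row_unary=1 << (2 * fu), line=line, byte_mask=lo,
                start=0, nbytes=first_len)]
    if hi:
        # misaligned across the line boundary: the {1} operation is needed
        ops.append(dict(row_unary=1 << (2 * fu + 1), line=line + 1,
                        byte_mask=hi, start=first_len,
                        nbytes=length - first_len))
    return ops
```

for example, a 4-byte access from FU#1 at address 0x1E gives two operations:
the first covers register bytes 0-1 (line 1, byte mask 0xC000), the second
covers register bytes 2-3 (line 2, byte mask 0x0003).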


let us assume that there are 4 LOAD/STORE Function Units.

there is a table (a CAM) with **EIGHT** rows, with *unary* row-matching
by:

unary(FU#0){0}
unary(FU#0){1}
unary(FU#1){0}
unary(FU#1){1}
unary(FU#2){0}
unary(FU#2){1}
unary(FU#3){0}
unary(FU#3){1}

and each row has these columns:

* MSBs of address - address[4:]
* *bitmask* of bytes covered by address[0:3]
* (up to) 128 bit data to be LD/ST'd

basically it's a very very small L0 cache, and because it's so small we can do
row-addressing in *unary*.
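
a minimal behavioural model of such a table might look like the following.
it is only a sketch: the class and method names (Row, L0Buffer, fill, match)
are hypothetical, the 16-byte line and 4 Function Units are the assumptions
from above, and a real implementation would be hardware, not python.

```python
from dataclasses import dataclass, field

LINE_BYTES = 16        # assumption: 128-bit rows
N_FUS = 4              # 4 LOAD/STORE Function Units, as assumed in the text
N_ROWS = 2 * N_FUS     # two rows ({0} and {1}) per Function Unit


@dataclass
class Row:
    valid: bool = False
    line_addr: int = 0   # MSBs of address, address[4:]
    byte_mask: int = 0   # one bit per byte covered within the line
    data: bytearray = field(default_factory=lambda: bytearray(LINE_BYTES))


class L0Buffer:
    def __init__(self):
        self.rows = [Row() for _ in range(N_ROWS)]

    def fill(self, row_unary, line_addr, byte_mask, data):
        """Store one (split) operation in the row selected by a one-hot value."""
        assert row_unary and row_unary & (row_unary - 1) == 0  # exactly one bit
        assert row_unary < (1 << N_ROWS)
        row = self.rows[row_unary.bit_length() - 1]
        row.valid = True
        row.line_addr = line_addr
        row.byte_mask = byte_mask
        for i in range(LINE_BYTES):
            if (byte_mask >> i) & 1:      # only bytes covered by the mask
                row.data[i] = data[i]

    def match(self, line_addr):
        """CAM lookup: unary mask of the valid rows holding this line address."""
        hits = 0
        for i, row in enumerate(self.rows):
            if row.valid and row.line_addr == line_addr:
                hits |= 1 << i
        return hits
```

the result of match() is itself unary (one bit per row), so more than one
bit set means several rows currently refer to the same line.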

the key to resolving misalignment is that initial break-down of the "original"
LD/ST into a *PAIR* (plus a bitmask).

the bit that is a bit... awful is: there will be *four* Load-Store Units
creating and contending for up to *EIGHT* simultaneous reads/writes to this
CAM, worst-case (every single one of the operations crosses a cache-line
boundary)

that's far too much.

i think it is more reasonable to limit that to only four, however even a
four-way port-contention system is still one hell of a lot.

writing to this store/buffer/CAM: with four ports (four ways in CAM
terminology) any one LOAD/STORE will be able to write to two separate addresses
in the CAM

AND

up to *four* LOAD/STORE Units will be able to read/write to the *same* address
(same row).

the Memory itself will need individual byte-level enable lines.

then, the "flush" from this small CAM will be able to read/write entire lines
out to the L1 Cache.  each line will kinda be like one single-entry FIFO.
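
to illustrate the byte-level enables and the flush, here is a small sketch of
the store direction only.  the memory is modelled as a dict of 16-byte lines,
and each occupied row as a (line_addr, byte_mask, data) tuple; these layouts
and the names write_line and flush are assumptions made for the example.

```python
LINE_BYTES = 16  # assumption: 128-bit lines


def write_line(memory, line_addr, data, byte_enable):
    """Write one line to memory, touching only bytes whose enable bit is set."""
    line = memory.setdefault(line_addr, bytearray(LINE_BYTES))
    for i in range(LINE_BYTES):
        if (byte_enable >> i) & 1:
            line[i] = data[i]


def flush(rows, memory):
    """Drain every occupied row out to memory, then mark it empty (None).

    Each row behaves like a single-entry FIFO: once flushed it is free again.
    """
    for i, row in enumerate(rows):
        if row is None:
            continue
        line_addr, byte_mask, data = row
        write_line(memory, line_addr, data, byte_mask)
        rows[i] = None
```

without the per-byte enables, flushing a partially-covered row would
overwrite the neighbouring bytes of the line with stale data.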
