[libre-riscv-dev] [Bug 216] LOAD STORE buffer needed

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Sun Apr 19 18:46:26 BST 2020


--- Comment #23 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---

ok done. it works like this:

* each FU (0-7) produces 2 LD/ST addresses which are broken up like this:

  addr1[0..3] addr1[4] addr1[5..11] addr1[12..48]
  addr2[0..3] addr2[4] addr2[5..11] addr2[12..48]

  where the relationship between addr1 and addr2 is:

       addr1[5..11] + 1 == addr2[5..11]

* addr1[0..3], in combination with the LD/ST len (1/2/3/4), is turned into
  a bytemask, 24 bits in length.  this bytemask is broken down into
  two parts:

      bytemask1, bytemask2 = map_addr(addr, LDST_len) [0..15], [16..23]

  i.e. anything that does not fit fully into bytemask1 is a "misaligned"
  LD/ST and the remainder overflows into bytemask2.

* if addr[4..11] is all ones (0b11111111) and bytemask2 is non-zero, this
  indicates a "misaligned major page fault".

   this is a situation that we are *not* going to deal with (and it has
   been catered for in the 3.0B spec)

* all 16 FU LD/ST re-encodings of the (addr, LDST_len) are lined up in a
  table.  this table breaks down, alternating between:

  * FU # aligned     or FU # misaligned
  * addr1[5:11]      or addr2[5:11]
  * addr[12:48] for *BOTH*
  * bytemask1[0:15]  or bytemask2[0:15]
  * data1[0:15]      or data2[0:15]

note that addr[4] is *not* included in this because it is used to select
whether L1 cache bank #0 or #1 is to be used.

the algorithm for merging of LD/STs into *one single L1 cache line* is:

1). With a PriorityPicker find the index (row) of the first valid LD/ST request

2). For all entries after that row, compare Addr[5:11] and Addr[12:48].

3). If "match" on both, OR the byte-mask for that row onto the output.

that's it.  that's really all there is to it.
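
the three steps above can be sketched in plain python as follows.  this is
only a behavioural model under stated assumptions, not the hardware
implementation: Row and merge_rows are hypothetical names, the PriorityPicker
is modelled as "lowest-numbered valid row wins", and later rows are assumed
to need their own valid bit set before they can be merged.

```python
from typing import NamedTuple


class Row(NamedTuple):
    valid: bool    # row holds a LD/ST request
    index: int     # Addr[5:11]
    tag: int       # Addr[12:48]
    bytemask: int  # 16-bit byte mask for this row


def merge_rows(rows):
    """Merge all rows that hit the same cache line as the first valid row.

    Returns (first_row, merged_bytemask, merged_row_numbers), or None
    when no row is valid.
    """
    # step 1: priority-pick the first valid request
    first = next((i for i, r in enumerate(rows) if r.valid), None)
    if first is None:
        return None
    lead = rows[first]
    merged = lead.bytemask
    picked = [first]
    # step 2: compare every later row against the picked row
    for i in range(first + 1, len(rows)):
        r = rows[i]
        if r.valid and r.index == lead.index and r.tag == lead.tag:
            # step 3: on a match of both fields, OR the byte mask in
            merged |= r.bytemask
            picked.append(i)
    return first, merged, picked


rows = [
    Row(False, 0, 0, 0x0000),
    Row(True,  3, 7, 0x000F),
    Row(True,  4, 7, 0x00F0),  # different Addr[5:11]: not merged
    Row(True,  3, 7, 0x0F00),
    Row(False, 3, 7, 0xF000),  # matches but not valid: not merged
]
result = merge_rows(rows)
```

in hardware all of the comparisons would happen in parallel rather than in a
loop; the loop here only shows which rows contribute to the output mask.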

one thing that's important to note: there are only actually *eight* comparisons
of addr[12:48] needed (not 16), because the addr[12:48] is *identical* for
every *pair* of rows.

that however is still *seven* potential 36-bit CAM hits (seven lots of 36-bit
XOR gates).  which is a hell of a lot.

if we could somehow use the L1 "tag" in place of Addr[12:48], that would save a
huge amount of power.  unfortunately, every way i can think of that would get
the tag *into* L0 is either equally power-consuming, or results in multi-cycle
latency.

if we could reliably use a hash instead, i would suggest it.  unfortunately,
the consequences of a collision are too detrimental.

the "sensible" option that does not have too detrimental an effect on
performance is: reduce the number of LD/ST FUs to 6.  that would result in only
12 rows.
