[libre-riscv-dev] TLB key for CAM

Wed Apr 10 08:23:49 BST 2019

> hi daniel,

Greetings comrade.

> i saw you added the SetAssociativeCache code last week, it looks really
good.

:)

> you probably saw, i did a tidy-up on TLB, by adding a nmigen.main,
> running it, and catching some syntax errors.  where i couldn't
> identify what to do, i added "XXX TODO" into the code as a comment.

Yes there are a few sorry. I will fix those as soon as I can.

> one nice thing about using python, it's possible to use python
> functions to split code up, reduce the indentation level
> significantly, and give the function a meaningful name.  that's what i
> did in TLB.py, i split search and write_L1 into their own functions.

Neat I saw that. Looks good!

> also, could you check: i believe nmigen will set a signal to the
> "default (reset)" value if it is not explicitly set, and that the
> default is "0" (zero).

It does indeed! For combinational logic the data will reset to 0 as nothing
is driving it. For synchronous logic it will hold its value.

> so... what that would mean, in turn, is, that *all* of the code which
> explicitly sets signals to zero is not necessary (see below).
>
> of course, that *assumes* that, for example, that "cam_L1.enable" does
> actually default to zero!
>
> what do you think, better to leave the resets in explicitly, so that
> it's clearer what's going on?

It would be nice to remove those statements but I personally think that is
a bad idea considering for synchronous logic we must reset the signals
manually. Sounds like a really strong gotcha moment. Which I have a
vendetta against haha.

I will endeavor to break the logic apart more. It is cool that you can pass
the module and reset signals within methods. Though I see a little bit of
confusion stemming from modifying outside structures. Would you know how
nmigen handles return values from functions or is that unusable in the
generated code?

For the set associative cache I am leaning towards saving a few bits in
each way that records the place of the entry in a LRU "list". The entry
with the highest number would be replaced when writing to a fully occupied
set. Does that make sense? The stored number would never exceed the maximum
number of ways in a set. An example is, take a 3-way set associative cache
with 1 set.

Entry 0
Tag -> 5 Data -> 50 LRU -> 0 Active -> 1
Entry 1
Tag -> 9 Data -> 23 LRU -> 1 Active -> 1
Entry 2
Tag -> 0 Data -> 0 LRU -> 0 Active -> 0

We write (Tag -> 5 Data->36) to the set and find an inactive entry. Easy,
just write to the open entry! Now we have

Entry 0
Tag -> 5 Data -> 50 LRU -> 1 Active -> 1
Entry 1
Tag -> 9 Data -> 23 LRU -> 2 Active -> 1
Entry 2
Tag -> 5 Data -> 36 LRU -> 0 Active -> 1

Notice that Entry 0 had its LRU values incremented while Entry 1 stayed
constant. Now look for Tag 5!

Entry 0
Tag -> 5 Data -> 50 LRU -> 0 Active -> 1
Entry 1
Tag -> 9 Data -> 23 LRU -> 2 Active -> 1
Entry 2
Tag -> 5 Data -> 36 LRU -> 1 Active -> 1

Again notice that Entry 2 has incremented and Entry 1 is constant, Now look
for Tag 5 again.

Entry 0
Tag -> 5 Data -> 50 LRU -> 0 Active -> 1
Entry 1
Tag -> 9 Data -> 23 LRU -> 2 Active -> 1
Entry 2
Tag -> 5 Data -> 36 LRU -> 2 Active -> 1

This is the flaw with the concept. After enough reads on the same element
this LRU smashed together elements even though Entry 1 has never been read
from. There are a variety of things we can do like remove entries when this
happens or allow this conflict and remove the entry with the lowest index.
If you think this is a good idea I will continue on this path. My support
for this idea is that it is cheap to implement  as it costs (number of way
bits) * number of ways and +1 cycle when reading (when a hit occurs).

For the write it will still be messy as with any LRU, but this concept
should really just have a series of comparators to find the largest LRU
value and its associated way index for replacement.

What do you think? Is it reasonable? I have been debating how to do this
for a long time with minimal cost and I found this to be a pretty good
trade-off. I have also been storing the bits in the data rather than having
a separate register for the associated data. I believe this is correct but
if you disagree again please let me know. I think it is correct as storing
the data in a register file would be expensive and we already have the
memory there right?

If you do not like this idea I could try this one here.
<https://stackoverflow.com/questions/23448528/how-is-an-lru-cache-implemented-in-a-cpu>
However, that is expensive for the official version with the 24 states and
truth be told I have no idea where the 24 states come from. To be fair its
pretty late right now.

There was another suggestion of a binary tree but again I need to read it
not at midnight,

Hope to hear from you soon,

Daniel B.