[libre-riscv-dev] TLB key for CAM

Jacob Lifshay programmerjake at gmail.com
Wed Apr 10 08:30:26 BST 2019


one alternative is using random replacement in the TLB, which from what I
recall has similar performance to LRU for large numbers of ways.
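something like this is what i mean, as a rough software sketch (plain python
rather than nmigen; the class name is made up, and the 16-bit LFSR taps are
just the textbook maximal-length example):

class RandomReplacement:
    def __init__(self, num_ways, seed=0xACE1):
        self.num_ways = num_ways
        self.lfsr = seed                  # any non-zero 16-bit seed

    def _step(self):
        # fibonacci LFSR for x^16 + x^14 + x^13 + x^11 + 1
        bit = (self.lfsr ^ (self.lfsr >> 2) ^
               (self.lfsr >> 3) ^ (self.lfsr >> 5)) & 1
        self.lfsr = (self.lfsr >> 1) | (bit << 15)
        return self.lfsr

    def victim_way(self):
        # which way to evict on a miss (simple modulo; unbiased when
        # num_ways is a power of two)
        return self._step() % self.num_ways

in hardware that is just a free-running LFSR feeding the way-select mux, so
there is no per-way state to read or update on a hit.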

On Wed, Apr 10, 2019, 00:24 Daniel Benusovich <flyingmonkeys1996 at gmail.com>
wrote:

> > hi daniel,
>
> Greetings comrade.
>
> > i saw you added the SetAssociativeCache code last week, it looks really
> good.
>
> :)
>
> > you probably saw, i did a tidy-up on TLB, by adding a nmigen.main,
> > running it, and catching some syntax errors.  where i couldn't
> > identify what to do, i added "XXX TODO" into the code as a comment.
>
> Yes, there are a few, sorry. I will fix those as soon as I can.
>
> > one nice thing about using python, it's possible to use python
> > functions to split code up, reduce the indentation level
> > significantly, and give the function a meaningful name.  that's what i
> > did in TLB.py, i split search and write_L1 into their own functions.
>
> Neat, I saw that. Looks good!
>
> > also, could you check: i believe nmigen will set a signal to the
> > "default (reset)" value if it is not explicitly set, and that the
> > default is "0" (zero).
>
> It does indeed! For combinational logic a signal falls back to its default
> (reset) value of 0 when nothing is driving it; for synchronous logic it
> holds its previous value.
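>
> A tiny example of what that means in practice (the signal names here are
> made up, not taken from TLB.py):
>
> from nmigen import Module, Signal
>
> m = Module()
> hit = Signal()
> enable = Signal()  # default (reset) value is 0
>
> with m.If(hit):
>     m.d.comb += enable.eq(1)
> # no Else branch is needed: when hit is 0, the combinational enable falls
> # back to its default of 0, so an explicit enable.eq(0) is redundant.
> # a signal assigned under m.d.sync, on the other hand, holds its previous
> # value until something writes it again.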
>
> > so... what that would mean, in turn, is, that *all* of the code which
> > explicitly sets signals to zero is not necessary (see below).
> >
> > of course, that *assumes* that, for example, that "cam_L1.enable" does
> > actually default to zero!
> >
> > what do you think, better to leave the resets in explicitly, so that
> > it's clearer what's going on?
>
> It would be nice to remove those statements, but I personally think that is
> a bad idea, considering that for synchronous logic we must reset the
> signals manually. It sounds like a really strong gotcha moment, which I
> have a vendetta against, haha.
>
> I will endeavor to break the logic apart more. It is cool that you can pass
> the module into methods and reset signals there, though I see a little bit
> of confusion stemming from modifying outside structures. Do you know how
> nmigen handles return values from functions, or is that unusable in the
> generated code?
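>
> Something like this is the kind of helper I mean (made-up names, not the
> actual TLB code), where a function adds statements to the module and then
> returns a Signal:
>
> from nmigen import Module, Signal
>
> def tag_match(m, tag_in, stored_tag, active):
>     # build a bit of comb logic and hand back the result Signal;
>     # is returning it like this handled cleanly in the generated code?
>     hit = Signal()
>     m.d.comb += hit.eq(active & (stored_tag == tag_in))
>     return hit
>
> m = Module()
> tag_in = Signal(8)
> stored_tag = Signal(8)
> active = Signal()
> hit = tag_match(m, tag_in, stored_tag, active)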
>
> For the set associative cache I am leaning towards saving a few bits in
> each way that record the entry's place in an LRU "list". The entry with the
> highest number would be replaced when writing to a fully occupied set. Does
> that make sense? The stored number would never need to exceed (number of
> ways - 1). As an example, take a 3-way set associative cache with 1 set (a
> rough Python model of the whole scheme follows the walkthrough below).
>
> Entry  Tag  Data  LRU  Active
>   0     5    50    0     1
>   1     9    23    1     1
>   2     0     0    0     0
>
> We write (Tag -> 7, Data -> 36) to the set and find an inactive entry.
> Easy, just write to the open entry! Now we have:
>
> Entry  Tag  Data  LRU  Active
>   0     5    50    1     1
>   1     9    23    2     1
>   2     7    36    0     1
>
> Notice that Entry 0 and Entry 1 both had their LRU values incremented,
> while the freshly written Entry 2 starts at 0 as the most recently used.
> Now look for Tag 5!
>
> Entry  Tag  Data  LRU  Active
>   0     5    50    0     1
>   1     9    23    2     1
>   2     7    36    1     1
>
> Again notice that Entry 0, the hit, drops back to 0, Entry 2 has
> incremented, and Entry 1 stays constant since it is already at the maximum
> value of 2. Now look for Tag 5 again.
>
> Entry  Tag  Data  LRU  Active
>   0     5    50    0     1
>   1     9    23    2     1
>   2     7    36    2     1
>
> This is the flaw with the concept: after enough reads of the same element,
> this LRU scheme smashes entries together, even though Entry 1 has never
> been read from. There are a variety of things we could do, like removing
> entries when this happens, or allowing the conflict and replacing the
> colliding entry with the lowest index. If you think this is a good idea I
> will continue on this path. My argument for it is that it is cheap to
> implement: it costs (number of way bits) * (number of ways) of extra
> storage, and +1 cycle when reading (when a hit occurs).
>
> For the write it will still be messy, as with any LRU, but this concept
> should really just need a series of comparators to find the largest LRU
> value and its associated way index for replacement.
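>
> To pin the idea down, here is a rough behavioural model of the whole scheme
> in plain Python (not nmigen; the names are just for illustration),
> including the comparator-style hunt for the largest LRU value on a write:
>
> class LRUSet:
>     # behavioural model of one set of an N-way cache with per-way
>     # LRU counters
>     def __init__(self, num_ways):
>         self.num_ways = num_ways
>         # one record per way: [tag, data, lru, active]
>         self.ways = [[0, 0, 0, 0] for _ in range(num_ways)]
>
>     def _touch(self, used):
>         # the accessed way becomes most recent (0); every other active
>         # way ages by one, saturating at num_ways - 1
>         for i, way in enumerate(self.ways):
>             if i == used:
>                 way[2] = 0
>             elif way[3] and way[2] < self.num_ways - 1:
>                 way[2] += 1
>
>     def read(self, tag):
>         for i, way in enumerate(self.ways):
>             if way[3] and way[0] == tag:
>                 self._touch(i)        # the "+1 cycle" on a hit
>                 return way[1]
>         return None                   # miss
>
>     def write(self, tag, data):
>         # prefer an inactive way; otherwise evict the way whose counter
>         # is largest (a small comparator tree in hardware)
>         for i, way in enumerate(self.ways):
>             if not way[3]:
>                 victim = i
>                 break
>         else:
>             victim = max(range(self.num_ways),
>                          key=lambda i: self.ways[i][2])
>         self.ways[victim] = [tag, data, 0, 1]
>         self._touch(victim)
>
> Preloading it with the starting state above and replaying the write and the
> two reads reproduces the same counter transitions, including the final
> state where Entry 2's counter catches up with Entry 1's.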
>
> What do you think? Is it reasonable? I have been debating how to do this
> for a long time with minimal cost, and I found this to be a pretty good
> trade-off. I have also been storing the LRU bits in the data word itself
> rather than keeping a separate register for them. I believe this is
> correct, but if you disagree, again, please let me know. I think it is
> correct because storing those bits in a register file would be expensive,
> and we already have the memory there, right?
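>
> To make that concrete, the word stored per way would be packed roughly like
> this (the field widths are made up for illustration):
>
> from nmigen import Signal, Cat
>
> tag    = Signal(36)
> data   = Signal(64)
> lru    = Signal(2)   # enough bits to count the ways
> active = Signal()
>
> # one memory word per way, with the LRU bits kept alongside the data
> # instead of in a separate register file
> way_word = Cat(active, lru, data, tag)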
>
> If you do not like this idea I could try this one here:
> <https://stackoverflow.com/questions/23448528/how-is-an-lru-cache-implemented-in-a-cpu>
> However, that is expensive for the official version with the 24 states,
> and truth be told I have no idea where the 24 states come from. To be fair,
> it's pretty late right now.
>
> There was another suggestion of a binary tree, but again I need to read it
> at some time other than midnight.
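>
> A guess at what the binary tree suggestion means, to be checked when I am
> more awake: the classic tree pseudo-LRU, with one bit per internal node
> pointing at the less recently used half, so only (ways - 1) bits per set.
> A rough Python sketch, assuming a power-of-two number of ways:
>
> class TreePLRU:
>     def __init__(self, num_ways):
>         assert num_ways & (num_ways - 1) == 0, "power-of-two ways only"
>         self.num_ways = num_ways
>         # one bit per internal node; 0 means the pseudo-LRU way is in
>         # the lower half of this node's range, 1 means the upper half
>         self.bits = [0] * (num_ways - 1)
>
>     def touch(self, way):
>         # on an access, point every node on the path away from this way
>         node, lo, hi = 0, 0, self.num_ways
>         while hi - lo > 1:
>             mid = (lo + hi) // 2
>             if way < mid:
>                 self.bits[node] = 1           # LRU is now the upper half
>                 node, hi = 2 * node + 1, mid
>             else:
>                 self.bits[node] = 0           # LRU is now the lower half
>                 node, lo = 2 * node + 2, mid
>
>     def victim(self):
>         # follow the bits to the (pseudo) least recently used way
>         node, lo, hi = 0, 0, self.num_ways
>         while hi - lo > 1:
>             mid = (lo + hi) // 2
>             if self.bits[node] == 0:
>                 node, hi = 2 * node + 1, mid
>             else:
>                 node, lo = 2 * node + 2, mid
>         return lo
>
> No counters or comparators at all: an access just flips the bits along its
> path, and a replacement just follows them.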
>
> Hope to hear from you soon,
>
> Daniel B.
> _______________________________________________
> libre-riscv-dev mailing list
> libre-riscv-dev at lists.libre-riscv.org
> http://lists.libre-riscv.org/mailman/listinfo/libre-riscv-dev
>

