[libre-riscv-dev] TLB Initial Proposal

Tue Jan 22 09:07:24 GMT 2019

On Mon, Jan 21, 2019 at 10:53 PM Daniel Benusovich <
flyingmonkeys1996 at gmail.com> wrote:

> >
> > We were specifically looking for ways to not need large CAMs since they
> > are power-hungry when designing the instruction scheduling logic, so it
> > may be a good idea to have a smaller L1 TLB and a larger, slower, more
> > power-efficient, L2 TLB. I would have the L1 be 4-32 entries and the L2
> > can be 32-128 as long as the L2 cam isn't being activated every clock
> > cycle. We can also share the L2 between the instruction and data caches
> >
> Sounds great. Would the two levels of caching be separate types of memory
> entirely? For example, the first level is a CAM and the second level is a
> SRAM? What would determine the size of the caches? Would the cache size
> value be created through testing to see what gives the minimum number of
> misses vs power used?
>
> > The spec supports up to 56 bits, I would like to implement at least 40 for
> > the off-chip tilelink-over-ethernet. The rest of the internals probably
> > only need about 35.
> >
> >
> Neat. So we would need to hold at least 56 bits per entry? Or is it 56 +
> 40 + 35 bits per entry? What is the spec? Is that relating to SimpleV or
> RISCV specifications? Also what is referenced by internals? Is it possibly
> the actual Physical Page Numbers (PPN) stored in the table entries? I see
> in the RISCV manual that for Sv48 implementation a 64 bit table entry is
> used for a 56 bit physical and virtual address. (Section 4.5.1 Figure 4.21
> pg. 64 V1.10)
>
The spec I was referring to is the RISCV manual. If all TLB misses are
handled in software, we need:
- the portion of the Physical Page Number that we implement (PPN)
- the Virtual Page Number (VPN) along with a page size indicator
(4KiB/2MiB/1GiB/0.5TiB)
- a bit indicating readability (R)
- a bit indicating writability (W)
- a bit indicating executability (X)
- a user/supervisor bit (U)
- a bit indicating if the page is global or local to an ASID (G)
- the ASID (ASID)

Presence is indicated by RWX != 000.
The CAM should indicate a match when the VPN matches and either G is set or
the ASID matches.
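
A minimal sketch of the entry fields and match rule above, written in Python
rather than HDL purely for illustration. It assumes Sv48 (four page sizes, 9
VPN bits per level); the names TlbEntry, matches and lookup are hypothetical
and not taken from any existing code. Permission checks (R/W/X/U) are left
out so that only the match rule is shown.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12          # 4 KiB base pages
VPN_BITS_PER_LEVEL = 9   # Sv48: each level resolves 9 VPN bits


@dataclass
class TlbEntry:
    vpn: int    # virtual page number, in 4 KiB units
    ppn: int    # implemented portion of the physical page number
    level: int  # page size indicator: 0=4KiB, 1=2MiB, 2=1GiB, 3=512GiB
    r: bool     # readable
    w: bool     # writable
    x: bool     # executable
    u: bool     # user page
    g: bool     # global (matches every ASID)
    asid: int

    def present(self) -> bool:
        # Presence is encoded as RWX != 000.
        return self.r or self.w or self.x

    def matches(self, vpn: int, asid: int) -> bool:
        # Larger pages ignore the low VPN bits covered by the page itself.
        shift = VPN_BITS_PER_LEVEL * self.level
        if not self.present():
            return False
        if (self.vpn >> shift) != (vpn >> shift):
            return False
        return self.g or self.asid == asid


def lookup(entries, vaddr: int, asid: int):
    """Return the physical address for vaddr, or None on a TLB miss."""
    vpn = vaddr >> PAGE_SHIFT
    for e in entries:
        if e.matches(vpn, asid):
            page_bits = PAGE_SHIFT + VPN_BITS_PER_LEVEL * e.level
            offset = vaddr & ((1 << page_bits) - 1)
            base = (e.ppn >> (VPN_BITS_PER_LEVEL * e.level)) << page_bits
            return base | offset
    return None
```

A real CAM compares all entries in parallel; the loop here only models the
result of that comparison.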

> So it seems that having a software controlled TLB with 2 levels of caching
> is the current course. Since misses will be handled in software what does
> need to be done in hardware? For instance, will fetching the translations
> and placing them into the TLB be a hardware or software task? If it is a
> software task then the TLB must accept certain instructions to do what it
> needs to or how does it work?
>
We need the address bus before the TLB to be at least as wide as the
physical address bus.

The hardware detects when either no valid translation was found or when the
access would not be allowed (trying to access a user page from the
supervisor without the SUM bit set, for example), and causes a machine mode
interrupt.
The software will then fetch the translations using the Bare addressing mode,
setting accessed and dirty (if writing) bits along the way, stopping when
it finds the leaf page table entry or when there's an invalid access. If
there's an invalid access, it will trigger a page-fault interrupt.
Otherwise, it will use special instructions or CSRs to modify the
TLB state, adding the new entry and (possibly) evicting old entries. The
software then returns to the interrupted code.
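
The software side of the algorithm above could look roughly like the sketch
below. It is Python standing in for machine-mode handler code and makes
several assumptions: Sv48 with the standard RISC-V PTE layout (flags in bits
0-7, PPN from bit 10), a dict named mem standing in for Bare-mode loads and
stores, and a hypothetical tlb_insert callback standing in for whatever
special instructions or CSRs end up modifying the TLB. It sets the accessed
and dirty bits on the leaf entry only, and it does not model eviction.

```python
PAGE_SHIFT = 12
LEVELS = 4        # Sv48 (assumption)
PTE_SIZE = 8      # bytes per page table entry
PPN_MASK = (1 << 44) - 1
# Standard RISC-V PTE flag bits 0..7.
PTE_V, PTE_R, PTE_W, PTE_X, PTE_U, PTE_G, PTE_A, PTE_D = (1 << i for i in range(8))


class PageFault(Exception):
    """Stands in for raising a page-fault interrupt."""


def handle_tlb_miss(mem, root_ppn, vaddr, is_write, tlb_insert):
    # mem maps physical address -> 64-bit PTE; missing addresses read as 0.
    table = root_ppn << PAGE_SHIFT
    for level in range(LEVELS - 1, -1, -1):
        index = (vaddr >> (PAGE_SHIFT + 9 * level)) & 0x1FF
        pte_addr = table + index * PTE_SIZE
        pte = mem.get(pte_addr, 0)
        # Invalid entry, or the reserved W-without-R encoding.
        if not (pte & PTE_V) or ((pte & PTE_W) and not (pte & PTE_R)):
            raise PageFault(hex(vaddr))
        ppn = (pte >> 10) & PPN_MASK
        if pte & (PTE_R | PTE_X):
            # Leaf entry. A superpage must be aligned to its own size.
            if ppn & ((1 << (9 * level)) - 1):
                raise PageFault(hex(vaddr))
            if is_write and not (pte & PTE_W):
                raise PageFault(hex(vaddr))
            pte |= PTE_A | (PTE_D if is_write else 0)
            mem[pte_addr] = pte
            tlb_insert(vpn=vaddr >> PAGE_SHIFT, ppn=ppn, level=level,
                       flags=pte & 0xFF)
            return
        table = ppn << PAGE_SHIFT  # pointer to the next-level table
    raise PageFault(hex(vaddr))    # ran out of levels without finding a leaf
```

After tlb_insert returns, the handler would return to the interrupted code so
that the faulting access is retried against the refilled TLB.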

Note that we can change this algorithm; I just wrote it how I would do it.

Hope this helps,
Jacob Lifshay