[libre-riscv-dev] TLB Initial Proposal

Jacob Lifshay programmerjake at gmail.com
Mon Jan 21 11:13:04 GMT 2019


On Sun, Jan 20, 2019, 22:20 Daniel Benusovich <flyingmonkeys1996 at gmail.com
wrote:

> I read over a paper discussing TLBs and believe we could have 2 64-entry
> fully associative TLB caches (128 entries total) using CAMs (Content
> Addressable Memory). One cache would be used as an "active" list and the
> second as an "inactive" list. Linux uses a "Two-List Strategy" (which is
> where I am pulling this from) in evicting cache entries.
>
When designing the instruction scheduling logic we were specifically
looking for ways to avoid large CAMs, since they are power-hungry, so it
may be a good idea to have a smaller L1 TLB and a larger, slower, more
power-efficient L2 TLB. I would have the L1 be 4-32 entries, and the L2
can be 32-128 entries as long as the L2 CAM isn't being activated every
clock cycle. We can also share the L2 between the instruction and data
caches.
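
To make the intended lookup order concrete, here is a rough software
model in Python (purely illustrative -- the l1/l2 objects and the walk
callback are stand-ins, not a proposed interface):

  def tlb_lookup(vaddr, l1, l2, walk):
      # Tiny fully-associative L1 first, the bigger shared L2 only on an
      # L1 miss, and a page-table walk only on a double miss.
      vpn = vaddr >> 12
      pte = l1.lookup(vpn)
      if pte is None:
          pte = l2.lookup(vpn)      # slower, but not touched on L1 hits
          if pte is None:
              pte = walk(vaddr)     # hardware walker or trap to software
          if pte is not None:
              l1.insert(vpn, pte)   # refill the L1 so the next access hits
      return pte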

>
> All translations when initially called would be placed into the active
> list. Entries in the inactive list would be moved into the active list when
> hit.
>
> If the active table fills up or gets too large, the head entry should be
> popped off and added to the inactive list. If both (active and inactive)
> lists are full then: pop the head entry from the inactive list into the
> ether, pop the head entry from the active list into the inactive table, and
> place the new translation into the active list.
>
> When lulls in requests occur and the inactive list exceeds a given
> threshold, popping off should occur to ensure that both lists never fully
> fill up.
>
> The benefit to this is the need to only maintain a tail and head pointer
> for both lists. This would use 7 bits per pointer and total 28 bits for the
> 4 pointers which is nice.
>
> Alternatively, a single 128- or 64-entry fully associative TLB is also
> possible using a more standard LRU.
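
If I'm reading the policy right, it boils down to something like this
rough Python model (purely behavioural, with both lists modelled as
FIFOs; the names are made up for illustration, not a proposed interface):

  from collections import OrderedDict

  class TwoListTLB:
      def __init__(self, size=64):
          self.size = size
          self.active = OrderedDict()     # oldest entry first
          self.inactive = OrderedDict()

      def lookup(self, vpn):
          if vpn in self.active:
              return self.active[vpn]
          if vpn in self.inactive:
              # a hit in the inactive list promotes the entry back
              return self.insert(vpn, self.inactive.pop(vpn))
          return None                     # TLB miss: walk the page table

      def insert(self, vpn, pte):
          if len(self.active) >= self.size:
              # demote the oldest active entry, evicting from the
              # inactive list first if that is also full
              old_vpn, old_pte = self.active.popitem(last=False)
              if len(self.inactive) >= self.size:
                  self.inactive.popitem(last=False)   # "into the ether"
              self.inactive[old_vpn] = old_pte
          self.active[vpn] = pte
          return pte

Promoting on an inactive hit and demoting the oldest active entry is a
cheap LRU approximation, which fits the head/tail-pointer accounting you
describe.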
>
> This logic would have to be implemented in software, not hardware, correct?
> Most of the design papers I read have the OS perform the logic for cache
> misses and control what goes where, which is a RISC styling. Older designs
> had all the logic controlled in hardware, which is a CISC styling. I am not
> sure if this changes for a mobile application, as everything I read was
> quite general purpose.
>
You can do it either way. The state machine needed for walking page tables
is quite simple and (hopefully) rarely activated.
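
For reference, here is a behavioural sketch (plain Python, not HDL) of
what that state machine has to do for Sv39. read_phys is just a stand-in
for a physical memory port, and only the V/R/X bits are checked; the
full fault and A/D-bit handling is left out:

  PAGE_SHIFT = 12
  LEVELS = 3         # Sv39: three 9-bit VPN fields
  PTE_SIZE = 8
  PTE_V, PTE_R, PTE_X = 1 << 0, 1 << 1, 1 << 3

  def sv39_walk(satp_ppn, vaddr, read_phys):
      a = satp_ppn << PAGE_SHIFT
      for level in reversed(range(LEVELS)):           # levels 2, 1, 0
          vpn = (vaddr >> (PAGE_SHIFT + 9 * level)) & 0x1FF
          pte = read_phys(a + vpn * PTE_SIZE)
          if not (pte & PTE_V):
              return None                             # page fault
          if pte & (PTE_R | PTE_X):
              return pte                              # leaf PTE found
          a = (pte >> 10) << PAGE_SHIFT               # next-level table
      return None                                     # ran out of levels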

>
> A few questions appear from this:
>
> 1. What page size will we be supporting?

4KiB

> 2. What is the maximum physical memory we will be supporting?
>
The spec supports up to 56 bits; I would like to implement at least 40 for
the off-chip tilelink-over-ethernet. The rest of the internals probably
only need about 35.
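(For scale, 2^56 is 64 PiB of physical address space, 2^40 is 1 TiB,
and 2^35 is 32 GiB.)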

> 3. Will the operating system on the chip be reserving any part of the
> virtual memory space as kernel memory space?
>
Like amd64, the virtual address space is sign-extended, and the kernel
reserves the negative addresses.

From what I understand, we are using RISC-V's Supervisor Spec so we don't
have to rewrite that part of Linux. We're supporting Sv39 since that's the
simplest virtual memory mode for 64-bit CPUs. That gives us 512 GiB of
virtual address space.
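
As a quick illustration of the sign-extension rule (assuming Sv39; the
constants below follow from its 39-bit virtual addresses rather than
anything we've specified):

  def sv39_canonical(vaddr):
      # valid only if bits 63..39 all equal bit 38 (the "sign" bit)
      sign = (vaddr >> 38) & 1
      return (vaddr >> 39) == (0x1FFFFFF if sign else 0)

  # user half:   0x0000_0000_0000_0000 .. 0x0000_003F_FFFF_FFFF (256 GiB)
  # kernel half: 0xFFFF_FFC0_0000_0000 .. 0xFFFF_FFFF_FFFF_FFFF (256 GiB)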

If it's not too much work, I'd like to support Sv48 as well, since that
will be useful for mapping very large files and connecting via shared
memory to bigger servers. If we are having software handle TLB misses,
all we have to do is make the virtual address bus bigger.

>
> Any feedback and or guidance would be much appreciated!
>
> Hope you are having a good one and possibly had a chuckle reading this,
>
> Daniel B.
> _______________________________________________
> libre-riscv-dev mailing list
> libre-riscv-dev at lists.libre-riscv.org
> http://lists.libre-riscv.org/mailman/listinfo/libre-riscv-dev
>

