[libre-riscv-dev] MMU + TLB idea

Fri May 8 01:40:17 BST 2020

On Friday, May 8, 2020, Michael Nolan <mtnolan2640 at gmail.com> wrote:

> I remember some time ago working on a powerpc core that (I think) had a
> partly software MMU. It got me thinking: we could have a TLB and the
> machinery to look up entries in it and translate the addresses in hardware,
> and implement the page table walking and such in software (via a trap).

ha, funny.  i went through this idea twice.  once in 2012 and again 18
months ago.

>  This would I think cut down the work in implementing the MMU by making it
> so we don't need to implement the page table walking machinery in hardware,
> and make it a bit easier to test.

yes.

from a unit test perspective i like it.   it reduces the dependent
components for each test.
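
as a rough illustration of that split, here is a minimal behavioural sketch in plain python. it is not code from the soc repository: every name in it (TLB, TLBMiss, software_refill, load_address) is hypothetical, the 4 KiB page size and FIFO eviction are assumptions, and the page table is just a dict. the point is only that the lookup half can be exercised on its own, with the walk supplied separately by a "trap handler".

```python
# behavioural sketch only: hypothetical names, not code from the soc repository.
PAGE_SHIFT = 12  # assumes 4 KiB pages


class TLBMiss(Exception):
    """stands in for the trap taken when the hardware TLB has no entry."""

    def __init__(self, vpn):
        super().__init__(f"TLB miss on virtual page {vpn:#x}")
        self.vpn = vpn


class TLB:
    """the 'hardware' half: lookup and translate only, no page table walking."""

    def __init__(self, num_entries=8):
        self.num_entries = num_entries
        self.entries = {}  # vpn -> ppn; dicts preserve insertion order

    def translate(self, vaddr):
        vpn = vaddr >> PAGE_SHIFT
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        if vpn not in self.entries:
            raise TLBMiss(vpn)
        return (self.entries[vpn] << PAGE_SHIFT) | offset

    def insert(self, vpn, ppn):
        if vpn not in self.entries and len(self.entries) >= self.num_entries:
            # evict the oldest entry (FIFO); a real design may choose differently
            del self.entries[next(iter(self.entries))]
        self.entries[vpn] = ppn


def software_refill(tlb, page_table, vpn):
    """the 'software' half: the trap handler walks the table and refills."""
    tlb.insert(vpn, page_table[vpn])  # a KeyError here would be a page fault


def load_address(tlb, page_table, vaddr):
    """translate, taking the miss 'trap' and retrying once if needed."""
    try:
        return tlb.translate(vaddr)
    except TLBMiss as miss:
        software_refill(tlb, page_table, miss.vpn)
        return tlb.translate(vaddr)
```

a unit test of the TLB class alone needs nothing but insert() and translate(); the page table only appears once software_refill() is involved.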

> The downside of course would be that TLB misses would be slower, but I
> think it might be a decent trade-off for our first attempt at the MMU.

yyyeahh.  a TLB (basically a cache for MMU lookups) is probably the
critical component, first in the chain.

without a TLB, every MMU lookup (which is *all* memory lookups) can take 4
or more cycles on *every* memory access, both I and D.
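
to put rough numbers on that, a back-of-envelope sketch. only the 4-cycle walk comes from the paragraph above; the 99% hit rate, the zero-extra-cycle hit and the 100-cycle software refill trap are made-up illustrative figures, not measurements.

```python
def average_translation_cycles(hit_rate, hit_cycles, miss_cycles):
    """expected extra cycles per memory access spent on address translation."""
    return hit_rate * hit_cycles + (1.0 - hit_rate) * miss_cycles


# no TLB at all: every access pays the full walk (4 cycles, from the text above)
no_tlb = average_translation_cycles(hit_rate=0.0, hit_cycles=0, miss_cycles=4)

# TLB with a hardware walker: assumed 99% hit rate, a hit costing no extra cycles
hw_walk = average_translation_cycles(hit_rate=0.99, hit_cycles=0, miss_cycles=4)

# TLB with software refill via a trap: assumed 100 cycles per miss
sw_refill = average_translation_cycles(hit_rate=0.99, hit_cycles=0, miss_cycles=100)

print(no_tlb, hw_walk, sw_refill)
```

under those assumptions the software refill is much slower per miss than a hardware walk, yet still cheaper on average than having no TLB, which is the trade-off Michael describes.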

so we will definitely need a TLB.  there exists code which i moved to the
"unused" directory that implements one, so we do not need to duplicate that
work (simply adapt it to POWER).

this code was translated from ariane:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/unused/TLB/ariane/tlb.py;h=72b67a2dcc56aa76aca561f5da3345a99c36c2ab;hb=HEAD#l40

note how small the TLB is - only an 8 entry CAM! this is pretty normal, it
saves power and also means it can do single cycle requests.
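
for readers unfamiliar with CAMs, a minimal plain-python model of the idea. this is not the ariane code linked above: the class names are hypothetical, and the python loop merely stands in for what hardware does with one comparator per entry, all at the same time (which is why a small CAM can answer in a single cycle, and why a large one costs area and power).

```python
class CamEntry:
    """one CAM slot: a valid bit, the tag to match on, and the stored data."""

    def __init__(self):
        self.valid = False
        self.tag = 0
        self.data = 0


class SmallCAM:
    """hypothetical fully-associative lookup structure with 8 entries by default."""

    def __init__(self, num_entries=8):
        self.entries = [CamEntry() for _ in range(num_entries)]

    def match_vector(self, tag):
        # in hardware every comparison happens in parallel, one per entry
        return [e.valid and e.tag == tag for e in self.entries]

    def lookup(self, tag):
        hits = self.match_vector(tag)
        if not any(hits):
            return None  # miss
        return self.entries[hits.index(True)].data

    def write(self, index, tag, data):
        entry = self.entries[index]
        entry.valid, entry.tag, entry.data = True, tag, data
```

for a TLB the tag would be the virtual page number and the data the physical page number plus permission bits.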

the wikipedia page on TLB says that intel do i think three nested levels of
TLB Cache (!) going from 8 entries (single cycle) up to i think 2048
entries (multi cycle), to avoid missing clock cycles.

the one downside of a software MMU: now a linux kernel driver (and
potentially u-boot) supporting that design concept is on the list of
dependencies.

that in turn, to ensure a reduced dependency on the hardware team, means
implementing the software MMU in an emulator (pearpc, dolphin, qemu, gem5,
other).

this would allow the new MMU linux kernel driver to be developed and tested
in advance, then the *exact* same linux kernel binary can be run on the
newly developed hardware, and we are not trying to test 3 newly developed
"unknowns" at once (a guaranteed nightmare).

so on first glance it seems like it would save time.

unfortunately, despite the usefulness of not having an MMU at the unit test
level, as an end goal for a final product, due to the additional *software
development*, it actually *adds* time!

however if there is *already* support for a POWER "soft MMU" in the linux
kernel, we can try it out.

honestly though on balance, by conforming to the RADIX MMU already
supported for POWER in the linux kernel, we join a "well tested" code path.

with Paul Mackerras working on that for microwatt (right now), if we
simply wait a while we will have working code that we can by-rote translate
to nmigen. and i kinda enjoy those mind numbing tasks :)

yes, i thought that a softmmu would save time, too, Michael :)

l.

-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68