[libre-riscv-dev] TLB

Sun Apr 21 06:43:09 BST 2019

On Sun, Apr 21, 2019 at 1:50 AM Daniel Benusovich
<flyingmonkeys1996 at gmail.com> wrote:
>
> >  honestly, i was thinking of following ariane, converting mmu.sv next,
> > so modifying the ariane ptw.py or tlb.py too much would lead to
> > clashes
>
> So we are leaving it as is?

 i'd recommend that until mmu.sv is converted to mmu.py so that we see
how it fits together

> I would be a little hesitant as we were
> planning on using Sv48 rather than Sv39.

 we were? :)  tum-te-tum... section 4.5 of privspec
V20190405-Priv-MSU-Ratification ... the only difference iiis... the
upper bits of the virtual address (47:39, not translated in Sv39) are
used as a 4th vpn level, VPN[3].  likewise ppn now has a 4th level:
PPN[2] shrinks from 26 bits to 9, and the new top field, PPN[3], is
17 bits long, blech.

 that really should not be difficult to add/extend ariane ptw.py to
support Sv48.
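
 a minimal sketch of the Sv39 / Sv48 field split, in plain python
rather than nmigen.  the function names (split_vaddr,
ppn_field_widths) are made up for illustration and are not taken from
ariane ptw.py.  it assumes 4k pages, 9 bits per vpn level and a 44-bit
ppn in the PTE, as per the privspec:

```python
PAGE_OFFSET_BITS = 12   # 4 KiB pages in both Sv39 and Sv48
VPN_BITS = 9            # each VPN level indexes a 512-entry page table
PTE_PPN_BITS = 44       # total PPN width in the PTE for Sv39 and Sv48


def split_vaddr(vaddr, levels):
    """Split a virtual address into ([VPN[0], VPN[1], ...], page_offset).

    levels is 3 for Sv39 and 4 for Sv48 (hypothetical helper).
    """
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)
    vpns = []
    for i in range(levels):
        shift = PAGE_OFFSET_BITS + i * VPN_BITS
        vpns.append((vaddr >> shift) & ((1 << VPN_BITS) - 1))
    return vpns, offset


def ppn_field_widths(levels):
    """Widths of PPN[0]..PPN[levels-1]: 9 bits each, top field takes the rest."""
    low = [VPN_BITS] * (levels - 1)
    return low + [PTE_PPN_BITS - sum(low)]


# Sv48 address with VPN[3]=3, VPN[2]=2, VPN[1]=1, VPN[0]=5, offset 0xABC
va = (3 << 39) | (2 << 30) | (1 << 21) | (5 << 12) | 0xABC
sv48_vpns, sv48_offset = split_vaddr(va, levels=4)
sv39_vpns, sv39_offset = split_vaddr(va, levels=3)
```

 with levels=3 the top 9 bits are simply not looked at, which is why
extending a 3-level walker to 4 levels is mostly one extra iteration.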

> Also is the PLRU implemented into the TLB or is that still a todo?

 it's done already, as in, it's already split out, in ariane tlb.py,
as a separate module that's used *by* tlb.py
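
 for anyone not familiar with PLRU, here is a rough behavioural
sketch of a tree-based pseudo-LRU in plain python.  it is *not* the
code from ariane tlb.py: the class name TreePLRU and its methods are
hypothetical, and it assumes a power-of-two number of entries.

```python
class TreePLRU:
    """Behavioural model of tree pseudo-LRU (illustrative only)."""

    def __init__(self, entries):
        # assumption: entries is a power of two, at least 2
        assert entries >= 2 and entries & (entries - 1) == 0
        self.entries = entries
        # one bit per internal tree node, stored heap-style:
        # node n has children 2n+1 (left) and 2n+2 (right).
        # bit = 0 means "victim is on the left", 1 means "on the right"
        self.bits = [0] * (entries - 1)

    def touch(self, way):
        """Record an access: point every node on the path *away* from way."""
        node, lo, hi = 0, 0, self.entries
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if way < mid:
                self.bits[node] = 1      # used left, so evict from right
                node, hi = 2 * node + 1, mid
            else:
                self.bits[node] = 0      # used right, so evict from left
                node, lo = 2 * node + 2, mid

    def victim(self):
        """Follow the bits from the root to find the entry to replace."""
        node, lo, hi = 0, 0, self.entries
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if self.bits[node]:
                node, lo = 2 * node + 2, mid
            else:
                node, hi = 2 * node + 1, mid
        return lo


plru = TreePLRU(4)
order = []
for _ in range(4):
    v = plru.victim()     # pick a replacement candidate...
    order.append(v)
    plru.touch(v)         # ...and mark it as just used
```

 in hardware the same thing is entries-1 flip-flops plus a bit of
combinatorial logic, which is why it is nice to keep it as its own
module.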

> We might as well throw out most of what I have so far.

 eek!

> Ariane have
> their own versions of pretty much every class albeit stuffed into one
> mega file.

 yes, which is a style that i don't like.  the code (and the format)
that you've written is superb and very clear, and throwing that away
is not... well...

> We can probably keep the PermissionValidator but apart from that
> almost everything seems to be already contained. Unless we are
> planning on stuffing the CAM into the ariane TLB.

 yes.  i've taken a look at ariane's cam implementation, and it's a
*lot* of code.

 oo that's interesting, i just discovered that they're using openpiton:
 https://mshahrad.github.io/openpiton-asplos16.html

 *wow*.  200,000-cores.  holy cow.  that would be awesome to be able to do that.

 okaaaay, *that's* why they don't have an L2 cache, because openpiton handles it.
 http://parallel.princeton.edu/papers/openpiton-asplos16.pdf

 section 2.5.2 and section 2.3.3

 okaaay... just found that the ariane team make it optional, via an
internal #ifdef, on the AXI4 bus.  so a core will either connect to a
standard L1 I/D-Cache, via an AXI4 Bus,  *or* it will connect to the
OpenPiton subsystem (again, via an AXI4 Bus) and gain L1.5 and L2
distributed cache, indirectly.

 openpiton is not messing about, it's 3 priority levels of 64-bit-wide
data buses.

 so it seems that, in general, L2 is just handled completely
transparently behind a Bus Architecture, so we can implement it later,
completely independently of TLB, PTW, L1 I/D-Cache etc.

> I am also having a hard time looking at the code without any comments
> in regards to input and outputs.

 ah, yes, the convention appears to be _i for input and _o for output.

> >  so i am not sure what to suggest.... although... logically... i
> > *think*... if we have like... a Wishbone Bus, where the memory accesses
> > go through that before getting to the actual physical memory, then L2
> > is like, completely separate.
>
> Sure. I will look at that when I can. I am not sure whether to focus
> on the TLB or the bus now.

 TLB probably best.  deciding whether to use AXI4 or wishbone is a
pretty maaajor decision that's tightly integrated into the rest of the
memory and inter-connect, and with the peripherals as well.

> >  so can we focus on AssocCache, TLB and PTW initially?
>
> Why do we need the AssocCache at all?

 no AssocCache, no SMP.  and we need SMP.  if openpiton turns out to
be too much (for some reason) we'll need it.

 even if just like in ariane with a #ifdef, behind a Memory Bus
(either AXI4 or wishbone), we make both options available.

> If we are not modifying the
> tlb.py we should stay away from how it is storing data.

 my understanding is that the AssocCache would be behind the Memory
Bus, and because that transfers individual reads/writes, the SMP
AssocCache Coherence can be triggered *on* those reads/writes *by*
accesses *over* the Memory Bus.
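
 to illustrate the idea only: a toy write-through, invalidate-on-snoop
model in plain python.  everything here (MemoryBus, SnoopingCache,
snoop_write) is a hypothetical name, it is not how ariane or openpiton
do it, and a real protocol (MESI etc.) has far more states.  the point
is just that the coherence action hangs off the bus transaction, so
the cache behind the bus needs no knowledge of TLB or PTW.

```python
class MemoryBus:
    """Toy bus: forwards accesses to memory and lets caches snoop writes."""

    def __init__(self):
        self.memory = {}
        self.snoopers = []

    def attach(self, cache):
        self.snoopers.append(cache)

    def read(self, master, addr):
        return self.memory.get(addr, 0)

    def write(self, master, addr, data):
        self.memory[addr] = data
        # coherence is triggered *by* the write going over the bus
        for cache in self.snoopers:
            if cache is not master:
                cache.snoop_write(addr)


class SnoopingCache:
    """Write-through cache that drops a line when another master writes it."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}
        bus.attach(self)

    def load(self, addr):
        if addr not in self.lines:                 # miss: fetch over the bus
            self.lines[addr] = self.bus.read(self, addr)
        return self.lines[addr]

    def store(self, addr, data):
        self.lines[addr] = data
        self.bus.write(self, addr, data)           # write-through

    def snoop_write(self, addr):
        self.lines.pop(addr, None)                 # invalidate stale copy


bus = MemoryBus()
core_a, core_b = SnoopingCache(bus), SnoopingCache(bus)
first = core_a.load(0x10)      # core A caches the (empty) location
core_b.store(0x10, 7)          # core B's write invalidates A's copy
second = core_a.load(0x10)     # A misses and re-fetches the new value
```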

 so, really, really-really-really, L2 (AssocCache) really is totally
separate from L1/TLB/PTW and can be developed and decisions made
completely separately and independently.

> >  and, shall i do mmu.sv converted to nmigen?   do you want to take a
> > look first, see what you think:
> >  https://github.com/pulp-platform/ariane/blob/master/src/mmu.sv
> >
> >  it really does just (mostly) link instruction/data tlbs and ptws together.
>
> I am not familiar with SystemVerilog so it is rather difficult for me
> to have an opinion.

 yes, they have to use _n and _q where _n is the combinatorial (next)
value and _q is the sync'd (registered) one, so you *read* from _q but
write to _n...  it's horribly confusing when you are used to nmigen
comb/sync :)
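
 a tiny plain-python model of the _n / _q pattern, in case it helps.
the Counter class is made up for illustration and is not from ariane;
it just mimics "comb logic reads _q and writes _n, the clock edge
copies _n into _q".

```python
class Counter:
    """Models the SystemVerilog next-state / registered-state split."""

    def __init__(self):
        self.count_q = 0   # registered value (output of the flip-flops)
        self.count_n = 0   # next value (input of the flip-flops)

    def comb(self, enable_i):
        # combinatorial block: *read* _q, *write* _n
        if enable_i:
            self.count_n = self.count_q + 1
        else:
            self.count_n = self.count_q

    def clock(self):
        # clocked block: on the clock edge _q takes the value of _n
        self.count_q = self.count_n


c = Counter()
c.comb(enable_i=True)
before_edge = (c.count_q, c.count_n)   # _q unchanged until the edge
c.clock()
after_edge = c.count_q
```

 in nmigen the same counter would (roughly) collapse to a single
m.d.sync += count.eq(count + 1) under an m.If(enable), with no
explicit _n signal, which is why the explicit pair looks so odd at
first.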

 still the code is very very consistent so becomes readable after you
get used to the patterns.

> At a high level it is reasonable and makes sense with an instruction
> and data TLB. We can probably use the PermissionValidator in there too
> which could be handy.

 ok, then i'll do that today, then we can have a look.

 if you can do unit tests for tlb.py and ptw.py, and also split class
PLRU and TLBContent out into separate modules, it will help you get
more familiar with them both.

l.