[libre-riscv-dev] libresoc memory architecture

Tue Jun 23 23:14:37 BST 2020


On Tue, Jun 23, 2020 at 10:15 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> On Tue, Jun 23, 2020, 13:52 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> wrote:
>
> > On Tue, Jun 23, 2020 at 9:02 PM Michael Nolan <mtnolan2640 at gmail.com>
> > wrote:
> >
> > > On 6/23/20 3:52 PM, Luke Kenneth Casson Leighton wrote:
> > > > question: where do the PowerISA MMU tables get the information from
> > > > about which bits are IO, RAM, or other?
> > > >
> > > > answer: they come from the information about the physical memory map.
> > > Wouldn't they come from the software that sets up the page tables?
> >
> > and that software would need to read the information from somewhere,
> > in order to know what it was, and consequently by another level of
> > indirection we come back to that exact same hardware map function /
> > table.
> >
>
> You're missing that the hardware map function doesn't need to be built in to
> the memory system in the same way it is in RISC-V:
> in RISC-V, the hardware address decoder is responsible for determining if a
> particular address range is memory, i/o, not present, etc. which tells the
> core if a particular access can be cached, write-combined, etc. Since it
> controls if an access is cached or write-combined, it needs to be stuck in
> the core so the core can know that it can cache or combine memory ops.
>
> in Power, the hardware address decoder is only responsible for directing a
> memory access to a particular peripheral or memory,

bear in mind that there's a difference between address decoder and
address _checker_.  the address decoder is activated when the
read/write occurs; the address checker is used *by* the address
decoder.

also that in the "take-it-or-leave-it" contract-style Memory Bus, the
two are synonymous, by way of "you try the atomic read/write, you get
an instant answer back if it succeeded or failed".

in a Wishbone-based architecture, where you never see more than one
memory request at a time (because of single-core non-speculative
execution), you simply don't care: this merging of the two roles,
"check and decode", is irrelevant.

> the part that
> determines if a particular address is cacheable or if memory accesses can
> be combined is the page table entries stored in the MMU (the TLB
> specifically), which is generic, it doesn't need to know the exact address
> layout because that will be programmed by software.

ok.  right.  yes... but the information still has to come from
somewhere, and as i explained, the granularity of pages (4k, 16k, 64k)
is not adequate to cover say DRAM or SPI memory-mapped peripheral
registers.  these will only take up a few bytes (or words) of a
4k/16k/64k page.

> The software can get
> that information from the boot rom, reading a device tree from the boot
> drive (https://www.devicetree.org/),

yes it can - and part of the pinmux program's job will be to
auto-generate them so that they don't have to be written manually (and
mistakes made).  the information comes from the exact same definition
in the pinmux that also generates the hardware range-check function.
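
as a rough illustration of "one definition, two outputs", the sketch
below derives both a device-tree-style fragment and a range-check
function from a single list.  this is not the pinmux program: the
PERIPHERALS list, the names, the addresses and the gen_dts /
gen_checker helpers are all hypothetical, and the fragment assumes
one address cell and one size cell per reg entry.

```python
# hypothetical peripheral definition: (name, base address, size in bytes)
PERIPHERALS = [
    ("uart0", 0x10000000, 0x10),
    ("spi0", 0x10001000, 0x20),
]


def gen_dts(peripherals):
    """Emit a device-tree-style fragment, one node per peripheral."""
    lines = []
    for name, base, size in peripherals:
        lines.append(f"{name}@{base:x} {{")
        lines.append(f"    reg = <0x{base:x} 0x{size:x}>;")
        lines.append("};")
    return "\n".join(lines)


def gen_checker(peripherals):
    """Build the range-check function from the very same list."""
    def check(addr):
        return any(base <= addr < base + size
                   for _, base, size in peripherals)
    return check


dts = gen_dts(PERIPHERALS)
check = gen_checker(PERIPHERALS)
```

because both outputs come from the same list, the software view and
the hardware view cannot drift apart through a manual transcription
mistake.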

> enumerating PCIe devices, reading ram
> chips' config registers, and/or other methods.

ok.  *some* of this information is dynamic.  i mentioned this already:

"the DRAM one ... no, actually *any* peripheral that involves access
to memory addresses (LPC, FlexBus, 8080 bus, etc) all those will be a
leetle more complicated as the "valid range" will depend on the
physical external device that is connected."

however some of it you can *not* get except by way of hard-coded
information.  the page table is *not* adequate: it does *not* tell
you that a write to an address offset by only 20 bytes from the
beginning of the page will fail, because the memory-map for those
registers is only 16 bytes long and all the bytes above that are
invalid if accessed.
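
the 20-byte case can be worked through in a few lines of python.  the
base address here is hypothetical; the 4k page size, the 16-byte
register block and the offset of 20 are the figures used above.

```python
PAGE_SIZE = 4096        # 4k pages
REG_BASE = 0x10000000   # hypothetical, page-aligned register block
REG_SIZE = 16           # only 16 bytes of registers exist


def page_allows(addr):
    # page-table view: the whole 4k page is mapped, so every offset passes
    return REG_BASE <= addr < REG_BASE + PAGE_SIZE


def hardware_allows(addr):
    # range-check view: only the bytes that actually exist pass
    return REG_BASE <= addr < REG_BASE + REG_SIZE


addr = REG_BASE + 20    # 20 bytes from the beginning of the page
```

page_allows(addr) is true while hardware_allows(addr) is false: the
page-level permission says nothing about the 4080 bytes of the page
that have no registers behind them.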

and, again: having the kernel be the enforcer of this software-only
(DTB-based) information is an extremely bad idea... and the hardware's
there *anyway*.

yes the two should obviously match (because otherwise there will be
catastrophic kernel-level bugs) however relying solely and exclusively
on software is an extremely bad idea.

> The method of getting that
> information is *not* speed critical, since it's only retrieved at boot time
> or when hardware is added/removed.

yeees... but the enforcement still has to match, the hardware still
has to provide this speculative contract ("offer, exchange, complete")
by providing us with a means to determine if the address is valid
*before* it is accessed, because if it doesn't, *we can't do multiple
LD/STs*.

> This means that a Power core doesn't
> need to be customized at the silicon level for each address layout since
> that info is programmed into the MMU.

again: it is there anyway, you have to have it, because you simply
cannot have a write "succeed" to an address that does not exist.

we're talking about *existing* hardware in *existing* code
(nmigen-soc) *already* having the required address-checking
functionality... just not exposed by a public API (or part of the WB
protocol).

> Admittedly, we will probably have to kludge something in there until we
> have a sufficiently working MMU.

with the hardware providing the range-checking (and also enforcing the
read/write by using the range-checking at the time of the write) we do
not need to have an MMU at all in order to get testable code.

this is really very simple and does not need any kludges.  as i said:
the hardware *already exists* (in the Decoder) because the Decoder has
to route the request to the right peripheral anyway.

so the Decoder *knows* whether the address will succeed or not.  it's
just that the "take-it-or-leave-it" Wishbone contract prevents and
prohibits us from being able to do that check *unless* the actual
request is performed.

by which time it's too late as far as speculative execution is
concerned: "damage" will have been done that is irreversible, and we
are most definitely not going to do a Transaction / Rollback
architecture, taking snapshots of memory.  that would be an insanely
complex project.

all that is needed is to expose that address-check information -
information which *already exists* at the hardware level - via a
"checker" function.

this one very simple augmentation provides the means for us to turn
the "take-it-or-leave-it" atomic-only single-issue requests into
"offer, exchange, complete" parallel-capable multi-issue speculative
requests.
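
a small python model shows the difference between the two contracts.
everything in it is an assumption made for illustration: the Bus
class, its single 16-byte valid window, and in particular the way the
steps are labelled "offer" / "exchange" / "complete", which is one
reading of the idea and not a definition of the protocol.

```python
class Bus:
    """Hypothetical bus model with one 16-byte valid window at address 0."""

    def __init__(self):
        self.mem = {}

    def check(self, addr):
        # "offer": would this address succeed?  no side effects
        return 0 <= addr < 16

    def write(self, addr, value):
        # "take-it-or-leave-it": the answer only comes with the access itself
        if not self.check(addr):
            return False
        self.mem[addr] = value
        return True


def atomic_only(bus, stores):
    # each store is attempted in turn; earlier ones have already landed
    # by the time a later one fails
    for addr, value in stores:
        if not bus.write(addr, value):
            return False
    return True


def offer_exchange_complete(bus, stores):
    # offer: check every address before touching memory
    if not all(bus.check(addr) for addr, _ in stores):
        return False  # nothing was written, nothing to undo
    # exchange + complete: all writes are now known to succeed
    for addr, value in stores:
        bus.write(addr, value)
    return True


stores = [(0, 0xAA), (4, 0xBB), (100, 0xCC)]  # the last address is invalid

bus_a = Bus()
ok_a = atomic_only(bus_a, stores)

bus_b = Bus()
ok_b = offer_exchange_complete(bus_b, stores)
```

both calls report failure, but bus_a.mem already holds the first two
stores (the irreversible "damage") whereas bus_b.mem is still empty,
with no snapshot or rollback needed.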

l.