[libre-riscv-dev] Advanced Topics on RISCV

Tue Mar 24 12:06:28 GMT 2020

On Tue, Mar 24, 2020 at 11:52 AM Immanuel, Yehowshua U
<yimmanuel3 at gatech.edu> wrote:

> I’ve read through the Spike page and a good portion of the simpleV page.

it's ultimately quite simple in concept: the details are where it gets hairy

> My two goals at the moment are:
> 1. Understand how RISCV handles multiple processes and does page walking

ok so SV will not help you there: it doesn't have anything to do with PTW (etc.)

multiple processes are handled by context-switching.  register and CSR
state are "saved", the new process selected, and its state "swapped
in" to registers and CSRs.  the last thing that's swapped over is: the
Program Counter.
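
a rough sketch of that save / restore order, as a plain C model rather
than real kernel code: the struct, its fields and the function name are
all made up for illustration, and on real hardware this is done in
assembly inside the trap handler.  satp is only included as one example
of a CSR that is per-process.

```c
#include <stdint.h>
#include <string.h>

#define NUM_REGS 32

/* hypothetical model of the per-hart state a kernel has to preserve */
struct cpu_state {
    uint64_t regs[NUM_REGS]; /* integer registers x0..x31 */
    uint64_t satp;           /* example CSR: address-translation setup */
    uint64_t pc;             /* program counter */
};

/* save the outgoing process's state, then swap in the incoming one.
 * the PC is written last, mirroring the order described above.
 * save_to must not point at the same object as hw. */
void context_switch(struct cpu_state *hw,
                    struct cpu_state *save_to,
                    const struct cpu_state *load_from)
{
    *save_to = *hw;                                      /* "save" regs, CSRs, PC */
    memcpy(hw->regs, load_from->regs, sizeof hw->regs);  /* swap in registers */
    hw->satp = load_from->satp;                          /* swap in CSR state */
    hw->pc = load_from->pc;                              /* last: the PC */
}
```

here "hw" stands in for the live hart state; the scheduler's only job
is to pick which saved state gets passed as load_from.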

PTW - you'll need to do quite a bit of research into virtual memory,
TLBs, etc. first.

> 2. Understand how multicore RISCV would work

you'll need to look up "Weak Memory Model" (as opposed to "Total Store
Order").  then look up AMO (atomic memory operations).

> I’m hoping to play with FreeRTOS soon so I can run through its codebase for setting up page tables.

that's a good idea.

> Also, do you know if spike tests the special instructions like exception instructions? Also, what RISCV instructions would a kernel use to set up the pagetables?
>
> Lastly, do you know any good resources for intro to multicore systems? RISCV doesn’t seem to have any multicore specific instructions.

correct.  it has LR/SC (Load-Reserved / Store-Conditional) and AMO
instructions in the "A" extension, and it is entirely up to the
*operating system* - the kernel - to use these in an effective way in
order to guarantee that memory corruption between processes does not
occur.

*that is all there is to it*.
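
to make the LR/SC idea concrete, here is a minimal sketch in portable
C11 rather than RISC-V assembly.  the function name is made up, and the
assumption (not checked here) is that a compiler targeting the "A"
extension would typically lower the compare-exchange loop to a
load-reserved / store-conditional retry loop.

```c
#include <stdatomic.h>
#include <stdint.h>

/* atomically add delta to *addr and return the previous value.
 * same shape as an LR/SC loop: read, compute, try to store, and
 * retry if another core modified the location in between. */
uint32_t atomic_add_retry(_Atomic uint32_t *addr, uint32_t delta)
{
    uint32_t old = atomic_load(addr);
    while (!atomic_compare_exchange_weak(addr, &old, old + delta)) {
        /* on failure, old has been refreshed with the current value */
    }
    return old;
}
```

the weak form is allowed to fail spuriously, which is why it always
sits inside a loop, much as a store-conditional can fail and be retried.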

> My current questions would include things like:
>
> 1. How can the kernel assign tasks to a certain core? If you have a process with multiple threads, it would make sense to spread out the threads among available processors instead of concentrating them on a single core. How might this work with respect to RISCV?

just like it would on any multi-core operating system running on
SMP-capable hardware.  there's absolutely no difference here.

> 2. Does the hardware ensure cache coherency -

nnnope.  that's what a "Weak Memory Model" is.

> that is - externally - software sees one big cache, although I imagine each core would have a local cache that would have to communicate with other caches?

nnope.

that's why LR/SC atomic semantics exist.  look up AMO.  these are then
used to do kernel-level spin-locks, mutexes etc. all of which are a
hard requirement for sorting out memory clashes.
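
as an illustration only, a spin-lock built on an atomic test-and-set
can be sketched in C11 like this.  the type and function names are
hypothetical (this is not taken from any particular kernel), and the
exact instructions a RISC-V compiler emits for it (an AMO or an LR/SC
sequence) are an assumption rather than something verified here.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* hypothetical minimal spin-lock: the flag is set while the lock is held */
typedef struct {
    atomic_flag held;
} sketch_spinlock;

void spin_init(sketch_spinlock *l)
{
    atomic_flag_clear(&l->held);
}

/* returns true if the lock was free and is now ours */
bool spin_trylock(sketch_spinlock *l)
{
    /* test-and-set returns the previous value: false means it was free */
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

void spin_lock(sketch_spinlock *l)
{
    while (!spin_trylock(l)) {
        /* busy-wait until the current holder releases the lock */
    }
}

void spin_unlock(sketch_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

the acquire / release orderings are what matter under a weak memory
model: they stop accesses inside the critical section from being
reordered outside of the lock / unlock pair.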

from my (brief) look at several RISC-V SMP implementations, the
takeaway that i got was: most implementations put the AMO ALU
*actually in the L2 cache* (!) - one *single* AMO ALU - accessible
*ONLY* over an exclusive bus such that ONLY one core may use it at any
given time.

thus, atomicity of the Atomic Memory Operations - crucial to those
locks - is achieved by way of resource contention / starvation, rather
than "actual atomic but parallel memory operation detection and clash
avoidance".

given that AMOs are not very common, they basically get away with this
approach.  however for massively-parallel SMP systems (64 cores or
greater) this approach would begin to result in significant contention
and slow-down of certain tasks.

l.