[libre-riscv-dev] microwatt tlb

Luke Kenneth Casson Leighton lkcl at lkcl.net
Fri Mar 27 02:15:03 GMT 2020

hi this is for paul mackerras when you see it.

great to know you're working on microwatt. been such a long time, and
really good to encounter you over the virtual coffee chat earlier today.

here's the link to the radix walk doc i was talking about:

it is really good.  implementations also exist in the same codebase, the
easiest way to track them down is via the commit logs in the
gem5-experimental branch.

for libresoc we really really need clear HDL to "track" if you know what i
mean, because some of these algorithms are very time-consuming to implement
from scratch.

paul, continuing the (brief) discussion somewhat, in a related way, i
really wanted to make you aware of the implications for microwatt of trying
to make it multistage pipelined.

if you look up jon dawson's IEEE754 FPU code you find that it is entirely
FSM based.  as such it is superb for fitting in small FPGAs.

over a 6 month period i morphed it from its humble beginnings into an
absolute beast of a flexibly reprogrammable Object Orientated IEEE754 FP
toolkit, capable of 16, 32 or 64 bit (and anything else you care to add),
arbitrary pipeline depth *or* FSM style engines, SIMD capability, *dynamic*
pipelines, the works.  Jacob helped add a fantastic hybrid fpdiv, sqrt
*and* rsqrt engine all in the same pipeline.  it's now well north of 10,000
lines of python code and takes a week to run all the unit tests.

what i learned along the way was that going from FSM based systems to
pipelined ones trades low gate count, simplicity and code readability for
speed, at the price of a much higher gate count, and that pipelined designs,
unless (*and even if*) OO design is used, turn into the most awful unreadable
unmaintainable total trash.
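
to put rough numbers on the speed side of that tradeoff, here is a minimal
sketch in python.  it is a toy cycle-count model, not taken from the FPU
code: the function names and the 5-step figure are made up for illustration,
and it assumes the pipeline never stalls.  what it does not model is the
cost: the FSM reuses one block of logic for every step, whereas the pipeline
needs its own logic and registers for each stage.

```python
def fsm_cycles(num_ops, steps_per_op):
    # one shared block of logic is reused for every step, so a new
    # operation cannot start until the previous one has finished
    return num_ops * steps_per_op


def pipeline_cycles(num_ops, num_stages):
    # every stage has its own logic: once the pipeline has filled,
    # one result comes out per clock (assuming no stalls at all)
    if num_ops == 0:
        return 0
    return num_stages + num_ops - 1


# hypothetical 5-step operation, compared at a few workload sizes
for n in (1, 10, 1000):
    print(n, fsm_cycles(n, 5), pipeline_cycles(n, 5))
```

for a single operation the two take the same number of cycles; the
pipeline only pulls ahead when there is a stream of operations to feed it.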

rocketchip unfortunately, for all the compactness and other advantages, is
particularly bad, *because* of the OO (and total lack of code comments).
the OO hierarchy makes it flat-out impossible for anyone but its original
creators to work with.  (something to alert the chiselwatt team on, there,
although from what i have seen, the code comments in chiselwatt are damn

and yet if you even remotely try anything like modern OO programming design
strategies and techniques with these 1980s era languages (VHDL, Verilog) you
immediately also run into serious difficulties of a different kind, which i
won't go into, right now.

the bottom line is: adding extra pipeline stages to microwatt not only
complicates it and makes it unreadable and unmaintainable, it increases the
number of LUTs required to the point where you will need to call it "watt"
or "megawatt".


regarding exceptions and traps: for in-order pipelined systems these are a
damn nuisance.

every solution to every problem in an inorder system is:


that's it.

after learning about the CDC6600, which is a deeply impressive design that
takes literally months to begin to comprehend, i dislike inorder systems

the key thing that i learned from Mitch Alsup about how to handle
exceptions is: you *must* prevent the ALU (or whatever) from "committing"
its result until it is GUARANTEED known that the exception will not occur.

that is the absolute core concept of exceptions (aka interrupts), which you
MUST achieve, in some fundamental way.

this is done in an OoO design by marking and tracking the operation - all
the way down the pipeline - with "cancellation masks" (a global ID which
travels *with* the partial result data, down a given pipeline)

this is known as "shadowing", and it is not only the operation that *might*
be cancelled that has to be "shadowmarked", it is *all following
instructions as well* (hence the term "shadow").

shadowed operations are real easy to cancel: pull the "cancel" wire up, and
everything with the requisite mask ID gets a global signal "whoops we no
longer need to pass this down the pipeline, drop it on the floor".
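
a minimal sketch of that idea, as a python behavioural model rather than
HDL.  everything in it (the Op and ShadowTracker classes, the may_trap flag,
the use of a python set as the "mask") is hypothetical and made up for
illustration: it is not how any particular CPU, or our own code, does it.
an operation that might raise an exception casts a shadow over itself and
over everything issued after it; nothing under a shadow may commit; the
shadow is either released (all clear) or cancelled (drop everything under
it on the floor).

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Op:
    name: str
    shadows: Set[int] = field(default_factory=set)  # shadow IDs this op sits under
    own_shadow: Optional[int] = None                # shadow this op itself casts
    cancelled: bool = False

    def may_commit(self):
        # a result may only be written back once no shadow covers it
        return not self.cancelled and not self.shadows


class ShadowTracker:
    def __init__(self):
        self.ops = []        # in-flight operations, in program order
        self.active = set()  # shadow IDs that are not yet resolved
        self.next_id = 0

    def issue(self, name, may_trap=False):
        # every new op inherits all shadows that are currently unresolved
        op = Op(name, set(self.active))
        if may_trap:
            op.own_shadow = self.next_id
            self.next_id += 1
            self.active.add(op.own_shadow)
            op.shadows.add(op.own_shadow)  # the op shadows itself as well
        self.ops.append(op)
        return op

    def release(self, shadow_id):
        # it is now guaranteed that the exception will not occur
        self.active.discard(shadow_id)
        for op in self.ops:
            op.shadows.discard(shadow_id)

    def cancel(self, shadow_id):
        # the "cancel" wire: everything carrying this ID is dropped
        for op in self.ops:
            if shadow_id in op.shadows:
                op.cancelled = True
                if op.own_shadow is not None:
                    self.active.discard(op.own_shadow)
        self.ops = [op for op in self.ops if not op.cancelled]


t = ShadowTracker()
add = t.issue("add")
ld = t.issue("load", may_trap=True)
mul = t.issue("mul")
t.cancel(ld.own_shadow)
print([op.name for op in t.ops])
```

in the usage at the bottom, cancelling the load's shadow also kills the mul
that was issued after it, while the add that was issued before it is left
alone.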

with all "future" partial (and completed but noncommitted) results now
killed, the PC can safely be redirected at the exception, interrupt, or the
alternative branch, or whatever: these all use *exactly* the same shadowing

you start to see why the majority of inorder systems use "stall stall stall
stall stall" as the be-all and end-all "onnne and only true solution",
here? [ hallelujah, praise the stall... ] :)

basically if i have discouraged you from pursuing the path of adding
multiple pipeline stages to microwatt, then i have succeeded in what i set
out to explain.  because microwatt's value is not just its small code size:
its readability and compactness make it an ideal reference implementation
*and*, furthermore, its complete lack of the optimisations normally seen in
"simple" pipelined designs actually saves on gates / FPGA resources.

the moment you add extra stages, all that goes out the window.

an associate and contributor on our list, Samuel Falvo, taught me some
awesome tricks.

his CPUs are incredible, designed in the most amazing elegant and unusual
ways.  he wrote his own PLA style "language" (written in lisp) which takes
specifications (in lisp lists) for functionality and *generates* VHDL
combinatorial blocks.

he used this tool to create a 6502-like processor with virtually no
manually written glue logic (even the decoder was written in this simple
specification), that easily fits into an ultra low cost ICE40 yet has
massive amounts of room to spare for peripherals.

the key to the success of this approach was the heavy tradeoff of *not*
using pipeline stages, but effectively doing a FSM style entire execution
design.  actually it wasn't even a FSM, i think he used the term "PLA".  i
am garbling it, it was a year ago, there are some keywords i have missed
out, you get the idea though.

what i am really saying is: please do seriously consider simply stalling on
anything that could have an exception thrown.  arrange for the entire
pipeline to simply grind to a halt, *globally* freezing right the way back
to the decode phase, only proceeding when it is guaranteed known, 100%,
that the exception will NOT occur.
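
as a minimal sketch of what that looks like, here is a toy cycle-level model
in python.  it is not microwatt's or anyone else's pipeline: the function
name run_inorder, the 3-stage depth and the resolve_cycles field (how many
clocks an operation sits in the last stage before it is known not to trap)
are all assumptions made up for illustration.  while the wait counter is
non-zero, nothing in any stage moves, which is the global freeze.

```python
def run_inorder(program, depth=3):
    """program is a list of (name, resolve_cycles) tuples.

    returns (names in retirement order, total clock cycles).
    """
    stages = [None] * depth  # stages[0] is decode, stages[-1] is the last stage
    pending = list(program)
    retired = []
    wait = 0                 # clocks left until the last stage is known safe
    cycles = 0
    while pending or any(s is not None for s in stages):
        cycles += 1
        if wait > 0:
            # global stall: every stage holds its contents this clock
            wait -= 1
            continue
        # the op in the last stage is guaranteed not to trap: let it commit
        if stages[-1] is not None:
            retired.append(stages[-1][0])
        # everything shifts along by one stage, decode takes the next op
        stages = [pending.pop(0) if pending else None] + stages[:-1]
        # a newly arrived op freezes the world until it is known safe
        if stages[-1] is not None:
            wait = stages[-1][1]
    return retired, cycles


base, base_cycles = run_inorder([("add", 0), ("and", 0), ("sub", 0)])
order, cycles = run_inorder([("add", 0), ("load", 2), ("sub", 0)])
print(order, cycles - base_cycles)
```

the load here costs two extra clocks for the whole pipeline, not just for
itself, and in exchange there is never anything to undo: operations still
retire strictly in program order.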

this is, believe it or not, an industry-wide "acceptable" technique in ALL
in-order designs.

look up "minerva rv32" on github (it is very readable code) and grep its
source for the word "stall".

you will find that a *combinatorial* global signal propagates throughout *every*
pipeline stage, implementing this "stop the world i want to get off!"
industry-wide "solution".

literally every effort that i have seen that attempts *not* to follow that
"solution" for inorder designs involves the word "but" and it goes downhill
from there.

surprising, ehn? :)


crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
