[libre-riscv-dev] Yehowshua Tasks

Tue Jun 16 00:55:34 BST 2020

On Mon, Jun 15, 2020 at 11:47 PM Yehowshua <yimmanuel3 at gatech.edu> wrote:
>
> Hello again Luke,
>
>         As I wrap up my thesis, I’d like to get into the flow of things more.
>         I can put in about four hours a day on LibreSOC.

fantastic.  that's a _lot_.

>         1. What are some tasks I can get started with right away?

(btw all of these, start by raising a bugreport.  i'll catch up
tomorrow and link them in to various dependent tasks).

an immediately useful task: a unit test for harry ho's sram.py (which
the comments on the pull for nmigen-soc say he didn't have time to do)
would be invaluable.  i will do a repo tomorrow for that.

some comments in soc/minerva/cache.py!  there's virtually none - at
all - in the entire codebase, and it's a serious problem.  the
comment-to-code ratio in soc is somewhere between 5% and sometimes as
high as 20%.  minerva?  0% to 1%.  cache.py is *zero* comments.

some references to *any* kind of online documentation about "what kind
of cache this is".

also, parameterising of the minerva code so that address and data
widths can be selected (128-bit, 64-bit, 32-bit)

also another useful task: LDSTSplitter needs to have the data bitwidth
multiplied by 8.   rather embarrassingly i forgot to do that, and
consequently the mask is *bit* level not *byte* level :)

this involves multiplying ashift1 and ashift2 by 8, as well as using
the same Repl trick as used in LenExpand on mask1 and mask2:
        # expand each mask *bit* into a full *byte* (0x00 or 0xff)
        l = []
        for i in range(llen):
            l.append(Repl(mask[i], 8))
        # Cat places l[0] in the least-significant byte
        comb += BYTEEXPANDED_mask.eq(Cat(*l))

that little trick turns a 16 *bit* mask into a 16 *byte* mask, where
each byte is either 0b00000000 or 0b11111111
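
a minimal sketch of the same expansion in plain python integers (no
nmigen needed), just to show what comes out.  expand_mask is a
hypothetical helper name for illustration, not something in the soc
repository:

```python
def expand_mask(mask, nbits):
    """turn an nbits-wide *bit* mask into an nbits-wide *byte* mask.

    bit i of mask becomes byte i of the result: 0x00 or 0xff.
    integer stand-in for Cat(*[Repl(mask[i], 8) for i in range(nbits)])
    """
    result = 0
    for i in range(nbits):
        if (mask >> i) & 1:
            result |= 0xff << (i * 8)
    return result

# bits 1 and 2 set: bytes 1 and 2 become 0xff
print(hex(expand_mask(0b0110, 4)))  # 0xffff00
```
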

the *expanded* masks can then be applied here:

            comb += ld1.ld_i.eq((self.st_data_i << ashift1) &
                                BYTEEXPANDED_mask1)
            comb += ld2.ld_i.eq((self.st_data_i >> ashift2) &
                                BYTEEXPANDED_mask2)

and at that point, LDSTSplitter is ready for use in DualPortSplitter
(or, we code-morph LDSTSplitter *into* DualPortSplitter).

this is how we will be dealing with mis-aligned requests (exactly as
in the diagram), so it's kinda strategically important.
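
to make the byte-level split concrete, here is a rough integer model
of a misaligned store being cut into two aligned halves.  it is a
sketch under assumptions, not the LDSTSplitter code: the 8-byte line
size, the way ashift1 and ashift2 are derived from the byte offset,
and the names split_store and expand_mask are all made up for
illustration:

```python
LINE_BYTES = 8  # assumption: each half of the split covers one 8-byte line

def expand_mask(mask, nbits):
    """bit i of mask becomes byte i of the result (0x00 or 0xff)."""
    result = 0
    for i in range(nbits):
        if (mask >> i) & 1:
            result |= 0xff << (i * 8)
    return result

def split_store(data, offset, length):
    """split a store of `length` bytes at byte `offset` (0..7) into two halves.

    returns ((mask1, data1), (mask2, data2)): byte-enable mask and
    line-aligned data for the first line and for the following line.
    """
    # LenExpand-style bit mask: `length` ones shifted up by the byte offset
    mask = ((1 << length) - 1) << offset
    mask1 = mask & ((1 << LINE_BYTES) - 1)   # bytes landing in the first line
    mask2 = mask >> LINE_BYTES               # bytes spilling into the next line
    ashift1 = offset                         # assumed: shift in *bytes*
    ashift2 = LINE_BYTES - offset
    # the shifts are multiplied by 8 so that they act on bytes, not bits
    data1 = (data << (ashift1 * 8)) & expand_mask(mask1, LINE_BYTES)
    data2 = (data >> (ashift2 * 8)) & expand_mask(mask2, LINE_BYTES)
    return (mask1, data1), (mask2, data2)

# 4-byte store at offset 6: two bytes in the first line, two in the next
(m1, d1), (m2, d2) = split_store(0xAABBCCDD, 6, 4)
print(bin(m1), hex(d1), bin(m2), hex(d2))
```

without the multiply-by-8 and the byte expansion, the masks and shifts
would select individual *bits* of the data, which is the bug described
above.
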

>         Mainly related to nMigen - I don’t have patience for QEMU
>         right now. Last time I did extensive QEMU work was a pretty
>         painful experience about a year ago.

yeh we're not going to modify qemu: the level of optimisations they've
added is far beyond the understandability we need at the moment.
_using_ qemu (see soc/simulator/test_sim.py) is a different matter.

>         2. I can go ahead and write a bench test for Minerva’s cache.
>
>         3. Also, how can I get more familiar with the LD/ST comp unit
>         including the dual cache even/odd architecture from the YouTube
>         video you linked to earlier today?

run the following - and carefully examine both the ilang file using
yosys "show", and load the resultant vcd files in gtkwave.  they are
in order of simplest (lowest level leaf nodes) right the way up to
where things are at the moment.

* soc/experiment/l0_cache.py

   see TestL0Cache.  this will run a couple of LDs and STs directly.
   they go straight into a TestMemory instance.

* soc/experiment/compldst_multi.py

  (sigh: after getting the next tests working, this one no longer
  does.  will fix it tomorrow)

  this will run some LD/STs through the Computation Unit interface.
this is pretty much exactly as described in Mitch's book... except
it's been converted to multi-signal (one REQ/GO per *register* rather
than a "global read" and "global write")

 LDSTCompUnit is a more advanced (more REQ/GO signals) version of an
ALU MultiCompUnit.  it is probably also worthwhile checking
soc/experiment/test/test_compalu_multi.py as well (which Cesar has
been working on).

 see https://bugs.libre-soc.org/show_bug.cgi?id=336 and
https://bugs.libre-soc.org/show_bug.cgi?id=312

 bear in mind that REQ/GO are basically ready/valid signalling and
follow the EXACT same protocol followed by AXI4 and Wishbone
signalling.

* soc/fu/compunit/test/test_ldst_compunit.py

  this one runs *actual LD/ST instructions*, after actually decoding
  them (with PowerDecode2).  it throws the input in using the REQ/GO
  protocol, then waits for the output and confirms it is as expected
  using (surprise) the REQ/GO protocol.  it also enumerates the ISA
  Simulator memory and checks that the memory modifications are
  identical.

* soc/simple/test/test_core.py

 this one runs *all* available unit tests - actual instructions, again
after actually decoding them.  if you want to "only" test the LD/ST
ones, go to the bottom of the file and comment out everything but the
LDSTUnitTest.

 the difference between this and test_ldst_compunit.py is it actually
grabs the inputs needed to go to the LDSTCompUnit (and all others)
*from the register file*.  and likewise on output REQ/GO, puts the
results into the regfile.

 the REQ/GO is *DIRECTLY* synchronised with the Regfile read-enable /
write-enable for the active port.
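
a tiny cycle-by-cycle model of the ready/valid idea behind REQ/GO,
as a generic sketch only.  it assumes REQ plays the role of "valid"
(held until accepted) and GO the role of "ready"; the function name
and the example traces are hypothetical and are not taken from the
soc code:

```python
def transfers(req, go):
    """return the cycle numbers on which a transfer completes.

    req and go are per-cycle lists of 0/1.  a transfer happens only
    on a cycle where both are high at the same time.
    """
    return [cycle for cycle, (r, g) in enumerate(zip(req, go)) if r and g]

# the unit raises REQ at cycle 1 and holds it until GO arrives at cycle 3,
# then makes a second request at cycle 5 that is accepted immediately
req = [0, 1, 1, 1, 0, 1]
go  = [0, 0, 0, 1, 0, 1]
print(transfers(req, go))  # [3, 5]
```
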

regarding the Dual Port Splitter:  this code hasn't been written yet,
although the pieces are in place (use ctags -R and vim ":tag
{insertkeyword}" to navigate to these classes)

* DualPortSplitter
* LDSTSplitter
* LenExpand
* PortInterface
* DataMerger (this will be relevant later - not straight away)

>         Is it related to the memory Hierarchy section in Mitch Alsup’s
>         computer architecture chapter 10?

almost - not quite.  i've had to go far beyond that although the
basics are there.

l.