[libre-riscv-dev] Request for Scoreboard and Functional Units Update

Sat May 30 17:26:58 BST 2020

On Sat, May 30, 2020 at 3:59 PM Yehowshua <yimmanuel3 at gatech.edu> wrote:
>
> So browsing through the codebase, it seems we have about 10ish functional units,

soc.fu.*:

* ALU      (done)
* Logical  (done)
* ShiftRot (done)
* MUL      (not done - Jacob)
* DIV      (not done - Jacob)
* Branch   (done)
* TRAP     (specified)
* SYS      (specified)
* LDST     ("odd one out")
* SPR      (specified)
* CR - Condition Register (done)

of these:

* SYS, TRAP and SPR need writing (they've been specified, but no unit
  tests exist yet)
* MUL and DIV: we're waiting for Jacob
* LDST depends on a functional L0CacheBuffer.  however it does not
  "fit" the FU mould, and kinda-qualifies as "partly complete" because
  the majority of its functionality is in the *Computation Unit* rather
  than in an actual "Pipeline".

> about 29 Scoreboard components. How many more scoreboard components and
> functional units are needed for August?

scoreboard components to be written: none.  the entirety of the
scoreboard functionality was done well over a year ago, including
shadowing, which covers branches, exceptions, traps, interrupts (and
predication when we add it).
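for anyone unfamiliar with "shadowing", here is a minimal toy sketch of the idea (not the soc code; the class and method names are hypothetical): a result may be computed early, but it is held back from the regfile while any shadow (branch, exception, trap...) is still unresolved, and a failed condition cancels it outright.

```python
from dataclasses import dataclass, field

@dataclass
class ShadowedResult:
    """Hypothetical toy model: a computed result held under shadows."""
    value: int
    shadows: set = field(default_factory=set)  # unresolved conditions
    cancelled: bool = False

    def resolve(self, name: str, success: bool) -> None:
        """Clear one shadow; a failed condition cancels the result."""
        self.shadows.discard(name)
        if not success:
            self.cancelled = True

    def can_write(self) -> bool:
        """Write-back is only permitted once every shadow has cleared."""
        return not self.shadows and not self.cancelled

r = ShadowedResult(42, {"branch", "exception"})
assert not r.can_write()            # still shadowed: must not write
r.resolve("branch", success=True)   # branch went the predicted way
r.resolve("exception", success=True)
assert r.can_write()                # all shadows clear: safe to commit
```

a mispredicted branch would call `resolve("branch", success=False)`, and the result is simply dropped instead of being written.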

the only reason i had to make changes (2 months ago) was for the
multi-signal capability.

tying the components *together* is where the complexity lies.

Function Units: see above.

one additional absolutely critical blocker (which i *may* have solved,
today): the Computation Unit "Manager" <-> ALU interaction.  this
*critically* relies on properly implementing the pipeline API... in
both the CompUnit (as sender *and* receiver) and the ALU (as receiver
and sender).

yes, the CompUnit "sends" (operands) to the ALU, then "receives" the
results (from the ALU).  that's its job - its sole purpose.  it's
basically as complex an FSM as any **PAIR** of interacting Wishbone or
AXI4 Bus Master Router-Arbiters, back-to-back, in the same code.
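to illustrate the send-then-receive pattern, here's a rough cycle-level sketch of a valid/ready handshake, assuming the rule that data crosses a link only in a cycle where the sender's "valid" and the receiver's "ready" are both high.  the signal names, the toy "ALU" (an adder with a fixed latency) and the busy period are all made up for illustration; it is not the actual pipeline API.

```python
def round_trip(a, b, alu_busy_until=2, alu_latency=3, max_cycles=50):
    """Return (result, cycles taken) for one CompUnit <-> ALU round trip."""
    state = "SEND"   # CompUnit FSM: SEND (offer operands) -> RECV (await result)
    pending = None   # [value, cycles_remaining] held inside the toy ALU
    for cycle in range(max_cycles):
        # signals for this cycle, all derived from the current state
        cu_valid = state == "SEND"             # CompUnit offers operands
        alu_ready = pending is None and cycle >= alu_busy_until
        alu_valid = pending is not None and pending[1] == 0
        cu_ready = state == "RECV"             # CompUnit can accept a result
        # clock edge: a transfer happens only where valid AND ready
        if alu_valid and cu_ready:
            return pending[0], cycle + 1       # result captured by CompUnit
        if pending is not None:
            pending[1] -= 1                    # ALU still computing
        if cu_valid and alu_ready:
            pending = [a + b, alu_latency]     # operands captured by ALU
            state = "RECV"
    raise RuntimeError("handshake never completed")

print(round_trip(2, 3))  # (5, 7)
```

the point is that the CompUnit has to play *both* roles correctly: holding "valid" until the ALU is ready, then holding "ready" until the ALU's result is valid.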

> All the functional units have some formal verification tests, how confident are we
> that the scoreboard works correctly?

strictly speaking: not in the slightest, until there exists full
formal verification for it (and for each component).  there's a
special budget of EUR 6,000 *specifically* for that one task, because
it's (a) extremely involved and (b) actually covers quite a lot of
interaction(s) and components.

on a "proof-of-concept" level: reasonably confident.
score6600_multi.py is functional for ALU operations (including
overlapping and parallel execution), and when i was testing
speculative branch operations (including cancellation either way:
correct / incorrect branches), that worked too.  however, that was
only for "single read/write".

the complexity lies in "allocating" the required components to meet
each Function Unit's register and port requirements.  i've defined the
"regspec" API to help with that.
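as a rough illustration of why a per-FU register description helps with allocation: if each FU declares its operands as (regfile, name, bit-range) tuples, the number of ports each regfile must provide can be counted mechanically.  the tuple layout, operand names and bit ranges below are assumptions for illustration only, not copied from the soc source; see the regfile page above for the real definition.

```python
from collections import Counter

# hypothetical regspec-style entries: (regfile, operand name, bit range)
ALU_IN = [("INT", "ra", "0:63"), ("INT", "rb", "0:63"), ("XER", "xer_so", "32")]
ALU_OUT = [("INT", "o", "0:63"), ("CR", "cr_a", "0:3"), ("XER", "xer_so", "32")]

def width(bitrange: str) -> int:
    """'0:63' -> 64 bits, a single bit number such as '32' -> 1 bit."""
    if ":" in bitrange:
        lo, hi = (int(x) for x in bitrange.split(":"))
        return hi - lo + 1
    return 1

def ports_needed(spec):
    """Count how many ports each regfile must offer this Function Unit."""
    return Counter(regfile for regfile, _, _ in spec)

def bits_per_regfile(spec):
    """Total data-path width needed from each regfile."""
    totals = Counter()
    for regfile, _, bitrange in spec:
        totals[regfile] += width(bitrange)
    return totals

print(ports_needed(ALU_IN))       # Counter({'INT': 2, 'XER': 1})
print(bits_per_regfile(ALU_OUT))  # Counter({'INT': 64, 'CR': 4, 'XER': 1})
```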
https://libre-soc.org/3d_gpu/architecture/regfile/

so this is why i said, last week, that it would be a good idea to take
an incremental approach:
https://bugs.libre-soc.org/show_bug.cgi?id=346

this allows us to "have something working" - complete, prove and bug
fix both the regfile *and* Function Units, get the "hello world" demo
out the door much faster than would otherwise be achievable, and
*then* add in the Dependency Matrices.

it works by issuing only one instruction at a time, using the regfile
buses *exclusively* for that *ONE* Function Unit, and waiting for that
*ONE* instruction to complete before moving on to the next one.

this basically means that we would have an IPC of somewhere around
0.2 to 0.33.  as in: 3-5 cycles per instruction.
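a toy model of that single-issue approach, assuming one cycle to read operands, a per-FU execute latency, and one cycle to write back (all latencies and opcode names are hypothetical).  because each instruction fully completes before the next is issued, dependent instructions are always correct without any hazard tracking, at the cost of IPC:

```python
import operator

# hypothetical opcode -> (Function Unit, operation) table
OPS = {"add": ("ALU", operator.add),
       "and": ("LOGICAL", operator.and_),
       "shl": ("SHIFTROT", operator.lshift)}
# hypothetical execute latencies, in cycles, per Function Unit
LATENCY = {"ALU": 1, "LOGICAL": 1, "SHIFTROT": 2}

def run_single_issue(program, regs, read_cycles=1, write_cycles=1):
    """Run (op, dest, src1, src2) tuples strictly one at a time.
    Returns the total cycle count; regs is updated in place."""
    cycles = 0
    for op, dest, src1, src2 in program:
        fu, fn = OPS[op]
        a, b = regs[src1], regs[src2]   # read: regfile buses used by this FU only
        cycles += read_cycles
        result = fn(a, b)               # execute: wait for this ONE FU
        cycles += LATENCY[fu]
        regs[dest] = result             # write back before the next issue
        cycles += write_cycles
    return cycles

regs = {"r1": 12, "r2": 1}
program = [("add", "r3", "r1", "r2"),   # r3 = 13
           ("shl", "r4", "r3", "r2"),   # r4 = 26, depends on r3
           ("and", "r5", "r4", "r1")]   # r5 = 8, depends on r4
cycles = run_single_issue(program, regs)
print(cycles, len(program) / cycles)    # 10 0.3
```

three instructions in 10 cycles gives an IPC of 0.3, i.e. 3-4 cycles each under these made-up latencies.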

adding in the Dependency Matrices then becomes a straightforward
task... *if* we know full well that the Function Units properly comply
with the Pipeline API, interact properly with their Computational Unit
"Manager", and properly read and write registers according to the
Go_Read/Go_Write and Req_Read/Req_Write protocol.
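a simplified sketch of the request/grant idea behind that protocol, assuming each FU raises a "request" signal and a fixed-priority picker grants one FU per port per cycle (one read port and one write port here).  the class, the state names and the priority scheme are hypothetical simplifications, not the soc implementation:

```python
def priority_pick(requests):
    """Grant the lowest-numbered requester (one-hot list), or nobody."""
    for i, req in enumerate(requests):
        if req:
            return [j == i for j in range(len(requests))]
    return [False] * len(requests)

class FU:
    """Hypothetical Function Unit: READ -> EXEC -> WRITE -> DONE."""
    def __init__(self, name):
        self.name = name
        self.state = "READ"

    @property
    def req_read(self):
        return self.state == "READ"

    @property
    def req_write(self):
        return self.state == "WRITE"

    def clock(self, go_read, go_write):
        # advance only when the matching "go" grant has been received
        if self.state == "READ" and go_read:
            self.state = "EXEC"
        elif self.state == "EXEC":
            self.state = "WRITE"
        elif self.state == "WRITE" and go_write:
            self.state = "DONE"

def run_protocol(fus, max_cycles=20):
    """Return a log of (cycle, FUs granted go_read, FUs granted go_write)."""
    log = []
    for cycle in range(max_cycles):
        if all(f.state == "DONE" for f in fus):
            return log
        go_rd = priority_pick([f.req_read for f in fus])
        go_wr = priority_pick([f.req_write for f in fus])
        log.append((cycle, [f.name for f, g in zip(fus, go_rd) if g],
                           [f.name for f, g in zip(fus, go_wr) if g]))
        for f, gr, gw in zip(fus, go_rd, go_wr):
            f.clock(gr, gw)
    raise RuntimeError("protocol did not complete")

for entry in run_protocol([FU("alu"), FU("logical")]):
    print(entry)
```

the "logical" FU has to wait one cycle for the shared read port, which is exactly the kind of contention the Req/Go handshake exists to arbitrate.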

once added, the DMs bring us up to an IPC of near 1.0 (branch
prediction being the key factor there).
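a much-simplified sketch of why DMs recover IPC: a matrix of FUs by registers records which FU has a result pending for which register, and an FU may only be given Go_Read once no *other* FU has a pending write to any of its sources.  independent instructions therefore overlap, while true dependencies still wait.  this toy covers read-after-write hazards only; the real design (FU-FU plus FU-Regs matrices) also has to handle write-after-read ordering, which is not modelled here, and the class name is hypothetical.

```python
class DependencyMatrix:
    """Toy FU-by-register matrix: cell True = that FU will write that reg."""
    def __init__(self, n_fus, n_regs):
        self.pending = [[False] * n_regs for _ in range(n_fus)]

    def issue(self, fu, dest):
        self.pending[fu][dest] = True     # FU now owns a pending write

    def writeback(self, fu, dest):
        self.pending[fu][dest] = False    # result committed, hazard cleared

    def may_read(self, fu, srcs):
        """Go_Read allowed only if no other FU has a pending write to a source."""
        return not any(self.pending[other][s]
                       for other in range(len(self.pending)) if other != fu
                       for s in srcs)

dm = DependencyMatrix(n_fus=2, n_regs=8)
dm.issue(0, 3)                   # FU0 will write r3
print(dm.may_read(1, [4, 5]))    # True: independent, can overlap
print(dm.may_read(1, [3]))       # False: must wait for r3
dm.writeback(0, 3)
print(dm.may_read(1, [3]))       # True: dependency resolved
```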

l.