[libre-riscv-dev] Request for Scoreboard and Functional Units Update
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Sat May 30 17:26:58 BST 2020
On Sat, May 30, 2020 at 3:59 PM Yehowshua <yimmanuel3 at gatech.edu> wrote:
>
> So browsing through the codebase, it seems we have about 10ish functional units,
soc.fu.*:
* ALU (done)
* Logical (done)
* ShiftRot (done)
* MUL (not done - Jacob)
* DIV (not done - Jacob)
* Branch (done)
* TRAP (specified)
* SYS (specified)
* LDST ("odd one out")
* SPR (specified)
* CR - Condition Register (done)
of these:
* SYS, TRAP and SPR need writing (they've been defined, no unit test exists yet)
* MUL and DIV we're waiting for Jacob
* LDST depends on a functional L0CacheBuffer, however does not "fit"
the FU mould and kinda-qualifies as "partly complete" because the
majority of functionality is in the *Computation Unit* rather than an
actual "Pipeline".
> about 29 Scoreboard components. How many more scoreboard components and
> functional units are needed for August?
scoreboard components to be written: none. the entirety of the
scoreboard functionality was done well over a year ago, including
shadowing which covers branches, exceptions, traps, interrupts (and
predication when we add it).
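to show what shadowing means in practice, here is a minimal Python sketch. it is a behavioural model only, not the nmigen code in soc, and the class, method and shadow names are all hypothetical. a completed result that sits under one or more "shadows" is held back from the regfile until every shadow is released. if a shadow resolves as a failure (mispredicted branch, trap), anything under it is thrown away instead.

```python
from dataclasses import dataclass, field


@dataclass
class PendingResult:
    dest: int                                   # destination register number
    value: int                                  # computed result, not yet written
    shadows: set = field(default_factory=set)   # unresolved shadow ids over it


class ShadowModel:
    """hypothetical behavioural model of shadowed writeback."""

    def __init__(self):
        self.regs = {}      # register file: reg number -> value
        self.pending = []   # results held back by at least one shadow

    def complete(self, dest, value, shadows=()):
        # the FU has its result, but may not write while any shadow is up
        self.pending.append(PendingResult(dest, value, set(shadows)))
        self._writeback()

    def resolve(self, shadow_id, success):
        if success:
            # prediction was right: lift this shadow from everything under it
            for p in self.pending:
                p.shadows.discard(shadow_id)
        else:
            # prediction was wrong (or a trap fired): cancel everything under it
            self.pending = [p for p in self.pending if shadow_id not in p.shadows]
        self._writeback()

    def _writeback(self):
        # any result with no remaining shadows is allowed to hit the regfile
        held = []
        for p in self.pending:
            if p.shadows:
                held.append(p)
            else:
                self.regs[p.dest] = p.value
        self.pending = held


m = ShadowModel()
m.complete(dest=3, value=42, shadows={"br0"})   # issued past an unresolved branch
print(3 in m.regs)                              # False: still shadowed
m.resolve("br0", success=True)
print(m.regs[3])                                # 42
m.complete(dest=4, value=7, shadows={"br1"})
m.resolve("br1", success=False)
print(4 in m.regs)                              # False: cancelled
```

the key property is that a shadowed FU can compute as far ahead as it likes: only the *write* is held, so cancellation never has to undo anything in the regfile.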
the only reason i had to make changes (2 months ago) was for the
multi-signal capability.
tying the components *together* is where the complexity lies.
Function Units: see above.
one additional absolutely critical blocker (which i *may* have solved,
today): the Computation Unit "Manager" <-> ALU interaction. this
*critically* relies on properly implementing the pipeline API... in
both the CompUnit (as sender *and* receiver) and the ALU (as receiver
and sender).
yes, the CompUnit "sends" (operands) to the ALU, then "receives" the
results (from the ALU). that's its job - its sole purpose. it's
basically as complex an FSM as any **PAIR** of interacting Wishbone or
AXI4 Bus Master Router-Arbiters, back-to-back, in the same code.
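a rough cycle-level Python sketch of that send-then-receive sequence is below. it assumes a plain valid/ready handshake on both ports, in the spirit of the pipeline API but not using its actual signal names, and PipelinedALU / CompUnit are hypothetical stand-ins (the "ALU" just adds its operands).

```python
class PipelinedALU:
    """hypothetical fixed-depth adder pipeline with valid/ready on each end."""

    def __init__(self, depth=2):
        self.stages = [None] * depth   # None marks an empty slot (a bubble)

    def step(self, in_valid, in_data, out_ready):
        """advance one clock; returns (in_ready, out_valid, out_data)."""
        out_data = self.stages[-1]
        out_valid = out_data is not None
        # the whole pipe may only move if the last slot is empty or being taken
        in_ready = (not out_valid) or out_ready
        if in_ready:
            new = sum(in_data) if in_valid else None
            self.stages = [new] + self.stages[:-1]
        return in_ready, out_valid, out_data


class CompUnit:
    """hypothetical "Manager": sends operands once, then receives one result."""

    def __init__(self, alu):
        self.alu = alu
        self.state = "IDLE"
        self.operands = None
        self.result = None

    def issue(self, *operands):
        assert self.state == "IDLE"
        self.operands = operands
        self.state = "SEND"

    def tick(self):
        sending = self.state == "SEND"
        receiving = self.state == "RECV"
        in_ready, out_valid, out_data = self.alu.step(
            in_valid=sending, in_data=self.operands, out_ready=receiving)
        if sending and in_ready:         # input handshake: operands accepted
            self.state = "RECV"
        elif receiving and out_valid:    # output handshake: result taken
            self.result = out_data
            self.state = "DONE"


cu = CompUnit(PipelinedALU(depth=2))
cu.issue(2, 3)
cycles = 0
while cu.state != "DONE":
    cu.tick()
    cycles += 1
print(cu.result, cycles)   # 5 3
```

each transfer only happens in a cycle where valid and ready are both high on that port. getting either side of that wrong is exactly the kind of bug that hangs a CompUnit, or has it take the same result twice.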
> All the functional units have some formal verification tests, how confident are we
> that the scoreboard works correctly?
strictly speaking: not in the slightest, until there exists full
formal verification for it (and for each component). there's a
special budget of EUR 6,000 *specifically* for that one task, because
it's (a) extremely involved and (b) actually covers quite a lot of
interaction(s) and components.
on a "proof-of-concept" level: reasonably confident.
score6600_multi.py is functional for ALU operations (including
overlapping ones, including parallel execution), and when i was
testing speculative branch operations (including cancellation either
way - correct / incorrect branches), that worked too. however that
was for "single read/write".
the complexity lies in the "allocation" of the required components, to
meet the register and FU allocation. i've defined the "regspec" API
to help meet that.
https://libre-soc.org/3d_gpu/architecture/regfile/
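as an illustration of the kind of question a regspec answers, here is a small Python sketch. it assumes a regspec is a list of (regfile, operand name, bit range) tuples. the FU lists, operand names and helper functions are made up for this example and are not the actual soc definitions. it works out how many ports each regfile needs when only one FU at a time owns the buses.

```python
from collections import Counter

# hypothetical regspecs: (regfile, operand name, bit range); entries are made up
ALU_IN = [("INT", "ra", "0:63"), ("INT", "rb", "0:63"), ("XER", "xer_so", "32")]
SHIFTROT_IN = [("INT", "rs", "0:63"), ("INT", "rb", "0:63"), ("INT", "ra", "0:63")]
CR_IN = [("CR", "full_cr", "0:31")]


def parse_bits(spec):
    """expand a bit-range string such as "0:63", "32" or "34,45" into indices."""
    bits = set()
    for part in spec.split(","):
        if ":" in part:
            lo, hi = part.split(":")
            bits.update(range(int(lo), int(hi) + 1))
        else:
            bits.add(int(part))
    return sorted(bits)


def ports_needed(regspec):
    """how many simultaneous ports this FU wants from each regfile."""
    return Counter(regfile for regfile, _name, _bits in regspec)


def ports_for_single_issue(*regspecs):
    """only one FU owns the buses at a time, so take the per-regfile maximum
    across FUs (Counter's | operator), not the sum."""
    total = Counter()
    for spec in regspecs:
        total |= ports_needed(spec)
    return total


print(ports_for_single_issue(ALU_IN, SHIFTROT_IN, CR_IN))
print(len(parse_bits("0:63")), parse_bits("34,45"))
```

once several FUs can be active at once, the maximum turns into something closer to a sum (or needs arbitration), which is where the allocation complexity comes from.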
so this is why i said, last week, that it would be a good idea to take
an incremental approach:
https://bugs.libre-soc.org/show_bug.cgi?id=346
this allows us to "have something working" - complete, prove and bug
fix both the regfile *and* Function Units, get the "hello world" demo
out the door much faster than would otherwise be achievable, and
*then* add in the Dependency Matrices.
it works by only issuing one instruction, using the regfile buses
*exclusively* for that *ONE* Function Unit, waiting for that *ONE*
instruction to complete, before moving on to the next one.
this basically means that we would have an IPC of somewhere between
0.2 and 0.33, as in: 3-5 cycles per instruction.
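the arithmetic behind that estimate, as a tiny Python sketch. it assumes one cycle to read operands and one to write back, plus a per-instruction execute latency. those cycle counts are illustrative guesses, not measurements. since nothing overlaps, IPC is simply instructions divided by the sum of every instruction's read + execute + write time.

```python
def single_issue_ipc(latencies, read_cycles=1, write_cycles=1):
    """IPC when every instruction has the regfile buses to itself.

    latencies: hypothetical execute latency (in cycles) of each instruction.
    read_cycles / write_cycles: assumed cost of operand read and writeback.
    """
    total = 0
    for lat in latencies:
        # read, execute, write back strictly in sequence; the next
        # instruction is not issued until this writeback has completed
        total += read_cycles + lat + write_cycles
    return len(latencies) / total


print(single_issue_ipc([1] * 10))   # 3 cycles each: 0.333...
print(single_issue_ipc([3] * 10))   # 5 cycles each: 0.2
```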
adding in the Dependency Matrices then becomes a straightforward
task... *if* we know full well that the Function Units properly comply
with the Pipeline API, interact properly with their Computation Unit
"Manager", and properly read and write registers according to the
Go_Read / Go_Write - Req_Read / Req_Write protocol.
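a minimal Python sketch of the request/grant idea behind Req_Read / Go_Read and Req_Write / Go_Write follows. it assumes one request per outstanding operand, a fixed number of regfile ports, and a simple first-come priority arbiter. the FU never touches the regfile on its own initiative, only when granted. all names are hypothetical. nothing here stops an FU reading a register that another FU has yet to write; preventing that is the Dependency Matrices' job.

```python
import operator


class FunctionUnit:
    """hypothetical FU: raises Req_Read per missing operand and Req_Write once
    it has a result, but only acts when given Go_Read / Go_Write."""

    def __init__(self, srcs, dest, op):
        self.srcs, self.dest, self.op = list(srcs), dest, op
        self.operands = [None] * len(self.srcs)
        self.result = None
        self.done = False

    def req_read(self):
        return [i for i, v in enumerate(self.operands) if v is None]

    def go_read(self, i, regs):
        self.operands[i] = regs[self.srcs[i]]      # latch operand, drop that Req_Read
        if not self.req_read():
            self.result = self.op(*self.operands)  # "execute" in zero time

    def req_write(self):
        return self.result is not None and not self.done

    def go_write(self, regs):
        regs[self.dest] = self.result              # write result, drop Req_Write
        self.done = True


def clock(fus, regs, read_ports=2, write_ports=1):
    """one cycle: grant Go_Write then Go_Read, earlier FUs win, limited ports."""
    free = write_ports
    for fu in fus:
        if free and fu.req_write():
            fu.go_write(regs)
            free -= 1
    free = read_ports
    for fu in fus:
        for i in fu.req_read():
            if not free:
                break
            fu.go_read(i, regs)
            free -= 1


regs = {1: 10, 2: 20, 3: 5, 4: 0, 5: 0}
fus = [FunctionUnit([1, 2], 4, operator.add),   # r4 = r1 + r2
       FunctionUnit([1, 3], 5, operator.sub)]   # r5 = r1 - r3
cycles = 0
while not all(fu.done for fu in fus):
    clock(fus, regs, read_ports=2, write_ports=1)
    cycles += 1
print(regs[4], regs[5], cycles)   # 30 5 3
```

with only two read ports the second FU has to wait a cycle for its operands, which is exactly the port contention the regspec allocation has to account for.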
once added, the DMs bring us up to near a 1.0 IPC (branch prediction
being the key, there).
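as a very rough illustration of what the Dependency Matrices add, here is a minimal Python sketch of a read-after-write FU-to-FU matrix. independent instructions can proceed in parallel, while a dependent one is held at Go_Read until its producer has written back. this is heavily simplified from a 6600-style design: write-after-read hazards, shadows and regfile port arbitration are all left out, and every name is hypothetical.

```python
class DependencyMatrix:
    """hypothetical FU-to-FU read-after-write matrix: row j has bit i set
    while FU j is waiting on a result from FU i."""

    def __init__(self, n_fus):
        self.waits_on = [[False] * n_fus for _ in range(n_fus)]
        self.writer = {}   # register number -> FU index with a pending write

    def can_issue(self, dest):
        # write-after-write: refuse issue while another FU owns this dest
        return dest not in self.writer

    def issue(self, fu, srcs, dest):
        assert self.can_issue(dest)
        for r in srcs:
            if r in self.writer:               # read-after-write hazard
                self.waits_on[fu][self.writer[r]] = True
        self.writer[dest] = fu

    def read_ok(self, fu):
        # Go_Read may only be granted once every producer has written back
        return not any(self.waits_on[fu])

    def written(self, fu):
        # FU has written its result: release everyone waiting on it
        for row in self.waits_on:
            row[fu] = False
        self.writer = {r: f for r, f in self.writer.items() if f != fu}


dm = DependencyMatrix(3)
dm.issue(0, srcs=[1, 2], dest=3)   # FU0: r3 = r1 op r2
dm.issue(1, srcs=[3, 1], dest=4)   # FU1 reads r3: must wait for FU0
dm.issue(2, srcs=[1, 2], dest=5)   # FU2 is independent: runs alongside FU0
print(dm.read_ok(0), dm.read_ok(1), dm.read_ok(2))   # True False True
dm.written(0)                      # FU0 writes back r3
print(dm.read_ok(1), dm.can_issue(3))                # True True
```

because issue no longer waits for the previous instruction to finish, FU0 and FU2 overlap here. that overlap is what lifts IPC from the single-issue figure towards 1.0.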
l.