[libre-riscv-dev] daily status update 05may2020

Luke Kenneth Casson Leighton lkcl at lkcl.net
Tue May 5 15:23:39 BST 2020

On Tue, May 5, 2020 at 2:02 PM Yehowshua <yimmanuel3 at gatech.edu> wrote:

> What exactly is a LDSTCompUnit? A cursory Google search just brings up Libre-SOC.

:)  Load Store Computational Unit.

see Mitch Alsup's "Scoreboard Mechanics" chapters (which are a
critical pre-requisite to read, for understanding).

10.4.6 A Computational Unit
A Computational Unit is responsible for the data manipulation of an
instruction, while the Function Unit is responsible for the data-flow
of an instruction. Thus, a CDC 6600 Computational Unit will contain
storage for the 6-bit instruction, storage for both operand values,
storage for the result value(s), and a Timing Chain. The timing chain
is used to generate the Request_Release signal back to the ScoreBoard,
at the appropriate point in time. Some more modern Function Units will
have implicit Request_Release timing.

therefore, a "Load Store Computational Unit" is responsible for the
data-manipulation side of LOADs and STOREs.

> Does it take a LD/ST instruction and compute the offset before access?

it "manages":

* the computation of the Effective Address,
* the communication with the actual LOAD/STORE memory,
* the sending (ST) or receipt (LD) of data.
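
as a rough behavioural sketch of those three roles (plain python, not
the actual nmigen code): the function names, the dict-as-memory and
the 64-bit wrap-around are all assumptions made up for illustration.
"index" stands for the register-indexed mode and "update" for writing
the Effective Address back to the base register.

```python
MASK64 = (1 << 64) - 1  # assumption: 64-bit addresses and data


def effective_address(base, offset=0, index=None):
    """EA = base + index (indexed mode), else base + immediate offset."""
    addend = index if index is not None else offset
    return (base + addend) & MASK64


def load(mem, base, offset=0, index=None, update=False):
    """Compute the EA, talk to memory, receive the data (LD).

    Returns (data, new_base); new_base is the EA when update=True,
    otherwise None (no second register write).
    """
    ea = effective_address(base, offset, index)
    data = mem.get(ea, 0)  # hypothetical memory: dict of address -> value
    return data, (ea if update else None)


def store(mem, data, base, offset=0, index=None, update=False):
    """Compute the EA, talk to memory, send the data (ST).

    Returns the EA to write back to the base register when update=True.
    """
    ea = effective_address(base, offset, index)
    mem[ea] = data & MASK64
    return ea if update else None
```

note that a store in indexed mode reads three registers (base, index,
data) and a load with update writes two (data, base), which is where
a 3R-2W port count would come from in this model.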

i just updated the docstring to explain it further:

> Why do you call it cache buffer - I’ve only heard the term cache.

because it effectively performs both roles.  it both buffers the LD/ST
data and also has some of the characteristics of a cache.  and because
i couldn't decide which it was :)

> Also, how many ways is the cache?

one.  it's not a "normal" cache.

> Is it configurable?

yes, however its number of inputs strictly matches the total number of
LDSTCompUnits, and it is strictly and specifically designed to have
"dual ports", connecting to interleaved odd-even *dual* L1
128-bit-wide Caches.


so the "options" are hard-coded by other resources.  its role is
basically to "merge" a whopping *SIXTEEN* non-cache-aligned individual
and narrow LD/ST operations into (two) wide cache-aligned
operations... *on every clock cycle*.

yes, 16 incoming 64-bit requests on every cycle.

yes, 2 outgoing 128-bit requests on every cycle.
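
a minimal sketch of that merging idea, in plain python rather than
nmigen.  everything here is an assumption for illustration: the names,
the choice of address bit 4 (line number bit 0) to select the odd or
even cache, the "first request wins the line" policy, and the idea
that anything not merged simply waits for a later cycle.

```python
LINE_BYTES = 16  # one 128-bit cache line


def split_by_line(addr, nbytes):
    """Yield (line_number, byte_mask) pieces for one narrow access.

    A misaligned access may straddle two cache lines, so it is cut
    at the line boundary.
    """
    end = addr + nbytes
    while addr < end:
        line = addr // LINE_BYTES
        start = addr % LINE_BYTES
        count = min(end - addr, LINE_BYTES - start)
        yield line, ((1 << count) - 1) << start
        addr += count


def merge_cycle(requests):
    """Merge narrow (addr, nbytes) requests into at most two wide ones.

    Returns ({bank: (line, byte_mask)}, leftover) where bank 0 is the
    even cache, bank 1 the odd cache, and leftover holds the pieces
    that hit a different line and must wait.
    """
    chosen = {}
    leftover = []
    for addr, nbytes in requests:
        for line, mask in split_by_line(addr, nbytes):
            bank = line & 1  # assumption: odd/even interleave by line
            if bank not in chosen:
                chosen[bank] = (line, mask)
            elif chosen[bank][0] == line:
                chosen[bank] = (line, chosen[bank][1] | mask)
            else:
                leftover.append((line, mask))
    return chosen, leftover
```

for example, merge_cycle([(0x00, 8), (0x08, 8), (0x1c, 8)]) folds the
first two requests into one full even line, puts the first four bytes
of the third into the odd line, and leaves its remaining four bytes
(which fall in the next even line) for later.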

this is the only way we're going to meet the (insane) memory bandwidth
requirements of a GPU.

> are you pulling in the source from minerva?

that's already been done.  i described this yesterday.

the plan is to move the source code from minerva for the L1 Caches
(and delete everything else as extraneous), widen the minerva L1 cache
code to 128 bit wide cache lines, and add double 64-bit Wishbone
Interfaces to *each*.

we'll need Wishbone Arbiters to do that.

> > today i'll be working on the redesigned LD/ST Computational Unit which
> > has 3R-2W (indexed and update) capability.
> > https://libre-soc.org/3d_gpu/ld_st_comp_unit.jpg
> Can we have a link to the git tree for this code?

yes, sorry.  just committed the work-in-progress:

