[libre-riscv-dev] [Bug 178] first coriolis2 tutorial, workflow and "test project" page

Wed Feb 26 13:15:18 GMT 2020

http://bugs.libre-riscv.org/show_bug.cgi?id=178

--- Comment #131 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Staf Verhaegen from comment #130)
> (In reply to Luke Kenneth Casson Leighton from comment #129)
> > these blocks we *know* in advance, they are *only* connected by register
> > latches.
> 
> If we are naming things anyway this is called a datapath in the industry.

ah ha! another new term for the wiki :)

> Problem I see with using datapath layout is that typically the input of the
> datapath comes from the register file and also the output has to go to the
> register file. 

ah.  right.  yes you are correct.  we do not want *all* datapath layout to be
NORTH-as-input, SOUTH-as-output.

ok so in the 6600 out-of-order design, the reads go into "Function Unit"
latches (Reservation Stations if you are familiar with the Tomasulo Algorithm
terminology).

the register data waits in those FU latches until all of the operands are
available: this may not happen immediately because, even if there are no
Dependency Hazards, there may simply not be enough Register-File Read-Port bus
bandwidth at that exact moment to get the data needed for that Function Unit to
fill all of its operand latches.

once ready, the FU may proceed to ask one of the ALUs for a time-slot.  at that
point it is "go".
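
to make the read-port bandwidth point concrete, here is a rough behavioural
sketch (plain python, not nmigen, and not how the real design is written).
the names FunctionUnit, simulate and read_ports are all made up for
illustration, and the "one regfile read per port per cycle, FUs served in
list order" arbitration is an assumption, not the 6600 priority scheme.

```python
from dataclasses import dataclass, field


@dataclass
class FunctionUnit:
    """hypothetical model of a Function Unit and its operand latches."""
    name: str
    src_regs: list                                # register numbers to read
    latches: dict = field(default_factory=dict)   # regnum -> latched value

    def pending(self):
        # operands that have not yet been captured in a latch
        return [r for r in dict.fromkeys(self.src_regs)
                if r not in self.latches]

    def ready(self):
        # the FU may only ask for an ALU time-slot once every latch is full
        return not self.pending()


def simulate(fus, regfile, read_ports):
    """fill operand latches with at most read_ports regfile reads per cycle.

    returns {fu name: cycle number at which all its latches were full}.
    arbitration is simply "first FU in the list wins" (an assumption).
    """
    if read_ports < 1:
        raise ValueError("need at least one read port")
    ready_at = {}
    cycle = 0
    while len(ready_at) < len(fus):
        cycle += 1
        ports_left = read_ports
        for fu in fus:
            for reg in fu.pending():
                if ports_left == 0:
                    break                         # no bus bandwidth left
                fu.latches[reg] = regfile[reg]
                ports_left -= 1
            if fu.ready() and fu.name not in ready_at:
                ready_at[fu.name] = cycle
    return ready_at


regfile = {1: 10, 2: 20, 3: 30, 4: 40}
fus = [FunctionUnit("add", [1, 2]), FunctionUnit("mul", [3, 4])]
result = simulate(fus, regfile, read_ports=3)
print(result)
```

with three read ports and four operands outstanding, the second FU has to
wait an extra cycle even though there is no dependency hazard at all.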

all of this is still "input on NORTH, output on SOUTH".  or, more
accurately, "data input and latch ACK output on NORTH, data output and latch
WAIT input on SOUTH".

then on the other side of the ALUs, there is *another* latch, this time
capturing the output.

for these outputs, there is a "Register Bypass Bus" which can feed them *back*
into the Function Unit latches, *and* there is a multi-way Register-File
Write-Port Bus.

so it's nowhere near as straightforward as a "standard in-order single-core
pipeline design": you can see that there are several circular datapaths between
the blocks.
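
as a sanity check on "several circular datapaths", the blocks described above
can be written down as a small directed graph and the loops enumerated.  this
is only a sketch: the block names and the edge list are my own reading of the
description above, not something extracted from the actual design.

```python
def simple_cycles(graph):
    """list each simple cycle once, rooted at its smallest node name."""
    cycles = []

    def walk(start, node, path):
        for nxt in graph.get(node, []):
            if nxt == start:
                cycles.append(path[:])            # closed a loop
            elif nxt not in path and nxt > start:
                walk(start, nxt, path + [nxt])

    for start in sorted(graph):
        walk(start, start, [start])
    return cycles


# hypothetical block-level connectivity, taken from the description above
blocks = {
    "regfile": ["fu_latches"],                    # read-port bus
    "fu_latches": ["alu"],                        # "go"
    "alu": ["result_latches"],                    # output capture
    "result_latches": ["fu_latches", "regfile"],  # bypass bus, write-port bus
}

cycles = simple_cycles(blocks)
for c in cycles:
    print(" -> ".join(c + [c[0]]))
```

even this four-block version has two loops: one through the bypass bus and
one through the register file.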

> So if you go always left to right one of the sides will be
> far away from the register file. For smaller technology nodes the capacitive
> load of these long paths will be a killer for performance.
> This problem is more pronounced if you have different functional blocks
> where for all the blocks the input and output is coming from and going to
> the register file.
> 
> Using an analytic placer will naturally get both the input and outputs close
> to the register file and move the middle of the path further away minimizing
> extra delay from the interconnects.

that would be really nice to have.  because of the circular nature of the
design, we may just have to see how it goes.

the Function Unit input latches and the ALU Result output latches need to be as
close as possible to the register file, via the buses, however the sheer size
of the ALU blocks themselves that are *in between* the input and output latches
is going to make that quite challenging.

one thought there is to split the ALU pipelines in half (making them
particularly narrow), then turn the data around half way along, routing it
*back* through the second half of the pipeline stages, so that the result data
arrives *back* as close to its starting point as possible.
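
a quick way to see what the fold buys: place the stages on a unit grid and
compare the input-to-output distance for a straight column against a folded
(U-shaped) one.  this is a toy sketch with assumed units: the function names
are invented, and it assumes every stage is the same size and that the two
columns sit one pitch apart, neither of which will hold in a real layout.

```python
def straight_layout(n_stages, pitch=1.0):
    """all stages stacked in a single column, input at the top."""
    return [(0.0, i * pitch) for i in range(n_stages)]


def folded_layout(n_stages, pitch=1.0):
    """first half goes down one column, second half comes back up the next."""
    half = (n_stages + 1) // 2
    coords = []
    for i in range(n_stages):
        if i < half:
            coords.append((0.0, i * pitch))       # outbound column
        else:
            row = n_stages - 1 - i                # counting back to the start
            coords.append((pitch, row * pitch))   # return column
    return coords


def in_out_distance(coords):
    """manhattan distance between the first and the last pipeline stage."""
    (x0, y0), (x1, y1) = coords[0], coords[-1]
    return abs(x1 - x0) + abs(y1 - y0)


straight = in_out_distance(straight_layout(8))
folded = in_out_distance(folded_layout(8))
print(straight, folded)
```

for an 8-stage pipeline the straight column leaves the result 7 pitches from
where the operands went in, the folded one leaves it 1 pitch away, at the
cost of a block that is twice as wide.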

however we are still looking at a massive data bus (a "Common Data Bus" in
Tomasulo Algorithm terminology).
