[Libre-soc-dev] WIP demo of deficiency of 6600-derived architecture compared to register renaming
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Tue Oct 27 21:23:36 GMT 2020
On Tue, Oct 27, 2020 at 7:52 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> On Tue, Oct 27, 2020 at 10:10 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
> > I am, however, going to finish the demo diagram since I think the issue is a little more complex than just a WaW hazard.
good, because it needs investigating properly.
> Completed and pushed! It won't show up until the ikiwiki errors are
> fixed,
sorted
> however it is in the git repo:
>
> https://git.libre-soc.org/?p=libreriscv.git;a=blob;f=3d_gpu/architecture/compared_to_register_renaming.mdwn;h=8b4e2e4c78dd889985736ea60968f768bcb3a122;hb=d4489f6f651f5d88541ca056636e5870d5a01d3f
>
> https://libre-soc.org/3d_gpu/architecture/compared_to_register_renaming/
comments.
1) "Notice how the WaR Waits on `r9` cause 2 instructions to finish
per cycle (5 micro-ops per 2 cycles)"
right. this isn't necessarily the case. once an FU has read its
operands from the regfile into its in-flight latches it drops the read
dependency entirely. thus if the new instruction is issued after that
point there will be only the one WaR wait, not two.
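
a toy sketch of that point, in python. this is not taken from the soc
source: `FunctionUnit`, `latch_operands` and `war_waits` are
hypothetical names, and the model only tracks which FUs still have an
outstanding regfile read on a given register.

```python
from dataclasses import dataclass


@dataclass
class FunctionUnit:
    # hypothetical toy model, not the real libre-soc FU
    name: str
    pending_reads: set  # regs this FU has yet to read from the regfile

    def latch_operands(self):
        # operands are now held in the FU itself, so the read
        # dependency on the regfile is dropped entirely
        self.pending_reads = set()


def war_waits(dest_reg, in_flight):
    """names of FUs whose outstanding read of dest_reg blocks a write to it"""
    return [fu.name for fu in in_flight if dest_reg in fu.pending_reads]


fu_a = FunctionUnit("A", {"r9"})
fu_b = FunctionUnit("B", {"r9"})
waits_before = war_waits("r9", [fu_a, fu_b])  # both FUs still need r9
fu_a.latch_operands()                         # A has read its operands
waits_after = war_waits("r9", [fu_a, fu_b])   # only B still blocks the write
```

so whether a later writer of `r9` sees one WaR wait or two depends
purely on whether it issues before or after the earlier FU has latched
its operands.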
2) in column 3 i'm not seeing an INT reg write. so the delay "Av r3"
is unnecessary.
in the design that we are doing, the different regfiles are completely
independent. CTR is *not* in the same regfile as the INT regs, neither
is XER, and CR is entirely independent as well. the full list is here:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/regfile/regfiles.py;hb=HEAD
note that the smaller regfiles (State, Fast, CR, XER) *do* have
multiple write ports. these will be done entirely as DFFs and they
are tiny.
it is only the absolutely massive ones (32x64 INT, 100+ SPRs) that
have only the one write port. INT takes up 1/4 of the 180nm ASIC size
(as big as the 64-bit multiplier: 15,000 gates), SPR takes up...
mm.... 10%. CR is about 1% despite it having *3* write ports and *5*
read ports.
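
a rough python sketch of why independent regfiles matter for write
contention. the INT, SPR and CR write-port counts are the ones stated
above. the counts for Fast, State and XER are only described as
"multiple", so the 2 used here is a placeholder, and placing CTR in
the Fast regfile is an assumption for the example. `oversubscribed` is
a hypothetical helper, not something from regfiles.py.

```python
from collections import Counter

# write ports per regfile. INT, SPR and CR are as stated in the text;
# FAST, STATE and XER are only known to be "multiple": 2 is a placeholder.
WRITE_PORTS = {"INT": 1, "SPR": 1, "CR": 3, "FAST": 2, "STATE": 2, "XER": 2}


def oversubscribed(writes):
    """writes: list of (regfile, regname) pairs wanting to write this cycle.
    returns the regfiles that have more writers than write ports."""
    demand = Counter(regfile for regfile, _ in writes)
    return sorted(rf for rf, n in demand.items() if n > WRITE_PORTS[rf])


# an INT write and a CTR write land in different regfiles: no contention
int_and_ctr = oversubscribed([("INT", "r3"), ("FAST", "CTR")])
# two INT writes in one cycle fight over the single INT write port
two_ints = oversubscribed([("INT", "r3"), ("INT", "r4")])
# three CR writes fit within the three CR write ports
three_crs = oversubscribed([("CR", "cr0"), ("CR", "cr1"), ("CR", "cr2")])
```

only writes to the *same* regfile can force a delay on a write port, so
an instruction that has no INT write should not be held up waiting on
the INT regfile.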
3) in the (ridiculously complex) WaW detection system it becomes
possible to eliminate WaW entirely, by detecting the condition that a
WaW register has been overwritten. it then becomes a "pure" in-flight
nameless register and sits exclusively in the output latch of the FU.
once the last reader of that FU latch has got the result (which goes
by the Op-Fwd bus only), and there are no more read dependencies on
that result, then, because of the earlier detection that it was an
overwritten WaW, it may be *DROPPED* on the floor.
this saves a write to the regfile port, and if things are particularly
busy there will be no free regfile write slot anyway. however
interestingly if a write slot _does_ become available then it can be
written to the regfile, the FU is freed up, and from that point
onwards the value is treated as an ordinary reg-read.
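
a heavily simplified python sketch of the decision described in 3).
`ResultLatch` and `step` are hypothetical names, and the real WaW
detection logic is far more involved: in particular, ordering of a late
regfile write against the later writer of the same register is not
modelled here at all.

```python
from dataclasses import dataclass


@dataclass
class ResultLatch:
    # hypothetical toy model of an FU output latch
    reg: str           # destination register name
    value: int
    readers: int       # readers still to be fed over the Op-Fwd bus
    overwritten: bool  # a later instruction writes the same reg (WaW)


def step(latch, regfile, write_port_free):
    """decide what happens to the latched result this cycle"""
    if latch.overwritten and latch.readers == 0:
        # nobody needs it and it will be overwritten: no regfile write
        return "dropped"
    if write_port_free:
        # a write slot turned up: commit, which frees the FU. later
        # readers get the value as an ordinary regfile read
        regfile[latch.reg] = latch.value
        return "written"
    # still has readers and no free write slot: stay in the latch
    return "held"


regfile = {}
latch = ResultLatch("r9", 5, readers=1, overwritten=True)
first = step(latch, regfile, write_port_free=False)   # one reader left
latch.readers = 0                                     # last reader served
second = step(latch, regfile, write_port_free=False)  # safe to drop

other = ResultLatch("r3", 7, readers=2, overwritten=True)
third = step(other, regfile, write_port_free=True)    # slot became free
```

in this toy version the drop case is checked first, since that is the
one that saves the regfile write.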
welcome to one of the most mind-bendingly complex areas of computer
architecture :)
l.