[libre-riscv-dev] ASIC layout questions
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Thu Jun 25 12:49:16 BST 2020
On Thu, Jun 25, 2020 at 11:42 AM Jean-Paul Chaput
<Jean-Paul.Chaput at lip6.fr> wrote:
> > On Thu, Jun 25, 2020 at 11:02 AM Jean-Paul Chaput
> > <Jean-Paul.Chaput at lip6.fr> wrote:
> > consequently, the FPMUL64 experiment (soclayout/experiment6 or 7 i
> > think) fails with an error unless "flatten" is enabled.
>
> OK. I think this is another face of the one I'm currently fighting
> with. There is more than one loophole to close there.
apologies that FPMUL64 is not small, and it may be inter-dependent.
> > > Yosys gives an 850K-gate design, which is huge, and in line
> > > with those processing times.
> >
> > woo! that's 8 times larger than anything previously considered. one
> > reason for it *might* be because the high-speed register files are
> > multi-port write. there are only 8 entries however they are 64-bit
> > wide
>
> Are we talking of register files or memory banks/caches?
register files. there are no caches yet in soc/simple/issuer.py - we
still have to add them.
> The latter should
> not be synthesized by Yosys (it would give bad results).
like... 850K gates bad.
> And depending
> on the size of the register files (relative to the whole design),
> we may want to have hand-crafted block(s) here too.
there are actually *five* separate and distinct register files:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/regfile/regfiles.py;hb=HEAD
they vary from *binary*-addressed 1R1W (SPR is 110 entries) to
*unary*-addressed 5R5W with QTY8 64-bit entries.
the unary addressing makes them exceptionally weird when compared to
"standard" open source architectures because the "normal"
binary-address-muxer is GONE. as in: we *directly* enable each
register row with a single bit. NOT a binary-addressed version of
that which *internally* enables each register row with a single bit:
those single bits are exposed *externally* as part of the register
file's actual public API.
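
to illustrate the difference, here is a minimal behavioural sketch in
plain python. it is NOT the code in regfiles.py: the class and function
names (UnaryRegfile, binary_to_unary) are made up for this example, and
it models only one read and one write at a time. the point is that the
caller hands over the one-hot row-enable directly, so the decoder that
a binary-addressed regfile hides internally simply is not there.

```python
# behavioural sketch only: hypothetical names, not the soc.git API

def binary_to_unary(addr, nregs):
    """the decode step a binary-addressed regfile performs internally."""
    assert 0 <= addr < nregs
    return 1 << addr


class UnaryRegfile:
    """regfile whose ports take a one-hot row-enable, not a binary address."""

    def __init__(self, nregs=8, width=64):
        self.nregs = nregs
        self.mask = (1 << width) - 1
        self.regs = [0] * nregs

    def write(self, wen, data):
        # bit i of wen set means "write row i"
        for i in range(self.nregs):
            if (wen >> i) & 1:
                self.regs[i] = data & self.mask

    def read(self, ren):
        # bit i of ren set means "read row i"; enabled rows are ORed together
        result = 0
        for i in range(self.nregs):
            if (ren >> i) & 1:
                result |= self.regs[i]
        return result


rf = UnaryRegfile()
rf.write(0b00000100, 0xDEADBEEF)        # row 2 enabled directly, no decoder
value = rf.read(binary_to_unary(2, 8))  # a binary user must decode first
```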
an additional wonderful weirdness: XERRegs and CRRegs can be addressed
to write the *full* (entire) register file in one massive hit, via one
port that has unary addressing covering *aaaaalll* registers, and it
also has "individual" ports that allow individual registers to be
written/read.
this is because XERRegs is only 6 bits wide: there are actually only
QTY3 2-bit registers. however we wanted not just to be able to
read/write those individual 2-bit registers (via small ports), we also
wanted to be able to write to the *entire* XER Register (6 bits, in
one hit).
likewise for the Condition Register (CR): this is 32 bits wide,
subdivided into CR0..CR7 which are 4 bits wide. some PowerISA
instructions need to read/write the full 32 bits; some PowerISA
instructions need to read or write *only* the 4-bit individual
registers.
we certainly did not want to have to do a Read-Modify-Write cycle here
(not when we are doing a parallel processor), so divided CRRegs down
into separate CR0..CR7 4-bit regs *but* we also provide this "virtual"
port which can write the *full* regfile.
thus we have a very weird port arrangement for CR: 4R3W, but 1R and 1W
of those are the full 32 bits wide, and the other 3R2W are only 4 bits
wide. *all* of them however are unary-addressed, and the full 32-bit
wide one can set *MULTIPLE* bits to say which of the 8 CR0-7 registers
are to be written / read.
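
a rough sketch of that arrangement, again in plain python and again
with made-up names (CRRegsModel, write_small, write_full and so on are
not the real API). one assumption to flag loudly: the packing order.
here CR0 sits in the most significant nibble of the 32-bit value,
following PowerISA numbering; the actual code may pack it differently.

```python
# behavioural sketch only: hypothetical names, packing order is an assumption

class CRRegsModel:
    NREGS = 8   # CR0..CR7
    WIDTH = 4   # each is 4 bits wide

    def __init__(self):
        self.regs = [0] * self.NREGS

    def _shift(self, i):
        # ASSUMPTION: CR0 occupies the top nibble of the 32-bit value
        return (self.NREGS - 1 - i) * self.WIDTH

    def write_small(self, wen, data4):
        """4-bit port: unary enable selects which CRn gets data4."""
        for i in range(self.NREGS):
            if (wen >> i) & 1:
                self.regs[i] = data4 & 0xF

    def read_small(self, ren):
        """4-bit port: unary enable selects which CRn is read."""
        result = 0
        for i in range(self.NREGS):
            if (ren >> i) & 1:
                result |= self.regs[i]
        return result

    def write_full(self, wen, data32):
        """32-bit 'virtual' port: wen may have MULTIPLE bits set."""
        for i in range(self.NREGS):
            if (wen >> i) & 1:
                self.regs[i] = (data32 >> self._shift(i)) & 0xF

    def read_full(self, ren):
        """32-bit 'virtual' port: rows not enabled read back as zero."""
        result = 0
        for i in range(self.NREGS):
            if (ren >> i) & 1:
                result |= self.regs[i] << self._shift(i)
        return result


cr = CRRegsModel()
cr.write_full(0b11111111, 0x12345678)  # all eight CRn in one hit
cr.write_small(0b00000001, 0xF)        # then CR0 alone, no read-modify-write
whole = cr.read_full(0b11111111)
```

note that neither write needs to read the old contents first, which is
the whole reason for splitting CR into separate 4-bit registers.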
the Integer Regfile is, for now, 3R2W. i did not want to have to do
"contention" for the write ports at this phase, but i can do so if you
think it is necessary. it complicates the core.py code (which is
already complex), which is why i did not want to get into it just
yet.
however the INTRegs, because it is 3R2W, cannot use an SRAM, and as it
is 32x 64-bit entries it may end up being massive. plus, if we are
not actually going to *use* 3R2W in production (using 3R1W or 4R1W
instead) it is not sensible to do a huge amount of work on it and then
throw it away.
if you feel that a 3R2W (or better 4R2W) is achievable and still has a
sane size, *great*.
one thing: all the regfiles need to be write-through. i.e. if there
is one port that is writing and another port is simultaneously reading
from that same register, the data being written *must* be passed
through to the reader... *on that clock cycle*. in the regfile.py code
i have done this with a "wrapper" on the front of the nmigen Memory
class.
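
to pin down what "write-through" means here, a small cycle-level model
in plain python. this is not the wrapper in regfile.py and it does not
use nmigen at all: WriteThroughRegfile and cycle() are names invented
for the sketch, and "last write wins" when two ports hit the same row
is purely an assumption of the model.

```python
# behavioural sketch only: hypothetical names, not the regfile.py wrapper

class WriteThroughRegfile:
    def __init__(self, nregs=8, width=64):
        self.mask = (1 << width) - 1
        self.regs = [0] * nregs

    def cycle(self, writes, reads):
        """model one clock cycle.

        writes: list of (wen, data) pairs, wen is a unary row-enable
        reads:  list of ren unary row-enables
        returns the data seen by each read port during this cycle
        """
        results = []
        for ren in reads:
            value = 0
            for i, stored in enumerate(self.regs):
                if not (ren >> i) & 1:
                    continue
                # forward data from any write port targeting the same row
                fwd = [d & self.mask for (wen, d) in writes if (wen >> i) & 1]
                # ASSUMPTION: if several ports write row i, the last one wins
                value |= fwd[-1] if fwd else stored
            results.append(value)
        # the stored contents only change at the end of the cycle
        for wen, data in writes:
            for i in range(len(self.regs)):
                if (wen >> i) & 1:
                    self.regs[i] = data & self.mask
        return results


rf = WriteThroughRegfile()
same_cycle = rf.cycle(writes=[(0b10, 42)], reads=[0b10])  # reader sees 42 now
next_cycle = rf.cycle(writes=[], reads=[0b10])            # and still 42 later
```

without the forwarding step the first read would return the old
contents (zero) and the new value would only appear one cycle later.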
l.