[libre-riscv-dev] ASIC layout questions

Thu Jun 25 12:49:16 BST 2020

On Thu, Jun 25, 2020 at 11:42 AM Jean-Paul Chaput
<Jean-Paul.Chaput at lip6.fr> wrote:
> > On Thu, Jun 25, 2020 at 11:02 AM Jean-Paul Chaput
> > <Jean-Paul.Chaput at lip6.fr> wrote:
> > consequently, the FPMUL64 experiment (soclayout/experiment6 or 7 i
> > think) fails with an error unless "flatten" is enabled.
>
>   OK. I think this is another face of the one I'm currently fighting
>   with. There are more than one loophole to close there.

apologies that FPMUL64 is not small, and it may be inter-dependent.

> > > Yosys gives a 850K gates design, which is huge, and inline
> > >   with those processing times.
> >
> > woo!  that's 8 times larger than anything previously considered.  one
> > reason for it *might* be because the high-speed register files are
> > multi-port write.  there are only 8 entries however they are 64-bit
> > wide
>
>   Are we talking of register files or memory banks/caches?

register files.  there are no caches yet in soc/simple/issuer.py - we
still have to add them.

> The laters should
>   not be synthesized by Yosys (it would give bad results).

like... 850K gates bad.

> And depending
>   on the size of the register files (relative to the whole design),
>   we may want to have a hand crafted block(s) here too.

there are actually *five* separate and distinct register files:
https://git.libre-soc.org/?p=soc.git;a=blob;f=src/soc/regfile/regfiles.py;hb=HEAD

they vary from *binary*-addressed 1R1W (SPR is 110 entries) to
*unary*-addressed 5R5W with QTY8 64-bit entries.

the unary addressing makes them exceptionally weird when compared to
"standard" open source architectures because the "normal"
binary-address-muxer is GONE.  as in: we *directly* enable each
register row with a single bit.  NOT a binary-addressed version of
that which *internally* enables each register row with a single bit:
those single bits are exposed *externally* as part of the register
file's actual public API.
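
to illustrate the difference, here is a minimal behavioural sketch in plain python.  it is not the actual regfiles.py code (which is nmigen): the class and function names are hypothetical, and it only models the addressing, not ports or timing.

```python
# plain-python behavioural model of a unary-addressed regfile.
# hypothetical names: this is NOT the soc.regfile API.

class UnaryRegFile:
    def __init__(self, n_regs=8, width=64):
        self.n_regs = n_regs
        self.mask = (1 << width) - 1
        self.regs = [0] * n_regs

    def write(self, wen, data):
        # wen is a unary (one-hot) enable: bit i directly enables row i.
        # there is no binary address decoder anywhere in here.
        for i in range(self.n_regs):
            if (wen >> i) & 1:
                self.regs[i] = data & self.mask

    def read(self, ren):
        # OR together every enabled row (exactly one row if ren is one-hot)
        result = 0
        for i in range(self.n_regs):
            if (ren >> i) & 1:
                result |= self.regs[i]
        return result


def binary_to_unary(addr):
    # the decoder that a "normal" binary-addressed regfile hides
    # internally.  with unary addressing the caller does this instead.
    return 1 << addr


rf = UnaryRegFile(n_regs=8, width=64)
rf.write(binary_to_unary(3), 0xDEAD)   # enable row 3 only
value = rf.read(0b00001000)            # same row, enable given directly
```

the point is that the per-row enable bits are the interface itself: the caller supplies them, rather than an address that gets decoded inside the regfile.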

an additional wonderful weirdness: XERRegs and CRRegs can be addressed
to write the *full* (entire) register file in one massive hit, via one
port that has unary addressing covering *aaaaalll* registers, and it
also has "individual" ports that allow individual registers to be
written/read.

this is because XERRegs is actually only 6 bits wide: there are actually
only QTY3 2-bit registers, however we wanted not just to be able to
read/write those individual 2-bit registers (via small ports), we also
wanted to be able to write to the *entire* XER Register (6 bits, in
one hit).

likewise for the Condition Register (CR), this is 32-bits wide,
subdivided into CR0..CR7 which are 4 bits wide.  some PowerISA
instructions need to read/write the full 32-bits; some PowerISA
instructions need to read or write *only* the 4-bit individual
registers.

we certainly did not want to have to do a Read-Modify-Write cycle here
(not when we are doing a parallel processor), so we divided CRRegs down
into separate CR0..CR7 4-bit regs *but* we also provide this "virtual"
port which can write the *full* regfile.

thus we have a very weird port arrangement for CR: 4R3W, but 1R and 1W
of those are a full 32 bits wide, and the other 3R2W are only 4 bits wide:
*all* of them however are unary-addressed, and the full 32-bit wide
one can set *MULTIPLE* bits to say which of the 8 CR0-7 registers are
to be written / read.
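
a rough plain-python sketch of that CR arrangement follows.  it is a behavioural model only, with hypothetical names, and it makes one assumption that should be checked against the real code: that CR0 sits in the most-significant 4 bits of the 32-bit value (Power ISA numbering).

```python
# behavioural model of a CR regfile: 8x 4-bit registers with small
# unary-addressed 4-bit ports plus one "virtual" full-width 32-bit port.
# hypothetical names; ASSUMPTION: CR0 is the most-significant nibble.

class CRRegsModel:
    N_REGS = 8
    WIDTH = 4

    def __init__(self):
        self.regs = [0] * self.N_REGS

    def write_small(self, wen, data):
        # 4-bit port: unary enable selects which CRn receives the data
        for i in range(self.N_REGS):
            if (wen >> i) & 1:
                self.regs[i] = data & 0xF

    def read_small(self, ren):
        result = 0
        for i in range(self.N_REGS):
            if (ren >> i) & 1:
                result |= self.regs[i]
        return result

    def write_full(self, wen, data32):
        # 32-bit port: *multiple* enable bits may be set, and each
        # enabled CRn takes its own 4-bit slice of the 32-bit data
        for i in range(self.N_REGS):
            if (wen >> i) & 1:
                shift = self.WIDTH * (self.N_REGS - 1 - i)
                self.regs[i] = (data32 >> shift) & 0xF

    def read_full(self, ren):
        # enabled CRn fields are placed at their slice; others read as 0
        result = 0
        for i in range(self.N_REGS):
            if (ren >> i) & 1:
                shift = self.WIDTH * (self.N_REGS - 1 - i)
                result |= self.regs[i] << shift
        return result


cr = CRRegsModel()
cr.write_full(0b11111111, 0x12345678)  # write all eight CRs in one hit
cr.write_small(1 << 2, 0xA)            # then overwrite only CR2
full = cr.read_full(0b11111111)
```

no read-modify-write is needed in either direction: the small ports touch one 4-bit register, and the full port touches whichever subset its enable bits select.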

the Integer Regfile is, for now, 3R2W.  i did not want to have to do
"contention" for the write ports at this phase, but i can do so if you
think it is necessary.  it complicates the core.py code (which is
already complex), which is why i did not want to get into it just
yet.

however the INTRegs, because it is 3R2W, cannot use an SRAM, and as it
is 32x 64-bit entries it may end up being massive.  plus, if we are
not actually going to *use* that in production (using 3R1W or 4R1W
instead) it is not sensible to do a huge amount of work on it and then
throw it away.

if you feel that a 3R2W (or better 4R2W) is achievable and still has a
sane size, *great*.

one thing: all the regfiles need to be write-through.  i.e. if there
is one port that is writing and another port is simultaneously reading
from that same register, the data being written *must* be passed through
to the reader... *on that clock cycle*.  in the regfile.py code i have
done this with a "wrapper" on the front of the nmigen Memory class.
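
the write-through requirement can be sketched as a cycle-level model in plain python.  this is not how regfile.py implements it (that is an nmigen wrapper): the names are hypothetical, and it assumes no two write ports target the same register in the same cycle.

```python
# cycle-level behavioural model of a write-through, unary-addressed
# regfile.  hypothetical names, NOT the regfile.py implementation.
# ASSUMPTION: at most one write port targets a given row per cycle.

class WriteThroughRegFile:
    def __init__(self, n_regs=32, width=64):
        self.n_regs = n_regs
        self.mask = (1 << width) - 1
        self.regs = [0] * n_regs

    def cycle(self, writes, reads):
        """simulate one clock cycle.

        writes: list of (wen, data) pairs, one per write port
        reads:  list of ren values, one per read port
        returns the list of read-port results for this cycle
        """
        results = []
        for ren in reads:
            value = 0
            for i in range(self.n_regs):
                if not (ren >> i) & 1:
                    continue
                row = self.regs[i]
                # write-through: if any write port is writing this row
                # right now, the reader sees the *new* data this cycle
                for wen, data in writes:
                    if (wen >> i) & 1:
                        row = data & self.mask
                value |= row
            results.append(value)
        # writes are committed to storage at the end of the cycle
        for wen, data in writes:
            for i in range(self.n_regs):
                if (wen >> i) & 1:
                    self.regs[i] = data & self.mask
        return results


rf = WriteThroughRegFile(n_regs=8, width=64)
# write r5 while simultaneously reading r5 and r4 in the same cycle
same_cycle = rf.cycle(writes=[(1 << 5, 0x1234)], reads=[1 << 5, 1 << 4])
next_cycle = rf.cycle(writes=[], reads=[1 << 5])
```

here the read of r5 returns the data being written in that same cycle, rather than the stale contents, and the following cycle reads the stored value as normal.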

l.