[libre-riscv-dev] daily kan-ban update 19may2020

Tue May 19 18:34:47 BST 2020

On Tue, May 19, 2020 at 6:13 PM Jacob Lifshay <programmerjake at gmail.com> wrote:

> I'll be working on the mul and div pipelines today,

ah good, whew

> I'm planning on adding
> mul-add and mul-sub support to the partitionable multiplier since that's
> much easier than writing a whole new multiplier.

true.

(remember, the initial plan is not to use partitioning: just a
straight 64-bit MUL.  we are running in 180nm: latency is not high)

however.... if you feel that this really should be split into 2
pipeline stages, then even without partitioning at all (which,
remember, we are *not* doing - at all - for this critical Oct
deadline) it may indeed be easier to use PartMul because of its
pipeline capability, hard-code the partition bits to "fully open", and
set the "breakdown" point to a matrix of 4x32 or a matrix of 16x16-bit
MULs rather than the 8x8 which we tested the code with, last year.

in particular i think if we went with 4x32-bit MULs it would not
result in massive simulation times.
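
a rough sketch of what the "breakdown" point means, in plain python
rather than nmigen.  it builds an unsigned 64x64 multiply out of a
matrix of smaller block x block multiplies.  the function name and
parameters are made up for illustration and are not PartMul's actual
API.  i am reading "4x32" as four 32-bit MULs and "16x16-bit" as
sixteen 16-bit MULs, which is an assumption.

```python
def mul_by_blocks(a, b, width=64, block=32):
    """Unsigned width x width multiply built from block x block multiplies.

    Returns (product, number_of_small_multiplies).
    Hypothetical helper for illustration only, not PartMul's interface.
    """
    assert width % block == 0
    n = width // block                  # chunks per operand
    a &= (1 << width) - 1               # clamp inputs to `width` bits
    b &= (1 << width) - 1
    chunk_mask = (1 << block) - 1
    # split each operand into n chunks, least significant chunk first
    a_parts = [(a >> (i * block)) & chunk_mask for i in range(n)]
    b_parts = [(b >> (i * block)) & chunk_mask for i in range(n)]
    product = 0
    count = 0
    for i, ap in enumerate(a_parts):
        for j, bp in enumerate(b_parts):
            # each partial product is shifted to its column and summed
            product += (ap * bp) << ((i + j) * block)
            count += 1
    return product, count
```

with block=32 this does 4 small multiplies, with block=16 it does 16,
and with block=8 it does 64, which gives a feel for why a coarser
breakdown should be cheaper to simulate.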

also now that i think of it, because it already has the
signed/unsigned calculation (done and tested), it may well turn
out to be easier to use PartMul than starting from zero.
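
for reference, one common way of getting a signed multiply out of an
unsigned multiplier is sketched below in plain python: take the
magnitudes, multiply unsigned, then negate the product if the signs
differ.  this is only an illustration of the general idea.  it is not
taken from PartMul and PartMul may well do it differently.  the
function names are made up.

```python
def to_signed(x, width=64):
    """Interpret a width-bit pattern as a two's complement integer."""
    x &= (1 << width) - 1
    return x - (1 << width) if x >> (width - 1) else x


def signed_mul(a, b, width=64):
    """Signed multiply using only an unsigned multiply.

    a and b are width-bit two's complement bit patterns.
    Returns the 2*width-bit two's complement product pattern.
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    a_neg = a >> (width - 1)            # sign bits
    b_neg = b >> (width - 1)
    # magnitudes: negate (two's complement) if the sign bit is set
    a_mag = (-a) & mask if a_neg else a
    b_mag = (-b) & mask if b_neg else b
    prod = a_mag * b_mag                # the unsigned multiply
    if a_neg != b_neg:                  # signs differ: negate the product
        prod = -prod
    return prod & ((1 << (2 * width)) - 1)
```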

> Will then add the FU
> interface logic needed to drive it, perhaps in simplified form (1 mul per
> clock, no packing smaller muls together) since that would allow me to get
> back to the load/store unit faster.

if you can cookie-cut soc.fu.shift_rot (cut/paste copy): i did notice
that its input-data and output-data formats are identical to what is
needed for MUL (particularly due to the 3-input MAC).  so that would
just leave the intermediary pipe stages, which would be dropped into
pipeline.py (soc.fu.shift_rot/pipeline.py in the original: once
cookie-cut, it would be soc.fu.mul/pipeline.py)

this will get the infrastructure in place very rapidly and give a
template which ultimately just involves filling in main_stage.py and
adjusting pipeline.py

> If there's spare time later, it can be
> made fancier to allow issuing multiple smaller ops per clock and/or have
> fpmul/fpmuladd added.

we're definitely not going to have time to add any of the FP pipeline, at all.

> Will modify the fpdiv pipe to add idiv/udiv/irem/urem as well as div by
> zero and int overflow detection, will then use that in the DIV FUs,

this one i agree fully, yes, good idea, like it.  here, use
soc.fu.logical as a cookie-cut template, using the LogicalInputData
data structure (that is: if you can confirm and agree that its
register allocation is identical to what DIV needs).
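
a minimal plain-python sketch of the two checks mentioned above
(divide by zero, and signed overflow, which for two's complement
happens only for most-negative divided by -1).  the function names
are made up, and what result value the pipeline should produce in
those two cases is ISA-dependent and not stated here, so the sketch
just returns None for it.  truncation toward zero is assumed for the
signed case.

```python
def to_signed(x, width=64):
    """Interpret a width-bit pattern as a two's complement integer."""
    x &= (1 << width) - 1
    return x - (1 << width) if x >> (width - 1) else x


def div_checks(dividend, divisor, width=64, signed=True):
    """Return (div_by_zero, overflow) for width-bit operand patterns."""
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    div_by_zero = divisor == 0
    int_min = 1 << (width - 1)          # most-negative value's bit pattern
    # signed overflow: int_min / -1 (the pattern for -1 is all ones)
    overflow = signed and dividend == int_min and divisor == mask
    return div_by_zero, overflow


def divrem(dividend, divisor, width=64, signed=True):
    """Return (quotient, remainder, div_by_zero, overflow) as bit patterns.

    quotient and remainder are None when either flag is set.
    """
    dbz, ovf = div_checks(dividend, divisor, width, signed)
    if dbz or ovf:
        return None, None, dbz, ovf
    mask = (1 << width) - 1
    a, b = dividend & mask, divisor & mask
    if signed:
        a, b = to_signed(a, width), to_signed(b, width)
    q = abs(a) // abs(b)                # divide magnitudes
    if (a < 0) != (b < 0):              # truncate toward zero
        q = -q
    r = a - q * b                       # remainder takes the dividend's sign
    return q & mask, r & mask, dbz, ovf
```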

> can add support for fdiv/fsqrt/frsqrt if there's spare time later.

there definitely won't.  time is *really* short.

> I picked both those routes since that seems like the path of least
> resistance, and because it makes adding fp later that much easier.

i am inclined to recommend leaving even consideration of FP off the
table and, when we have time, literally dumping whatever code was
written and replacing it.  although i suspect that, in practice, very
little would actually get thrown away: more... morphed.

l.