[libre-riscv-dev] daily kan-ban update 19may2020
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Tue May 19 18:34:47 BST 2020
On Tue, May 19, 2020 at 6:13 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
> I'll be working on the mul and div pipelines today,
ah good, whew
> I'm planning on adding
> mul-add and mul-sub support to the partitionable multiplier since that's
> much easier than writing a whole new multiplier.
true.
(remember, the initial plan is not to use partitioning: just a
straight 64-bit MUL. we are running in 180nm: latency is not high)
however.... if you feel that this really should be split into 2
pipeline stages, then even without partitioning at all (which,
remember, we are *not* doing - at all - for this critical Oct
deadline) it may indeed be easier to use PartMul because of its
pipeline capability, hard-code the partition bits to "fully open", and
set the "breakdown" point to a matrix of 4x 32-bit or a matrix of
16x 16-bit MULs rather than the 8x8 which we tested the code with,
last year.
in particular i think if we went with 4x32-bit MULs it would not
result in massive simulation times.
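
to illustrate the breakdown idea (this is only a plain-python sketch
of the arithmetic, not the PartMul code: the function name and the
chunk_bits parameter are made up for the example), a 64-bit MUL
split at 32 bits becomes four 32x32 partial products which are
shifted and summed:

```python
def mul64_by_chunks(a, b, chunk_bits=32):
    """Multiply two unsigned 64-bit values by summing partial products.

    Hypothetical helper for illustration only: returns the 128-bit
    product and the number of chunk-sized multiplies that were needed.
    """
    assert 64 % chunk_bits == 0
    n = 64 // chunk_bits
    mask = (1 << chunk_bits) - 1
    # split each operand into n chunks, least-significant chunk first
    a_parts = [(a >> (i * chunk_bits)) & mask for i in range(n)]
    b_parts = [(b >> (i * chunk_bits)) & mask for i in range(n)]
    total = 0
    count = 0
    for i, ai in enumerate(a_parts):
        for j, bj in enumerate(b_parts):
            # each partial product is weighted by the position of its chunks
            total += (ai * bj) << ((i + j) * chunk_bits)
            count += 1
    return total & ((1 << 128) - 1), count


a = 0xFFFFFFFFFFFFFFFF
b = 0x123456789ABCDEF0
product, n_muls = mul64_by_chunks(a, b, chunk_bits=32)
assert product == a * b
```

halving chunk_bits quadruples the number of small multiplies, which
is why the coarser breakdown should keep simulation time down.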
also now that i think of it, because it already has the
signed/unsigned calculation (done and tested), it may well turn
out to be easier to use PartMul than starting from zero.
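
for reference, one common way of getting a signed MUL out of an
unsigned multiplier core is sketched below. this is an assumption
made for illustration (i am not claiming it is how PartMul does it
internally), and the names are hypothetical: take the magnitudes,
multiply unsigned, then negate the result if the operand signs
differ.

```python
MASK64 = (1 << 64) - 1
MASK128 = (1 << 128) - 1


def mul64_signed_via_unsigned(a, b):
    """Signed 64x64 -> 128-bit multiply built on an unsigned multiply.

    a and b are 64-bit register bit patterns interpreted as two's
    complement. Returns the 128-bit two's complement product pattern.
    Hypothetical helper, for illustration only.
    """
    a &= MASK64
    b &= MASK64
    a_neg = (a >> 63) & 1
    b_neg = (b >> 63) & 1
    # magnitude of each operand (the most negative value maps to 2**63)
    a_abs = (-a) & MASK64 if a_neg else a
    b_abs = (-b) & MASK64 if b_neg else b
    prod = a_abs * b_abs  # the unsigned multiplier core
    if a_neg ^ b_neg:
        prod = -prod  # signs differ: negate the result
    return prod & MASK128


# -3 * 5 == -15, expressed as bit patterns
assert mul64_signed_via_unsigned(-3 & MASK64, 5) == (-15) & MASK128
```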
> Will then add the FU
> interface logic needed to drive it, perhaps in simplified form (1 mul per
> clock, no packing smaller muls together) since that would allow me to get
> back to the load/store unit faster.
if you can cookie-cut soc.fu.shift_rot (cut/paste copy): i did notice
that the input-data and output-data formats are identical to what is
needed for MUL (particularly due to the 3-input MAC). so that would
just leave the intermediary pipe stages, which would be dropped into
soc.fu.shift_rot/pipeline.py (ok, once cookie-cut it would be
soc.fu.mul/pipeline.py)
this will get the infrastructure in place very rapidly and give a
template which ultimately just involves filling in main_stage.py and
adjusting pipeline.py
> If there's spare time later, it can be
> made fancier to allow issuing multiple smaller ops per clock and/or have
> fpmul/fpmuladd added.
we're definitely not going to have time to add any of the FP pipeline, at all.
> Will modify the fpdiv pipe to add idiv/udiv/irem/urem as well as div by
> zero and int overflow detection, will then use that in the DIV FUs,
this one i agree fully, yes, good idea, like it. here, use
soc.fu.logical as a cookie-cut template, using the LogicalInputData
data structure (that is: if you can confirm and agree that it is
identical to what DIV uses, for its register allocation).
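
a plain-python sketch of the two checks mentioned (div by zero and
int overflow) follows. it is only a behavioural model: the function
name and return layout are made up for the example, and the result
values in the flagged cases are deliberately left as None because
what goes in the register there is up to the ISA, not this sketch.

```python
MASK64 = (1 << 64) - 1
INT64_MIN = -(1 << 63)


def to_signed64(x):
    """Interpret a 64-bit pattern as a two's complement integer."""
    x &= MASK64
    return x - (1 << 64) if x >> 63 else x


def div64(dividend, divisor, signed):
    """Return (quotient, remainder, div_by_zero, overflow).

    Hypothetical behavioural model: quotient and remainder are 64-bit
    patterns, truncating towards zero, or None when a flag is set.
    """
    a = to_signed64(dividend) if signed else dividend & MASK64
    b = to_signed64(divisor) if signed else divisor & MASK64
    div_by_zero = b == 0
    # the only signed case whose quotient does not fit in 64 bits
    overflow = signed and a == INT64_MIN and b == -1
    if div_by_zero or overflow:
        return None, None, div_by_zero, overflow
    q = abs(a) // abs(b)  # divide magnitudes, then fix up the sign
    if (a < 0) != (b < 0):
        q = -q
    r = a - q * b  # remainder takes the sign of the dividend
    return q & MASK64, r & MASK64, div_by_zero, overflow


assert div64(7, 2, signed=True) == (3, 1, False, False)
assert div64(5, 0, signed=False) == (None, None, True, False)
```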
> can add support for fdiv/fsqrt/frsqrt if there's spare time later.
there definitely won't. time is *really* short.
> I picked both those routes since that seems like the path of least
> resistance, and because it makes adding fp later that much easier.
i am inclined to recommend leaving even consideration of FP off the
table, and, when we have time, literally dumping whatever code was
written and replacing it. although i suspect that very little would
actually get thrown away: more... morphed.
l.