[libre-riscv-dev] div/mod algorithm written in python
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Sun Jul 21 21:09:10 BST 2019
jacob, you managed to somehow delete the modifications that i'd made,
below. i've restored them as they're really important. we can't have
a chain of OR gates in a sequence: it results in huge gate delay
chains.
bool() is an operator (supported by yosys) that is specially optimised
by creating an actual cell that performs a multi-input OR, in an
extremely gate-efficient fashion. i've used it several times,
including for a multi-input AND, by negating all the inputs and also
negating the output.
l.
diff --git a/src/ieee754/div_rem_sqrt_rsqrt/core.py b/src/ieee754/div_rem_sqrt_rsqrt/core.py
index e6a0b9b..da1be3a 100644
--- a/src/ieee754/div_rem_sqrt_rsqrt/core.py
+++ b/src/ieee754/div_rem_sqrt_rsqrt/core.py
@@ -376,16 +376,19 @@ class DivPipeCoreCalculateStage(Elaboratable):
bit_value ^= pass_flags[j]
m.d.comb += next_bits.part(i, 1).eq(bit_value)
- next_compare_rhs = 0
+ # XXX using a list to accumulate the bits and then using bool
+ # is IMPORTANT. if done using |= it results in a chain of OR gates.
+ l = [] # next_compare_rhs
for i in range(radix):
next_flag = pass_flags[i + 1] if i + 1 < radix else 0
selected = Signal(name=f"selected_{i}", reset_less=True)
m.d.comb += selected.eq(pass_flags[i] & ~next_flag)
- next_compare_rhs |= Mux(selected,
- trial_compare_rhs_values[i],
- 0)
+ l.append(Mux(selected, trial_compare_rhs_values[i], 0))
+
+ # concatenate the list of Mux results together and OR them using
+ # the bool operator.
+ m.d.comb += self.o.compare_rhs.eq(Cat(*l).bool())
- m.d.comb += self.o.compare_rhs.eq(next_compare_rhs)
m.d.comb += self.o.root_times_radicand.eq(self.i.root_times_radicand
+ ((self.i.divisor_radicand
* next_bits)
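
To make the gate-delay argument concrete, here is a minimal pure-Python sketch (not nmigen, and not the code in core.py). It models the logic depth of OR-ing n values as a left-to-right chain versus a balanced tree, and checks the negate-the-inputs, negate-the-output trick for a multi-input AND. All function names are made up for illustration, and what yosys actually emits for bool() is assumed here rather than verified.

```python
from functools import reduce
from operator import or_


def chain_depth(n):
    """Levels of 2-input OR gates when n values are folded with `acc |= x`."""
    return max(n - 1, 0)


def tree_depth(n):
    """Levels of 2-input OR gates when n values are paired up as a balanced tree."""
    depth = 0
    while n > 1:
        n = (n + 1) // 2   # each level halves the number of live values
        depth += 1
    return depth


def tree_or(values):
    """Bitwise OR of a list of ints, combined pairwise instead of serially."""
    values = list(values)
    if not values:
        return 0
    while len(values) > 1:
        paired = [a | b for a, b in zip(values[0::2], values[1::2])]
        if len(values) % 2:
            paired.append(values[-1])   # odd one out passes through
        values = paired
    return values[0]


def and_via_or(bits):
    """Multi-input AND built as NOT(OR(NOT b for b in bits)) (De Morgan)."""
    return not any(not b for b in bits)


muxed = [0, 0, 0b1011, 0]          # one-hot select: only one entry is non-zero
chained = reduce(or_, muxed, 0)    # same value as the |= loop
```

Both forms produce the same value; only the depth differs (7 levels against 3 for eight inputs). Note also that collapsing everything to a single truth value, as `any(muxed)` would, gives one bit rather than the selected word, so whether a bool-style reduction or a word-wide OR tree is wanted depends on how wide compare_rhs is.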
On Sun, Jul 21, 2019 at 6:50 PM Luke Kenneth Casson Leighton
<lkcl at lkcl.net> wrote:
>
> just added in a comment in FPDivStage0Mod. basically the mantissas
> "represent" numbers in the range 0.5 to 0.999999, and they both have
> their MSB set to 1. the rest of the mantissa can be zero or it can be
> all 1s.
>
> this means that the result (the quotient) will be *guaranteed* to be
> in the range 0.5 to 2.0 exclusive (0.5/0.999999 is just over 0.5, and
> 0.999999/0.5 is just under 2.0). anything other than that has
> *already* been taken care of by "specialcases".
>
> now, if the subtraction of the exponents, followed by rounding, takes
> the exponent over the maximum for a FP number, that's dealt with
> *post-normalisation*.
>
> if likewise the value in the exponent (z.e) is -126, and the mantissas
> come out at the bottom of that range and that results in rounding taking
> the exponent down to -127, that is *also* taken care of by post-normalisation.
>
> what i'm trying to get across is: *everything* is already taken care
> of. the only thing - the sole thing that needs to be done - is to
> wire in the setup, intermediate and final modules (DivPipe*Stage), to
> produce the quotient_root based on the divisor and dividend.
>
> even the operator can probably be hard-wired (for now) to "unsigned int div".
>
> l.
>
>
> class FPDivStage0Mod(Elaboratable):
>
> # the mantissas, having been de-normalised (and containing
> # a "1" in the MSB) represent numbers in the range 0.5 to
> # 0.9999999-recurring. the min and max range of the
> # result is therefore just over 0.5 (0.5/0.99999) and just
> # under 2.0 (0.99999/0.5).
>
> m.d.comb += [self.o.z.e.eq(self.i.a.e - self.i.b.e + 1),
> self.o.z.s.eq(self.i.a.s ^ self.i.b.s),
> self.o.dividend.eq(self.i.a.m), # TODO: check
> self.o.divisor_radicand.eq(self.i.b.m), # TODO: check
> self.o.operation.eq(Const(0)) # TODO (set from ctx.op)
> ]
>
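
As a sanity check on the quotient range discussed above, here is a small sketch using exact fractions. It assumes a 24-bit mantissa (the FP32 case) with the MSB always set, read as a fixed-point number in [0.5, 1). The width and the helper name are assumptions for illustration only.

```python
from fractions import Fraction

MANTISSA_BITS = 24  # assumed: FP32 mantissa including the leading 1


def mantissa_value(m, width=MANTISSA_BITS):
    """Interpret an integer mantissa as a fraction in [0.5, 1)."""
    assert m >> (width - 1) == 1, "MSB must be set"
    return Fraction(m, 1 << width)


smallest = mantissa_value(1 << (MANTISSA_BITS - 1))   # 0b1000...0 -> 0.5
largest = mantissa_value((1 << MANTISSA_BITS) - 1)    # 0b1111...1 -> just under 1

q_min = smallest / largest   # smallest dividend over largest divisor
q_max = largest / smallest   # largest dividend over smallest divisor

print(float(q_min), float(q_max))
```

Under these assumptions the lower bound comes out marginally above 0.5 and the upper bound marginally below 2, so the quotient is never more than one bit position away from the [0.5, 1) form of the inputs.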
> On Sun, Jul 21, 2019 at 4:33 PM Luke Kenneth Casson Leighton
> <lkcl at lkcl.net> wrote:
> >
> > i added in the "stage_index" parameter which should now finally start
> > to make it clear for you what's going on.
> >
> > the stage_index is computed back in ieee754/fpdiv/pipeline.py, which
> > is where the "construction" takes place, and it has three modes.
> > first mode adds in a "FPDivStagesSetup" as the first StageChained
> > "thing", the others will be "FPDivStagesIntermediate"s. pipes in the
> > middle are only made from a StageChain of intermediaries. last one
> > has a Final at the end of the StageChain.
> >
> > the fact that the "stage_index" gets passed around should give you a
> > "handle" on how this fits together, as it's the first piece of
> > "config" data that is in place.
> >
> > l.
> >
> > class FPDivStagesSetup(FPState, SimpleHandshake):
> >
> > - def __init__(self, pspec, n_stages):
> > + def __init__(self, pspec, n_stages, stage_offs):
> > FPState.__init__(self, "divsetup")
> > self.pspec = pspec
> > self.n_stages = n_stages # number of combinatorial stages
> > + self.stage_offs = stage_offs # each CalcStage needs *absolute* idx
> > SimpleHandshake.__init__(self, self) # pipeline is its own stage
> > self.m1o = self.ospec()
> >
> > @@ -54,11 +55,12 @@ class FPDivStagesSetup(FPState, SimpleHandshake):
> > divstages.append(DivPipeSetupStage(self.pspec))
> >
> > # here is where the intermediary stages are added.
> > - # n_stages is adjusted (in pipeline.py), reduced to take
> > - # into account the extra processing that self.begin and self.end
> > - # will add.
> > + # n_stages is adjusted (by pipeline.py), reduced to take
> > + # into account extra processing that FPDivStage0Mod and DivPipeSetup
> > + # might add.
> > for count in range(self.n_stages): # number of combinatorial stages
> > - divstages.append(DivPipeCalculateStage(self.pspec, count))
> > + idx = count + self.stage_offs
> > + divstages.append(DivPipeCalculateStage(self.pspec, idx))
> >
> >
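
The three construction modes and the absolute stage index can be sketched in plain Python as below. This is not the code in pipeline.py: `plan_pipeline`, the string labels and the even split of stages are hypothetical, chosen only to show how an offset turns a per-pipe `count` into an absolute index.

```python
def plan_pipeline(total_calc_stages, stages_per_pipe):
    """Return a list of pipes, each a list of (kind, absolute_index) entries.

    First pipe: "setup" then calculate stages.
    Middle pipes: calculate stages only.
    Last pipe: calculate stages then "final".
    """
    pipes = []
    stage_offs = 0
    while stage_offs < total_calc_stages:
        n_stages = min(stages_per_pipe, total_calc_stages - stage_offs)
        chain = []
        if stage_offs == 0:
            chain.append(("setup", None))
        for count in range(n_stages):
            idx = count + stage_offs          # absolute index, as in the diff
            chain.append(("calc", idx))
        stage_offs += n_stages
        if stage_offs == total_calc_stages:
            chain.append(("final", None))
        pipes.append(chain)
    return pipes


for pipe in plan_pipeline(total_calc_stages=6, stages_per_pipe=2):
    print(pipe)
```

The real code also reduces n_stages for pipes that carry the extra setup work; that adjustment is left out of the sketch to keep the index arithmetic visible.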
> > On Sun, Jul 21, 2019 at 4:20 PM Luke Kenneth Casson Leighton
> > <lkcl at lkcl.net> wrote:
> > >
> > > > this is the *last* piece of the puzzle - where quotient_root gets
> > > > dropped into the mantissa of z (where, previously, the z mantissa field
> > > > has *not* been touched, at all, in *any* of the previous pipeline
> > > > stages), and the "remainder" is simply used to calculate the sticky
> > > > bit.
> > >
> > > now, this *might* require instead that the MSB of the remainder goes
> > > into "guard", and MSB-1 goes into "round", and remainder[:-2].bool()
> > > goes into the sticky, i honestly don't know, and i don't know if it
> > > even matters.
> > >
> > > > what the code below does is assume that there are 2 extra bits on
> > > > the mantissa: one is "guard" and the other is "round". anything
> > > > "spare" at the beginning (the LSB end) of the quotient_root (if it
> > > > wasn't "designed" to be *exactly* the right length) will be used for
> > > > "sticky".
> > >
> > > this isn't perfect, by any means: it's just to give you the general idea, ok?
> > >
> > > however, the point is to illustrate that there *really is* no need to
> > > do any "extra development", i *really have* laid the groundwork
> > > already and it *really is* just a simple matter of connecting things
> > > together.
> > >
> > > no need to do exponent shifting, no need to do de-normalisation, no
> > > need to do re-normalisation, alignment, conversion: nothing. the
> > > *only* thing(s) needed are to sort out the operator, the conversion of
> > > pspec into config, and so on.
> > >
> > > l.
> > >
> > >
> > > index 8db281a..9e36cb2 100644
> > > --- a/src/ieee754/fpdiv/div2.py
> > > +++ b/src/ieee754/fpdiv/div2.py
> > > @@ -21,8 +21,7 @@ class FPDivStage2Mod(FPState, Elaboratable):
> > > self.o = self.ospec()
> > >
> > > def ispec(self):
> > > - # TODO: DivPipeCoreInterstageData
> > > - return FPDivStage0Data(self.pspec) # Q/Rem in...
> > > + return DivPipeOutputData(self.pspec) # Q/Rem in...
> > >
> > > def ospec(self):
> > > # XXX REQUIRED. MUST NOT BE CHANGED. this is the format
> > > @@ -58,11 +57,11 @@ class FPDivStage2Mod(FPState, Elaboratable):
> > > with m.If(~self.i.out_do_z):
> > > mw = self.o.z.m_width
> > > m.d.comb += [
> > > - self.o.z.m.eq(self.i.product[mw+2:]),
> > > - self.o.of.m0.eq(self.i.product[mw+2]),
> > > - self.o.of.guard.eq(self.i.product[mw+1]),
> > > - self.o.of.round_bit.eq(self.i.product[mw]),
> > > - self.o.of.sticky.eq(self.i.product[0:mw].bool())
> > > + self.o.z.m.eq(self.i.quotient_root[mw+2:]),
> > > + self.o.of.m0.eq(self.i.quotient_root[mw+2]), # copy of LSB
> > > + self.o.of.guard.eq(self.i.quotient_root[mw+1]),
> > > + self.o.of.round_bit.eq(self.i.quotient_root[mw]),
> > > + self.o.of.sticky.eq(Cat(self.i.remainder,
> > > + self.i.quotient_root[:mw]).bool())
> > >
> > > ]
> > >
> > > m.d.comb += self.o.out_do_z.eq(self.i.out_do_z)
> > >
> > > On Sun, Jul 21, 2019 at 3:56 PM Luke Kenneth Casson Leighton
> > > <lkcl at lkcl.net> wrote:
> > > >
> > > > forgot to add an FPNumBaseRecord (the result). z will be used (back
> > > > in FPDivStage0Mod) to carry the sign and exponent right the way
> > > > through the DivPipe* pipeline. not DivPipeCore* pipeline classes,
> > > > because those handle the *mantissa*. DivPipeBaseData, by having an
> > > > FPNumBaseRecord, carries the sign and exponent (and the member
> > > > variable "m" gets ignored).
> > > >
> > > > it's... okay. z.m, by never being used, should get optimised out.
> > > >
> > > > @@ -28,6 +28,9 @@ class DivPipeConfig:
> > > > class DivPipeBaseData:
> > > > """ input data base type for ``DivPipe``.
> > > >
> > > > + :attribute z: a convenient way to carry the sign and exponent through
> > > > + the pipeline from when they were computed right at the
> > > > + start.
> > > > :attribute out_do_z: FIXME: document
> > > > :attribute oz: FIXME: document
> > > > :attribute ctx: FIXME: document
> > > > @@ -41,6 +44,7 @@ class DivPipeBaseData:
> > > > """ Create a ``DivPipeBaseData`` instance. """
> > > > self.config = config
> > > > width = config.pspec.width
> > > > + self.z = FPNumBaseRecord(width, False) # s and e carried: m ignored
> > > > self.out_do_z = Signal(reset_less=True)
> > > > self.oz = Signal(width, reset_less=True)