[libre-riscv-dev] div/mod algorithm written in python

Sun Jul 21 21:09:10 BST 2019

jacob, you managed to somehow delete the modifications that i'd made,
below.  i've restored them as they're really important.  we can't have
a chain of OR gates in a sequence: it results in huge gate delay
chains.

bool() is an operator (supported by yosys) that is specially optimised
by creating an actual cell that performs a multi-input OR, in an
extremely gate-efficient fashion.  i've used it several times,
including for a multi-input AND, by negating all the inputs and also
negating the output.
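
as a rough illustration of why this matters, here is a pure-python sketch (not nmigen) that compares the gate depth of a |= chain with a balanced reduction, and checks the negate-the-inputs, negate-the-output trick for AND.  the function names are made up for this example, and the tree depth is an idealised assumption about what a multi-input cell allows, not a measurement of yosys output.

```python
from itertools import product
from math import ceil, log2


def or_chain_depth(n_inputs):
    # x0 | x1 | x2 ... built up with |= puts each new 2-input gate in series
    return max(n_inputs - 1, 0)


def or_tree_depth(n_inputs):
    # idealised balanced tree of 2-input gates
    return ceil(log2(n_inputs)) if n_inputs > 1 else 0


def any_set(bits):
    # model of a multi-input OR: 1 if any input bit is 1
    return int(any(bits))


def all_set_via_or(bits):
    # multi-input AND built from the OR: negate inputs, OR, negate output
    return 1 - any_set([1 - b for b in bits])


# exhaustive check of the AND-from-OR identity for 4 inputs
for bits in product([0, 1], repeat=4):
    assert all_set_via_or(bits) == int(all(bits))

print(or_chain_depth(16), or_tree_depth(16))
```

for 16 inputs the chain is 15 gates deep, against 4 levels for a balanced tree.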

l.

diff --git a/src/ieee754/div_rem_sqrt_rsqrt/core.py b/src/ieee754/div_rem_sqrt_rsqrt/core.py
index e6a0b9b..da1be3a 100644
--- a/src/ieee754/div_rem_sqrt_rsqrt/core.py
+++ b/src/ieee754/div_rem_sqrt_rsqrt/core.py
@@ -376,16 +376,19 @@ class DivPipeCoreCalculateStage(Elaboratable):
                 bit_value ^= pass_flags[j]
             m.d.comb += next_bits.part(i, 1).eq(bit_value)

-        next_compare_rhs = 0
+        # XXX using a list to accumulate the bits and then using bool
+        # is IMPORTANT.  if done using |= it results in a chain of OR gates.
+        l = [] # next_compare_rhs
         for i in range(radix):
             next_flag = pass_flags[i + 1] if i + 1 < radix else 0
             selected = Signal(name=f"selected_{i}", reset_less=True)
             m.d.comb += selected.eq(pass_flags[i] & ~next_flag)
-            next_compare_rhs |= Mux(selected,
-                                    trial_compare_rhs_values[i],
-                                    0)
+            l.append(Mux(selected, trial_compare_rhs_values[i], 0))
+
+        # concatenate the list of Mux results together and OR them using
+        # the bool operator.
+        m.d.comb += self.o.compare_rhs.eq(Cat(*l).bool())

-        m.d.comb += self.o.compare_rhs.eq(next_compare_rhs)
         m.d.comb += self.o.root_times_radicand.eq(self.i.root_times_radicand
                                                   + ((self.i.divisor_radicand
                                                       * next_bits)
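
a pure-python model of the selection loop in the hunk above may help when reviewing it.  the flag and value numbers are invented, and pass_flags is assumed to be a thermometer code (ones followed by zeros) so that exactly one entry gets selected.  it also shows two ways of reducing the list: a bitwise OR across the values, and a single any-bit-set test over everything concatenated, which is what bool() gives.

```python
def select_onehot(pass_flags, values):
    """model of the loop: one Mux output per trial value."""
    radix = len(values)
    outputs = []
    for i in range(radix):
        next_flag = pass_flags[i + 1] if i + 1 < radix else 0
        selected = pass_flags[i] & (1 - next_flag)    # pass_flags[i] & ~next_flag
        outputs.append(values[i] if selected else 0)  # Mux(selected, value, 0)
    return outputs


def bitwise_or(outputs):
    # value-wide OR: keeps every bit of the (single) selected value
    result = 0
    for value in outputs:
        result |= value
    return result


def any_bit_set(outputs):
    # model of Cat(*outputs).bool(): one bit, set if any bit anywhere is set
    return int(any(value != 0 for value in outputs))


flags = [1, 1, 1, 0]               # hypothetical: trials 0..2 passed
values = [0x10, 0x24, 0x39, 0x50]  # hypothetical trial_compare_rhs_values
outs = select_onehot(flags, values)
print(outs, hex(bitwise_or(outs)), any_bit_set(outs))
```

in this model the any-bit-set reduction only says that *something* was selected, so if compare_rhs is wider than one bit it is worth checking that the full value still gets through.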

On Sun, Jul 21, 2019 at 6:50 PM Luke Kenneth Casson Leighton
<lkcl at lkcl.net> wrote:
>
> just added in a comment in FPDivStage0Mod.  basically the mantissas
> "represent" numbers in the range 0.5 to 0.999999, and they both have
> their MSB set to 1.  the rest of the mantissa can be zero or it can be
> all 1s.
>
> this means that the result (the quotient) will be *guaranteed* to be
> in the range just over 0.5 (0.5/0.999999) to just under 2.0
> (0.999999/0.5).  anything other than that has *already* been taken
> care of by "specialcases".
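
the bounds can be checked with exact fractions.  this is a sketch that assumes a 24-bit mantissa purely for illustration (the width is not taken from the code): the smallest quotient comes out a hair above 0.5 and the largest a hair below 2.

```python
from fractions import Fraction

MANTISSA_BITS = 24  # assumption for illustration only

scale = 1 << MANTISSA_BITS
smallest = Fraction(1 << (MANTISSA_BITS - 1), scale)  # MSB set, rest zero: 0.5
largest = Fraction(scale - 1, scale)                  # all ones: just under 1.0

q_min = smallest / largest  # smallest dividend over largest divisor
q_max = largest / smallest  # largest dividend over smallest divisor

print(float(q_min), float(q_max))
```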
>
> now, if the subtraction of the exponents, followed by rounding, takes
> the exponent over the maximum for a FP number, that's dealt with
> *post-normalisation*.
>
> if likewise the value in the exponent (z.e) is -126, and the mantissas
> come out at the very bottom of that range so that rounding takes the
> exponent down to -127, that is *also* taken care of by post-normalisation.
>
> what i'm trying to get across is: *everything* is already taken care
> of.  the only thing - the sole thing that needs to be done - is to
> wire in the setup, intermediate and final modules (DivPipe*Stage), to
> produce the quotient_root based on the divisor and dividend.
>
> even the operator can probably be hard-wired (for now) to "unsigned int div".
>
> l.
>
>
> class FPDivStage0Mod(Elaboratable):
>
>             # the mantissas, having been de-normalised (and containing
>             # a "1" in the MSB) represent numbers in the range 0.5 to
>             # 0.9999999-recurring.  the min and max range of the
>             # result is therefore just over 0.5 (0.5/0.99999) and
>             # just under 2.0 (0.99999/0.5).
>
>             m.d.comb += [self.o.z.e.eq(self.i.a.e - self.i.b.e + 1),
>                          self.o.z.s.eq(self.i.a.s ^ self.i.b.s),
>                          self.o.dividend.eq(self.i.a.m), # TODO: check
>                          self.o.divisor_radicand.eq(self.i.b.m), # TODO: check
>                          self.o.operation.eq(Const(0)), # TODO (set from ctx.op)
>                         ]
>
> On Sun, Jul 21, 2019 at 4:33 PM Luke Kenneth Casson Leighton
> <lkcl at lkcl.net> wrote:
> >
> > i added in the "stage_index" parameter which should now finally start
> > to make it clear for you what's going on.
> >
> > the stage_index is computed back in ieee754/fpdiv/pipeline.py, which
> > is where the "construction" takes place, and it has three modes.
> > the first pipe has a "FPDivStagesSetup" as the first StageChained
> > "thing", followed by "FPDivStagesIntermediate"s.  pipes in the
> > middle are made only from a StageChain of intermediaries.  the last
> > one has a Final at the end of its StageChain.
> >
> > the fact that the "stage_index" gets passed around should give you a
> > "handle" on how this fits together, as it's the first piece of
> > "config" data that is in place.
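
to make the three modes and the absolute index concrete, here is a pure-python sketch.  plan_pipeline and the stage labels are hypothetical, and it gives every pipe the same number of calculate stages, whereas the real pipeline.py reduces the count where setup or final work is added.

```python
def plan_pipeline(n_pipes, n_comb_stages):
    """hypothetical planner: list the stages chained inside each pipe."""
    plan = []
    stage_offs = 0  # absolute index of the first calculate stage in this pipe
    for pipe in range(n_pipes):
        chain = []
        if pipe == 0:
            chain.append("setup")                    # first pipe starts with setup
        for count in range(n_comb_stages):
            chain.append(f"calc[{count + stage_offs}]")  # *absolute* index
        if pipe == n_pipes - 1:
            chain.append("final")                    # last pipe ends with final
        plan.append(chain)
        stage_offs += n_comb_stages
    return plan


for chain in plan_pipeline(3, 2):
    print(chain)
```

the point is that each calculate stage sees count + stage_offs, not just its position within its own chain.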
> >
> > l.
> >
> >  class FPDivStagesSetup(FPState, SimpleHandshake):
> >
> > -    def __init__(self, pspec, n_stages):
> > +    def __init__(self, pspec, n_stages, stage_offs):
> >          FPState.__init__(self, "divsetup")
> >          self.pspec = pspec
> >          self.n_stages = n_stages # number of combinatorial stages
> > +        self.stage_offs = stage_offs # each CalcStage needs *absolute* idx
> >          SimpleHandshake.__init__(self, self) # pipeline is its own stage
> >          self.m1o = self.ospec()
> >
> > @@ -54,11 +55,12 @@ class FPDivStagesSetup(FPState, SimpleHandshake):
> >          divstages.append(DivPipeSetupStage(self.pspec))
> >
> >          # here is where the intermediary stages are added.
> > -        # n_stages is adjusted (in pipeline.py), reduced to take
> > -        # into account the extra processing that self.begin and self.end
> > -        # will add.
> > +        # n_stages is adjusted (by pipeline.py), reduced to take
> > +        # into account extra processing that FPDivStage0Mod and DivPipeSetup
> > +        # might add.
> >          for count in range(self.n_stages): # number of combinatorial stages
> > -            divstages.append(DivPipeCalculateStage(self.pspec, count))
> > +            idx = count + self.stage_offs
> > +            divstages.append(DivPipeCalculateStage(self.pspec, idx))
> >
> >
> > On Sun, Jul 21, 2019 at 4:20 PM Luke Kenneth Casson Leighton
> > <lkcl at lkcl.net> wrote:
> > >
> > > this is the *last* piece of the puzzle - where quotient_root gets
> > > dropped into the mantissa of z (where, previously the z mantissa field
> > > has *not* been touched, at all, in *any* of the previous pipeline
> > > stages), and the "remainder" is simply used to calculate the sticky
> > > bit.
> > >
> > > now, this *might* require instead that the MSB of the remainder goes
> > > into "guard", and MSB-1 goes into "round", and remainder[:-2].bool()
> > > goes into the sticky, i honestly don't know, and i don't know if it
> > > even matters.
> > >
> > > what the code below does is assume that there are 2 extra bits on
> > > the mantissa: one is "guard" and the other is "round".  anything
> > > "spare" at the beginning of the quotient_root (if it wasn't "designed"
> > > to be *exactly* the right length) will be used for "sticky".
> > >
> > > this isn't perfect, by any means: it's just to give you the general idea, ok?
> > >
> > > however, the point is to illustrate that there *really is* no need to
> > > do any "extra development", i *really have* laid the groundwork
> > > already and it *really is* just a simple matter of connecting things
> > > together.
> > >
> > > no need to do exponent shifting, no need to do de-normalisation, no
> > > need to do re-normalisation, alignment, conversion: nothing.  the
> > > *only* thing(s) needed are to sort out the operator, the conversion of
> > > pspec into config, and so on.
> > >
> > > l.
> > >
> > >
> > > index 8db281a..9e36cb2 100644
> > > --- a/src/ieee754/fpdiv/div2.py
> > > +++ b/src/ieee754/fpdiv/div2.py
> > > @@ -21,8 +21,7 @@ class FPDivStage2Mod(FPState, Elaboratable):
> > >          self.o = self.ospec()
> > >
> > >      def ispec(self):
> > > -        # TODO: DivPipeCoreInterstageData
> > > -        return FPDivStage0Data(self.pspec) # Q/Rem in...
> > > +        return DivPipeOutputData(self.pspec) # Q/Rem in...
> > >
> > >      def ospec(self):
> > >          # XXX REQUIRED.  MUST NOT BE CHANGED.  this is the format
> > > @@ -58,11 +57,11 @@ class FPDivStage2Mod(FPState, Elaboratable):
> > >          with m.If(~self.i.out_do_z):
> > >              mw = self.o.z.m_width
> > >              m.d.comb += [
> > > -                self.o.z.m.eq(self.i.product[mw+2:]),
> > > -                self.o.of.m0.eq(self.i.product[mw+2]),
> > > -                self.o.of.guard.eq(self.i.product[mw+1]),
> > > -                self.o.of.round_bit.eq(self.i.product[mw]),
> > > -                self.o.of.sticky.eq(self.i.product[0:mw].bool())
> > > +                self.o.z.m.eq(self.i.quotient_root[mw+2:]),
> > > +                self.o.of.m0.eq(self.i.quotient_root[mw+2]), # copy of LSB
> > > +                self.o.of.guard.eq(self.i.quotient_root[mw+1]),
> > > +                self.o.of.round_bit.eq(self.i.quotient_root[mw]),
> > > +                self.o.of.sticky.eq(Cat(self.i.remainder,
> > > +                                        self.i.quotient_root[:mw]).bool())
> > >
> > >              ]
> > >
> > >          m.d.comb += self.o.out_do_z.eq(self.i.out_do_z)
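
the bit selection in the hunk above can be modelled with plain integers.  this is a sketch only: split_quotient is a made-up name, mw is just a parameter here, and unlike the hardware assignment the mantissa is not truncated to the width of z.m.

```python
def split_quotient(quotient_root, remainder, mw):
    """model of the slices: returns (mantissa, m0, guard, round_bit, sticky)."""
    mantissa = quotient_root >> (mw + 2)        # quotient_root[mw+2:]
    m0 = (quotient_root >> (mw + 2)) & 1        # quotient_root[mw+2], copy of LSB
    guard = (quotient_root >> (mw + 1)) & 1     # quotient_root[mw+1]
    round_bit = (quotient_root >> mw) & 1       # quotient_root[mw]
    low_bits = quotient_root & ((1 << mw) - 1)  # quotient_root[:mw]
    # Cat(remainder, low_bits).bool(): set if anything at all is left over
    sticky = int(remainder != 0 or low_bits != 0)
    return mantissa, m0, guard, round_bit, sticky


# hypothetical value with mw=4: mantissa bits 101, guard 1, round 0, low bits 0001
q = (0b101 << 6) | (1 << 5) | (0 << 4) | 0b0001
print(split_quotient(q, remainder=0, mw=4))
```

a non-zero remainder sets sticky even when the low bits of quotient_root are all zero, which is the reason for the Cat in the last line of the hunk.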
> > >
> > > On Sun, Jul 21, 2019 at 3:56 PM Luke Kenneth Casson Leighton
> > > <lkcl at lkcl.net> wrote:
> > > >
> > > > forgot to add an FPNumBaseRecord (the result).  z will be used (back
> > > > in FPDivStage0Mod) to carry the sign and exponent right the way
> > > > through the DivPipe* pipeline.  not DivPipeCore* pipeline classes,
> > > > because those handle the *mantissa*.  DivPipeBaseData, by having an
> > > > FPNumBaseRecord, carries the sign and exponent (and the member
> > > > variable "m" gets ignored).
> > > >
> > > > it's... okay.  z.m, by never being used, should get optimised out.
> > > >
> > > > @@ -28,6 +28,9 @@ class DivPipeConfig:
> > > >  class DivPipeBaseData:
> > > >      """ input data base type for ``DivPipe``.
> > > >
> > > > +    :attribute z: a convenient way to carry the sign and exponent through
> > > > +                  the pipeline from when they were computed right at the
> > > > +                  start.
> > > >      :attribute out_do_z: FIXME: document
> > > >      :attribute oz: FIXME: document
> > > >      :attribute ctx: FIXME: document
> > > > @@ -41,6 +44,7 @@ class DivPipeBaseData:
> > > >          """ Create a ``DivPipeBaseData`` instance. """
> > > >          self.config = config
> > > >          width = config.pspec.width
> > > > +        self.z = FPNumBaseRecord(width, False) # s and e carried: m ignored
> > > >          self.out_do_z = Signal(reset_less=True)
> > > >          self.oz = Signal(width, reset_less=True)