[libre-riscv-dev] div/mod algorithm written in python
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Sun Jul 21 11:53:37 BST 2019
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
On Sun, Jul 21, 2019 at 11:02 AM Jacob Lifshay <programmerjake at gmail.com> wrote:
> On Sat, Jul 20, 2019 at 12:55 AM Luke Kenneth Casson Leighton
> <lkcl at lkcl.net> wrote:
> > yehyeh. well, the basic routines are all there, already done: there's
> > pipeline stages already that will shift the mantissa up so that the
> > MSB is always 1 (and adjust the exponent accordingly as well), and
> > likewise on the way out.
> > so as long as the integer "thing" works, fitting it in place is
> > actually pretty trivial.
> > once the result is generated, the post-normalisation pipeline stages
> > take care of re-normalisation, so even if the mantissa (int-result)
> > doesn't have an MSB which is 1, that's *precisely* what
> > re-normalisation takes care of: shifting the MSB and adjusting the
> > exponent as well.
> > so the exponent will need to be carried through the int-div pipeline
> > stages *untouched*, ok? generated/modified by the de-normalisation,
> > carried through the int-div pipe, handed to the post/re-normalisation,
> > and dealt with there.
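a minimal python sketch of the two normalisation tasks described above, assuming a 24-bit mantissa (single precision, hidden bit included). the names msb_high and renormalise are hypothetical stand-ins, not the actual FPMSBHigh or post-normalisation classes. bits shifted out on the way down are simply dropped here, whereas the real stages would put them into guard/round/sticky. whatever the int-div stages do in between, the exponent is just carried along next to the mantissa.

```python
# hypothetical helper names, for illustration only
WIDTH = 24  # assumed: single-precision mantissa width, hidden bit included


def msb_high(m: int, e: int, width: int = WIDTH):
    """Pre-normalise: shift m left until bit (width-1) is set.

    The value m * 2**e is preserved: every left shift is paired with e -= 1.
    """
    if m == 0:
        return m, e  # zero cannot be normalised; special-cased elsewhere
    while m < (1 << (width - 1)):
        m <<= 1
        e -= 1
    return m, e


def renormalise(m: int, e: int, width: int = WIDTH):
    """Post-normalise an int-div result whose MSB may be too high or too low."""
    if m == 0:
        return m, e
    while m >= (1 << width):
        m >>= 1  # dropped bits would feed guard/round/sticky in real code
        e += 1
    return msb_high(m, e, width)
```

e.g. msb_high(3, 0) gives (3 << 22, -22): same value, MSB now at bit 23.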
> One thing we will need to consider is that sqrt/rsqrt actually
> requires the mantissa to be shifted such that the exponent is even,
yep. that's easily done. the class FPMSBHigh can be adapted to
ensure that happens:
> otherwise we lose the factor of sqrt(2). I had been thinking that,
> since all normal/denormal numbers produce normal outputs for
> sqrt/rsqrt (exponent divides by 2), it would be better to have the
> exponent handling happen during the same stages that the mantissa is
> being calculated by DivCore, since that way, we don't need an extra
> stage just to handle that and it will pipeline better.
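to make the sqrt(2) point concrete, here's a small python sketch (hypothetical helper names, integer arithmetic via math.isqrt) of the "make the exponent even" task followed by a square root of m * 2**e. the mantissa is then shifted by an *even* number of bits so the exponent stays even and the root keeps extra_bits of fraction.

```python
import math

# hypothetical helper names, for illustration only


def make_exponent_even(m: int, e: int):
    """Value is m * 2**e. If e is odd, move one factor of 2 into the mantissa."""
    if e & 1:  # also correct for negative e in Python (-3 & 1 == 1)
        return m << 1, e - 1
    return m, e


def sqrt_parts(m: int, e: int, extra_bits: int = 24):
    """Return (r, re) with r * 2**re ~= sqrt(m * 2**e), r truncated.

    m is shifted by an even amount (2 * extra_bits) so the exponent stays
    even and the integer square root keeps extra_bits fractional bits.
    """
    m, e = make_exponent_even(m, e)
    r = math.isqrt(m << (2 * extra_bits))
    return r, e // 2 - extra_bits  # e is even here, so e // 2 is exact
```

without the adjustment, sqrt(1 * 2**1) would come out as isqrt(1 << 48) * 2**(0 - 24) = 1.0 (because 1 // 2 == 0): exactly the lost factor of sqrt(2).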
remember: with the pipeline API, the concept of "stages" does *not*
automatically mean "a clock delay". what it means is: "a convenient
way to conceptually separate code into tasks".
we then have the choice:
* do we chain the "tasks" together into a single-clock-cycle "thing"?
if so, use StageChain
* are the "tasks" too complex (too high a gate latency)? if so, use
something that's derived from ControlBase to create a
clock-controllable "actual pipeline stage".
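a toy python model of that choice, for illustration only: stage_chain and ClockedPipeline are hypothetical names and are *not* the nmutil StageChain / ControlBase API. the point is that the same task functions can either be composed into one combinatorial step or be separated by registers (each costing a clock), without touching the tasks themselves.

```python
from typing import Callable, List, Optional

Stage = Callable[[int], int]  # a "task": pure function from input to output


def stage_chain(stages: List[Stage]) -> Stage:
    """Combine several tasks into ONE combinatorial step (no clock delay)."""
    def chained(x: int) -> int:
        for s in stages:
            x = s(x)
        return x
    return chained


class ClockedPipeline:
    """Put a register after every stage: each stage costs one clock cycle."""

    def __init__(self, stages: List[Stage]):
        self.stages = stages
        self.regs: List[Optional[int]] = [None] * len(stages)

    def tick(self, x: Optional[int]) -> Optional[int]:
        """Advance one clock; return the value leaving the last register."""
        out = self.regs[-1]
        # update from the back so each register sees its predecessor's OLD value
        for i in range(len(self.stages) - 1, 0, -1):
            prev = self.regs[i - 1]
            self.regs[i] = None if prev is None else self.stages[i](prev)
        self.regs[0] = None if x is None else self.stages[0](x)
        return out
```

regrouping is then a one-line change: ClockedPipeline([stage_chain(tasks[:2]), tasks[2]]) turns three one-cycle stages into two, with no change to the tasks.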
if the code is *not* separated out, we do not have that choice. it
would require a big redesign - a lot more coding effort - should we
discover, much further down the line, that the gate latency is far too
high in any one "stage".
so it's much better to have modules that do "tasks"
- one of those tasks would be (just like the align.py code above),
"make the exponent an even number", and, if you look here:
you'll see that it's *already* assumed, in the "stack", that that's
exactly what's going to be done (matching all of the other FP code,
which follows the exact same pattern).
if that pattern is *not* going to be followed, there needs to be a
really, _really_ good reason, as it will be both confusing and also
require understanding of two totally disparate codebases that
effectively do the same job.
remember also that we have quite a lot of "code-morphing" to do
(replace all use of SimpleHandshake with a "no delay" base class that
respects "cancellation"), and having different codebases (different
methods of doing pipelines) will make that task a lot harder to
> the exponent operations would be (assuming inputs and outputs are
> biased and bias is positive):
> fdiv: nexponent - dexponent + bias (needs overflow/underflow handling)
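a python sketch of that fdiv exponent rule, assuming IEEE754 binary32 (bias 127, biased exponent 255 reserved for inf/NaN); the helper name is hypothetical. the rule follows from (ne - bias) - (de - bias) + bias = ne - de + bias. the flags are computed before renormalisation, which can still move the exponent by one because the mantissa quotient lies between 0.5 and 2.

```python
from typing import Tuple

BIAS = 127     # assumed: IEEE754 binary32
EXP_MAX = 255  # all-ones biased exponent is reserved for inf/NaN


def fdiv_exponent(n_exp: int, d_exp: int, bias: int = BIAS) -> Tuple[int, bool, bool]:
    """Biased exponent of n/d before renormalisation, plus range flags.

    n = 2**(n_exp - bias), d = 2**(d_exp - bias), so the unbiased quotient
    exponent is n_exp - d_exp, and re-biasing adds bias back once.
    """
    z = n_exp - d_exp + bias
    overflow = z >= EXP_MAX  # too large for a normal result
    underflow = z <= 0       # too small: result is denormal or zero
    return z, overflow, underflow
```

e.g. 8.0 / 2.0: fdiv_exponent(130, 128) gives (129, False, False), i.e. 2**2.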
from what i can gather, there are certain ranges that the mantissa has
to be placed into, and the result will come out "in the correct
what i've seen is, for example, in the multiply, extra bit(s) are
added to the product (1 extra bit per input mantissa). then it no
longer becomes necessary to worry about *exponent* biasing, because
the mantissa has the extra accuracy required.
that extra accuracy then results in the remainder having a few more
bits. do the normalisation, put those extra bits into
guard/round/sticky, and the job's done:
# p is product (52 - or more! - bits long)
mw = self.o.z.m_width
# m is the nmigen Module: .eq() only takes effect once added to a domain
m.d.comb += self.o.of.sticky.eq(p[0:mw].bool()) # sticky is all the remaining bits
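in plain python (no nmigen), the guard/round/sticky split and a round-to-nearest-even step look roughly like this. split_grs and round_nearest_even are hypothetical names and the bit positions are illustrative, not the ones used in the actual multiply code; a round-up that carries into a new MSB is left for re-normalisation to fix.

```python
def split_grs(p: int, drop: int):
    """Split p into (mantissa, guard, round, sticky), discarding `drop` low bits.

    bit drop-1 is guard, bit drop-2 is round, and sticky is the OR of
    every bit below that ("sticky is all the remaining bits").
    """
    if drop < 2:
        raise ValueError("need at least 2 dropped bits for guard and round")
    m = p >> drop
    guard = (p >> (drop - 1)) & 1
    rnd = (p >> (drop - 2)) & 1
    sticky = int((p & ((1 << (drop - 2)) - 1)) != 0)
    return m, guard, rnd, sticky


def round_nearest_even(m: int, guard: int, rnd: int, sticky: int) -> int:
    """Round to nearest, ties to even."""
    # above half (guard plus anything below it), or exactly half with m odd
    if guard and (rnd or sticky or (m & 1)):
        m += 1
    return m
```

e.g. with 4-bit mantissas, 0b1011 * 0b1101 = 143 = 0b10001111; keeping the top four bits gives 0b1000 with guard, round and sticky all set, so it rounds up to 0b1001 (143/16 = 8.9375 is closest to 9).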
jon dawson's divider code, which passes lots of IEEE754 tests, doesn't
have any kind of exponent bias.
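z_s <= a_s ^ b_s;      // sign of the quotient
z_e <= a_e - b_e;      // exponent of the quotient: a plain subtraction
quotient <= 0;
remainder <= 0;
count <= 0;
dividend <= a_m << 27; // pre-shift so the quotient gets extra low-order bits
divisor <= b_m;
state <= divide_1;     // start the iterative divide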
all of quotient, remainder, dividend and divisor are *51 bits long* (!!!).
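a python model of the kind of shift-subtract loop registers like these drive, assuming a restoring (one quotient bit per iteration) algorithm and 24-bit single-precision mantissas, so that a_m << 27 fits in 51 bits. this is a sketch of the technique, not a transliteration of jon dawson's state machine; the exponent (z_e) is left out. with both mantissas normalised, the quotient has 3 or 4 bits beyond the 24-bit mantissa for guard/round, and a non-zero remainder is what would become sticky.

```python
def restoring_divide(dividend: int, divisor: int, width: int):
    """Bit-serial restoring division: one quotient bit per iteration, MSB first.

    dividend must fit in `width` bits; returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    if dividend >> width:
        raise ValueError("dividend does not fit in width bits")
    quotient = 0
    remainder = 0
    for i in range(width - 1, -1, -1):  # plays the role of the `count` register
        # bring down the next dividend bit into the partial remainder
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:  # trial subtraction succeeds: quotient bit is 1
            remainder -= divisor
            quotient |= 1
    return quotient, remainder
```

e.g. restoring_divide(0xC00000 << 27, 0x800000, 51) (1.5 / 1.0) gives quotient 3 << 26 with zero remainder.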
so... if bias is *really* needed, i believe the mantissa needs to be
increased in bit length (by the exponent bias).
does that sound about right?