[libre-riscv-dev] IEEE754 FPU

Sun Feb 17 08:25:13 GMT 2019

On Sun, Feb 17, 2019, 00:10 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On Sun, Feb 17, 2019 at 8:02 AM Jacob Lifshay <programmerjake at gmail.com>
> wrote:
> >
> > just using another 3 quotient bits for guard, round, and sticky is
> > incorrect. the sticky bit should be set when any of the bits past guard
> > and round are set. an alternative description: if the remainder is
> > non-zero after getting the guard and round bits, then the sticky bit is
> > set; otherwise, the sticky bit is zero.
>
>             # ******
>             # Fourth stage of divide.
>
>             with m.State("divide_3"):
>                 m.next = "normalise_1"
>                 m.d.sync += [
>                     z.m.eq(div.quot[3:]),
>                     of.guard.eq(div.quot[2]),
>                     of.round_bit.eq(div.quot[1]),
>                     of.sticky.eq(div.quot[0] | (div.rem != 0))
>                 ]
>
> does that look about right?
>
yeah, looks correct to me, though you can make the divider have one less
quotient bit and just use div.rem != 0 for the sticky bit. You'd have to
shift the indexes on div.quot down 1 bit as well.

may be best to leave the algorithm unchanged since this is just for demo
purposes (from what I recall).
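to illustrate the two variants, here is a rough sketch in plain python
(integers only, not nmigen). the function names are made up for this
example, and it assumes the divider returns a truncated quotient plus the
matching remainder:

```python
def grs_three_extra_bits(dividend, divisor):
    """Three extra quotient bits; sticky = lowest bit OR (remainder != 0)."""
    quot, rem = divmod(dividend << 3, divisor)
    mantissa = quot >> 3
    guard = (quot >> 2) & 1
    round_bit = (quot >> 1) & 1
    sticky = (quot & 1) | int(rem != 0)
    return mantissa, guard, round_bit, sticky


def grs_two_extra_bits(dividend, divisor):
    """Two extra quotient bits; sticky comes from the remainder alone."""
    quot, rem = divmod(dividend << 2, divisor)
    mantissa = quot >> 2
    guard = (quot >> 1) & 1
    round_bit = quot & 1
    sticky = int(rem != 0)
    return mantissa, guard, round_bit, sticky
```

both should give the same mantissa, guard, round, and sticky values for
any positive dividend and divisor; the second just needs one less
quotient bit.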

>
> rather than do that, the adder uses the multi-cycle
> (single-bit-downshift) shifting of the mantissa/exponent as a way to
> "accumulate" sticky bits:
>
>                     m.d.sync += [
>                         z.m.eq(tot[4:]),
>                         of.guard.eq(tot[3]),
>                         of.round_bit.eq(tot[2]),
>                         of.sticky.eq(tot[1] | tot[0]),
>                         z.e.eq(z.e + 1)
>                     ]
>
> normalisation does something similar: the sticky bits accumulate like this:
>
>         return [self.e.eq(self.e + 1),
>                 self.m.eq(Cat(self.m[0] | self.m[1], self.m[2:], 0))
>                ]
>

that's correct as far as I can tell.
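for reference, a plain-python model of that normalisation shift, as a
sketch only (the function name is made up, and it treats the mantissa as
an unsigned integer whose bit 0 is the sticky position):

```python
def shift_right_sticky(m):
    """One-bit right shift that ORs the dropped bit into the new bit 0.

    Mirrors Cat(m[0] | m[1], m[2:], 0): the new bit 0 is m[0] | m[1],
    the bits from m[2] upwards move down one place, and the top bit
    becomes zero.
    """
    sticky = (m & 1) | ((m >> 1) & 1)
    return ((m >> 2) << 1) | sticky
```

e.g. shift_right_sticky(0b1001) gives 0b101, whereas a plain 0b1001 >> 1
gives 0b100 and loses the fact that a 1 was shifted out.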

>
>
> > The round and sticky bits basically allow you to select between these
> > cases:
> > Rounding 42.###...
> > 00: 42 exactly
> > 01: more than 42 but less than 42.5
> > 10: 42.5 exactly
> > 11: more than 42.5 but less than 43
>
note that this is for rounding to an integer.
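a small sketch of that table using exact fractions (plain python; the
helper name is made up, and it only handles non-negative inputs):

```python
from fractions import Fraction


def round_and_sticky(x):
    """Return (integer part, round bit, sticky bit) for a Fraction x >= 0."""
    whole = x.numerator // x.denominator
    doubled = (x - whole) * 2        # fractional part scaled to [0, 2)
    round_bit = int(doubled >= 1)    # the first bit after the binary point
    sticky = int(doubled - round_bit != 0)  # anything left past that bit
    return whole, round_bit, sticky
```

so Fraction(85, 2) (42.5) gives (42, 1, 0), and Fraction(171, 4) (42.75)
gives (42, 1, 1).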

>
>  cool.  i'm... relieved that you understand it (for me, it's going in
> by a process of osmosis)
>
> > The preceding all applies analogously to the other FP operations.
>
>  i have no idea how to do tininess or set odd/even rounding modes -
> will definitely need your help, there.
>
ok. rounding modes are easy once you have the normalized number and guard,
round, and sticky bits. will need to do more research on tininess.
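as a sketch of what i mean for the rounding modes (plain python, not
nmigen; the function and mode names are made up here, the mantissa is
assumed to be a sign-magnitude integer, and tininess is not handled):

```python
def apply_rounding(mantissa, guard, round_bit, sticky,
                   mode="nearest_even", negative=False):
    """Return the rounded magnitude of an already-normalised mantissa."""
    inexact = guard | round_bit | sticky
    if mode == "nearest_even":
        # above half: guard plus anything below it; exact tie: round to even
        round_up = guard & (round_bit | sticky | (mantissa & 1))
    elif mode == "toward_zero":
        round_up = 0
    elif mode == "toward_positive":
        round_up = inexact if not negative else 0
    elif mode == "toward_negative":
        round_up = inexact if negative else 0
    else:
        raise ValueError("unknown rounding mode: %s" % mode)
    return mantissa + round_up
```

if the increment carries out of the top mantissa bit, the result needs
renormalising afterwards (shift right by one and bump the exponent).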

>
>  i'm ok at this type of reverse-engineering-without-understanding,
> however it turns out that i'm pretty much dyslexic when it comes to
> algorithms.  i often substitute brute-force search through
> permutations of and/or and off-by-one in combination with
> seriously-comprehensive unit tests for "understanding" :)
>

for the final fpus, I'm going to try to use a theorem prover to prove the
nmigen/verilog to be correct; no pentium-fdiv bugs for me!

Jacob