[libre-riscv-dev] multiplier 8x8 products

Luke Kenneth Casson Leighton lkcl at lkcl.net
Fri Aug 23 23:11:51 BST 2019


On Fri, Aug 23, 2019 at 10:26 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> On Fri, Aug 23, 2019, 14:12 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> wrote:
>
> > Hi Jacob,
> >
> > The partitioned multiplier hits every cycle with a massive 64 8x8
> > multiplies. Can you think of a way to reduce that?
> >
> one part that will reduce that is that multiplies are commutative, so that
> reduces to 32 8x8 multiplies.

using the vedic system of multiply (and assuming base 256), you have:
* 1 multiply for the first 8 bits (base 256, bits 0-7)
* 2 multiplies that go in the next "base 256" column (bits 8-15)
* 3 for the next
* 4 for the next
* 5
* 6
* 7
* 8
* (now it goes back down again) 7 for bits 64-71
* 6, 5, 4, 3, 2, 1

so that comes to... err... 64 :)  1+2+3+4+5+6+7=28, 28*2+8=64

how does it reduce to 32? :)
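
(a quick sanity check of the column counts above, as a minimal sketch.
it assumes each 64-bit operand is split into eight 8-bit "base 256"
limbs; the function name column_counts is made up for illustration and
is not from the multiplier code.)

```python
def column_counts(limbs=8):
    """Count how many 8x8 partial products land in each base-256 column.

    Limb i of operand A times limb j of operand B contributes to
    column i+j, i.e. its low byte starts at bit 8*(i+j).
    """
    counts = [0] * (2 * limbs - 1)
    for i in range(limbs):
        for j in range(limbs):
            counts[i + j] += 1
    return counts


counts = column_counts()
print(counts)       # [1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1]
print(sum(counts))  # 64
```

the widest column is index 7 (product low bytes starting at bits 56-63)
with 8 products, and the count falls back to 7 at index 8 (bits 64-71).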

> > Thoughts?
> >
>
> I think you may be too fixated on early-out: I would guess that the initial
> 8x8 multiplies take up around half of the multiplier delay and the adders
> afterwards take the other half.

eek - that much? wow.  ok.  or, that little (relatively speaking).

> For all but 8x8 multiplies, I think we'll
> end up taking 2 clock cycles and the 8x8 multiplies might fit in 1 clock
> cycle.

hmmm hm hm... ok that's doable.  1 cycle for the MULs, 1 cycle for the ADDs.
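
(a rough software model of that two-stage split, as a sketch only: it
assumes unsigned 64-bit operands, ignores partitioning and the
signed/unsigned fix-ups, and says nothing about real gate delays. the
names stage1_products, stage2_sum and mul64 are hypothetical.)

```python
MASK = 0xFF   # one base-256 limb
LIMBS = 8     # 8 limbs x 8 bits = 64-bit operand


def stage1_products(a, b):
    """Stage 1 (the "MUL" cycle): all 64 independent 8x8 products."""
    a_limbs = [(a >> (8 * i)) & MASK for i in range(LIMBS)]
    b_limbs = [(b >> (8 * j)) & MASK for j in range(LIMBS)]
    return [[a_limbs[i] * b_limbs[j] for j in range(LIMBS)]
            for i in range(LIMBS)]


def stage2_sum(products):
    """Stage 2 (the "ADD" cycle): shift each product into place and sum."""
    total = 0
    for i, row in enumerate(products):
        for j, p in enumerate(row):
            total += p << (8 * (i + j))
    return total


def mul64(a, b):
    """Unsigned 64x64 -> 128-bit multiply built from the two stages."""
    return stage2_sum(stage1_products(a, b))


a, b = 0x0123456789ABCDEF, 0xFEDCBA9876543210
print(mul64(a, b) == a * b)  # True
```

in hardware stage 2 would be an adder tree rather than a loop; the
model only shows where the pipeline register between the two stages
would sit.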

> For all the other cases, early-out would vastly increase gate count
> without being able to output the results much earlier (less than a clock
> cycle).

ok - i thought it would be like 4 cycles.  so it'll be 4 with the
high-speed option, 2 without.

> Also, each additional early-out adds lots of additional signalling
> required for routing the output and control signals.

it does.

> Even for 8x8 multiplies, trailing additions to handle signed/unsigned are
> required.

yehyeh


