[libre-riscv-dev] multiplier 8x8 products
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Fri Aug 23 23:11:51 BST 2019
On Fri, Aug 23, 2019 at 10:26 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> On Fri, Aug 23, 2019, 14:12 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> wrote:
>
> > Hi Jacob,
> >
> > The partitioned multiplier hits every cycle with a massive 64 8x8
> > multiplies. Can you think of a way to reduce that?
> >
> one part that will reduce that is that multiplies are commutative, so that
> reduces to 32 8x8 multiplies.
using the vedic system of multiply (and assuming base 256), you have:
* 1 multiply for the first 8 bits (base 256, bits 0-7)
* 2 multiplies that go in the next "base 256" column (bits 8-15)
* 3 for the next
* 4 for the next
* 5
* 6
* 7
* 8
* (now it goes back down again) 7 for bits 64-71
* then 6, 5, 4, 3, 2, 1 for the remaining six columns
so that comes to... err... 64 :) 1+2+3+4+5+6+7=28, 28*2+8=64
how does it reduce to 32? :)
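
a quick sketch to sanity-check the column counts above, assuming 64-bit operands split into 8-bit (base 256) digits. the function name and its parameters are made up for illustration, they are not from the multiplier code:

```python
def products_per_column(operand_bits=64, digit_bits=8):
    """Count the digit-by-digit products that land in each column.

    Column k collects every product a[i] * b[j] with i + j == k,
    i.e. the products weighted by (2**digit_bits)**k.
    """
    digits = operand_bits // digit_bits
    counts = [0] * (2 * digits - 1)
    for i in range(digits):          # digit index into operand a
        for j in range(digits):      # digit index into operand b
            counts[i + j] += 1
    return counts


counts = products_per_column()
print(counts)       # [1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1]
print(sum(counts))  # 64
```

as far as i can tell, a[i]*b[j] and a[j]*b[i] use different digit pairs unless the two operands are equal, so for a general multiply all 64 products are distinct.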
> > Thoughts?
> >
>
> I think you may be too fixated on early-out: I would guess that the initial
> 8x8 multiplies take up around half of the multiplier delay and the adders
> afterwards take the other half.
eek - that much? wow. ok. or, that little (relatively speaking).
> For all but 8x8 multiplies, I think we'll
> end up taking 2 clock cycles and the 8x8 multiplies might fit in 1 clock
> cycle.
hmmm hm hm... ok that's doable. 1 cycle for the MULs, 1 cycle for the ADDs.
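
a rough software model of that split, assuming unsigned 64-bit operands: one phase does all the 8x8 multiplies, the other only shifts and adds. the names (stage1_products, stage2_sum, mul64) are hypothetical, and real hardware would use an adder tree rather than a loop:

```python
MASK8 = 0xFF


def digits8(x, n=8):
    """Split x into n base-256 digits, least significant first."""
    return [(x >> (8 * i)) & MASK8 for i in range(n)]


def stage1_products(a, b):
    """'Cycle 1': the 64 independent 8x8 -> 16-bit products."""
    da, db = digits8(a), digits8(b)
    return [[da[i] * db[j] for j in range(8)] for i in range(8)]


def stage2_sum(products):
    """'Cycle 2': shift each product into its column and add them up."""
    total = 0
    for i in range(8):
        for j in range(8):
            total += products[i][j] << (8 * (i + j))
    return total


def mul64(a, b):
    """Unsigned 64x64 -> 128-bit multiply built from the two phases."""
    return stage2_sum(stage1_products(a, b))


a, b = 0x0123456789ABCDEF, 0xFEDCBA9876543210
print(mul64(a, b) == a * b)  # True
```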
> For all the other cases, early-out would vastly increase gate count
> without being able to output the results much earlier (less than a clock
> cycle).
ok - i thought it would be like 4 cycles. so it'll be 4 with the
high-speed option, 2 without.
> Also, each additional early-out adds lots of additional signalling
> required for routing the output and control signals.
it does.
> Even for 8x8 multiplies, trailing additions to handle signed/unsigned are
> required.
yehyeh
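
for reference, one textbook way to get a signed 8x8 result out of an unsigned multiplier is to subtract a shifted copy of the other operand for each negative input. this is only a sketch of that general identity, with hypothetical helper names, and not necessarily how the partitioned multiplier handles it:

```python
def to_signed(x, bits):
    """Interpret the low `bits` bits of x as a two's-complement value."""
    x &= (1 << bits) - 1
    return x - (1 << bits) if x >> (bits - 1) else x


def signed_mul8_via_unsigned(a_u, b_u):
    """Signed 8x8 multiply using an unsigned product plus corrections.

    a_u and b_u are raw 8-bit patterns (0..255). Returns the 16-bit
    pattern of the signed product.
    """
    p = a_u * b_u            # plain unsigned 8x8 product
    if a_u & 0x80:           # a is negative: subtract b << 8
        p -= b_u << 8
    if b_u & 0x80:           # b is negative: subtract a << 8
        p -= a_u << 8
    return p & 0xFFFF        # keep the low 16 bits


# -3 * 5, with -3 encoded as 0xFD
print(to_signed(signed_mul8_via_unsigned(0xFD, 0x05), 16))  # -15
```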