[Libre-soc-bugs] [Bug 1155] O(n^2) multiplication REMAP mode(s)
bugzilla-daemon at libre-soc.org
bugzilla-daemon at libre-soc.org
Fri Dec 22 09:50:15 GMT 2023
https://bugs.libre-soc.org/show_bug.cgi?id=1155
--- Comment #12 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #10)
> (In reply to Luke Kenneth Casson Leighton from comment #9)
> > (In reply to Jacob Lifshay from comment #7)
> > > I came up with a somewhat different bigint mul remap algorithm, it has a few
> > > issues though...I also rebased the branch on latest master.
> > >
> > > commit a3c1930de7990c8babcb3908ed7650e1d08eafb6
> > > Author: Jacob Lifshay <programmerjake at gmail.com>
> > > Date: Thu Dec 21 21:10:11 2023 -0800
> > >
> > > tests/bigint/powmod: initial version of bigint multiply remap
> > >
> > > it has some issues around being able to encode scalar RS but
> > > vector RT, and doesn't match the scalar * vector multiplication
> > > pattern, but is quite compact.
> >
> > i was expecting / anticipating use of vertical-first, this is
> > different / cool, how does it work?
>
> it is vertical first.
ahhh
> basically, I realized that, if you do it in the right sequence, you can
> compute a[:] * b[:] by just computing each partial product a[ai] * b[bi] and
> adding each one in the right place in the final result:
>
> y = [0] * ...
> for ai, bi in right_sequence():
> partial_product = a[ai] * b[bi]
> y += partial_product << (word_size * (ai + bi))
... assuming y is a massive long "thing" of bitsize wordsize*(ai+bi)
but the subdivisions are over wordsize boundaries. i start to get it. neat.
> this becomes a vertical first loop:
> loop:
> sv.maddedu *y, *a, *b, *y # RS is stored in scalar t
> sv.adde *(y + 1), *(y + 1), t
> svstep.
> sv.bdnz ... loop
briiilliant. i love it! that's the kind of laughably-short assembler
we need :)
> well, notice it builds the list of remap indexes, and then the loop that
> actually operates on the data just goes through those indexes in sequence,
see diff below, i split out those two roles explicitly separately,
as that's the next step in the code-morph sequence.
> doing (combination python and sv asm syntax):
> *y, t = maddedu(*a, *b, *y)
> *(y + 1), ca = adde(*(y + 1), t, ca)
love it.
> so the "yielder" (if you mean the code that produces remap indexes) is just
> the loops building the indexes, with:
> some_list.append(the_index)
>
> replaced with:
> yield the_index.
yes. done.
https://git.libre-soc.org/?p=openpower-isa.git;a=commitdiff;h=aa6f10d1f2
that's how all the ISACaller REMAP yields work, but if there's
no yielder module (e.g. remapyield.py) then ISACaller can't *have* that
functionality and we can't do the tests :)
okaay next task now we have the actual algorithm python_mul_remap_yielder(a_sz,
b_sz)
is to work out how it's going to fit into SVSHAPE0-3.
that process is started at comment #2
--
You are receiving this mail because:
You are on the CC list for the bug.
More information about the libre-soc-bugs
mailing list