[Libre-soc-bugs] [Bug 1044] SVP64 implementation of pow(x,y,z)
bugzilla-daemon at libre-soc.org
bugzilla-daemon at libre-soc.org
Fri Oct 6 08:38:29 BST 2023
https://bugs.libre-soc.org/show_bug.cgi?id=1044
--- Comment #27 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #24)
> I wrote the powmod assembly, and got a python equivalent working
hooraay! that's really good. can estimate the number of assembler
instructions now
> however,
> because divmod is really slow, running one powmod invocation is a few
> hundred thousand simulated instructions (powmod calls divmod 256-512 times,
> divmod loops 256 times, each inner loop is ~20 ops).
but there is at least a first version, unit tests can be done.
> Therefore, I think I should implement Knuth's algorithm D instead of the
> shift & subtract divmod algorithm we currently have, since I expect it to
> run 20-50x faster.
yes i concur. nice thing is, that is a fixed (small) task.
can i ask you a favour: make it "adaptable" i.e. parameteriseable
as to what input output and temp regs are used, as well as their
dimensions? (default to the ones you are using)?
or, if that is a lot of work, just specify the size (bitlength)
of X and Y?
the reason i ask is that it becomes possible to crank down the
sizes to run in sane times.
we do *not* want people running tests to find that it is 300 minutes
to completion, when if we put this in front of a customer for a demo
(which is actually going to happen) it is ok to say "this is a 128 bit
powmod, it completes quick but you get the idea".
192 bits is.... 3 64-bit GPRs which is more than enough to show
carry-rollover. am tempted to suggest 128 but it is only 2 GPRs,
i feel happier with 3 (like in the bigint shift/mul/div tests i wrote)
> I pushed to the programmerjake/powmod branch.
irony: this is leaf-node work so a branch is unnecessary unless you
prefer it.
--
You are receiving this mail because:
You are on the CC list for the bug.
More information about the libre-soc-bugs
mailing list