[Libre-soc-bugs] [Bug 1044] SVP64 implementation of pow(x,y,z)
bugzilla-daemon at libre-soc.org
Tue Oct 10 05:26:26 BST 2023
https://bugs.libre-soc.org/show_bug.cgi?id=1044
--- Comment #44 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #42)
> (In reply to Jacob Lifshay from comment #39)
> > (In reply to Luke Kenneth Casson Leighton from comment #37)
> > > but just going straight to something inefficient (such as the
> > > loop-unrolled mul256 algorithm you wrote, although it gets us
> > > one incremental step ahead), this is *not* satisfying the conditions
> > > of the grant.
> >
> > which definition of efficiency are you using?
>
> the one that meets customer requirements which i repeated many times:
> top priority on code size. number of regs second.
ok.
>
> it is down to the hardware to merge VF and HF elements into
> "issue batches". which is here repeatedly everyone including
> you keeps assuming VF is incapable of doing that "therefore it
> must be inefficient performance wise".
I was basing my efficiency claims on both:
* the complexity I expect will be required to get a vertical-first divmod to
  work at all. I fully expect it to take *more* (and more complex) instructions
  than the horizontal-first version, because afaict it doesn't cleanly map to VF
  mode. this is bad for both code size and power and probably performance.
* it will most likely require lots of dynamic predicates (more than just 1<<r3)
  with *large* amounts of bits that are zeros, this inherently is rather
  inefficient from a performance perspective, because I'm assuming either:
    * the predicate will have to be handed to the decode pipe
      before the predicated operations can be issued. this is
      bad for performance because you're forced to stall the
      entire fetch/decode pipe for several cycles while waiting
      for the predicate to be computed.
    * the predicate is not known at decode/issue time, so the
      full set of element operations are issued, potentially
      blocking issue queues, only to later find out that most
      of them were wasted. this is bad for both power and
      performance.

      the predicate not being known at issue time also means
      that propagating results to registers and/or any following
      instructions is also blocked for any instructions that
      use twin-predication, since the cpu needs to wait until
      it knows which registers to write to.
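
to make the two cases above concrete, here is a toy counting model in Python. it is only a sketch: it does not model any actual Libre-SOC hardware, and the names `popcount`, `issue_cost`, `vl` and `predicate_latency` are hypothetical, made up for this illustration. it assumes one issue slot per element operation and a fixed number of cycles until the predicate value is known.

```python
def popcount(mask: int) -> int:
    """Number of set bits in a non-negative integer mask."""
    return bin(mask).count("1")


def issue_cost(mask: int, vl: int, predicate_latency: int):
    """Toy comparison of two ways to handle a dynamic predicate.

    mask: predicate bits, bit i enables element i
    vl: vector length (number of element operations)
    predicate_latency: assumed cycles until the predicate value is known
    """
    active = popcount(mask & ((1 << vl) - 1))
    # Case 1: stall decode/issue until the predicate is known,
    # then issue only the enabled elements.
    stall_first = {"stall_cycles": predicate_latency,
                   "issued": active, "wasted": 0}
    # Case 2: issue every element immediately and discard the
    # masked-out ones once the predicate arrives.
    issue_all = {"stall_cycles": 0,
                 "issued": vl, "wasted": vl - active}
    return stall_first, issue_all


if __name__ == "__main__":
    r3 = 3  # hypothetical register value, giving the mask 1<<r3
    costs = issue_cost(1 << r3, vl=8, predicate_latency=4)
    for name, cost in zip(("stall first", "issue all"), costs):
        print(name, cost)
```

with a sparse mask the first case pays the stall, and the second case issues `vl` element operations of which `vl - active` are discarded, which is the trade-off described above.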