[Libre-soc-bugs] [Bug 1157] Implement poly1305
bugzilla-daemon at libre-soc.org
Fri Oct 13 18:41:15 BST 2023
https://bugs.libre-soc.org/show_bug.cgi?id=1157
--- Comment #32 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Sadoon Albader from comment #31)
> My current brainstorming for the initial h[x]+= block:
>
> 1- store 0xfffff... (twice) and 0x3ffff... in three registers (p0,p1,p2)
yep
> 2- store h0, h1, and h2 in 3 registers
> 3- split t0 and t1 into 3 registers:
> t0_s = t0
> t1_s = t0 >> 44 | t1 << 20
> t2_s = t1 >> 24 | hibit (this might be wrong, need to check)
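hibit is "0 if final else 1<<40": poly1305-donna sets it for every full
16-byte block, and only the final (padded) partial block leaves it clear.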
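For reference, the split above matches the 64-bit poly1305-donna code, which uses 44 + 44 + 42-bit limbs with masks 0xfffffffffff and 0x3ffffffffff. The Python below is a minimal sketch of that split, not the SVP64 version; split_block and the P0/P1/P2 names are illustrative only. It checks that the three limbs recombine to the 16-byte block value, plus 2**128 (the hibit) for a full block.

```python
# Sketch of the 64-bit poly1305-donna limb split: 44 + 44 + 42 bits.
P0 = P1 = (1 << 44) - 1   # 0xfffffffffff
P2 = (1 << 42) - 1        # 0x3ffffffffff

def split_block(block: bytes, final: bool):
    """Return (t0_s, t1_s, t2_s) for one 16-byte block, already masked."""
    t0 = int.from_bytes(block[0:8], "little")
    t1 = int.from_bytes(block[8:16], "little")
    # bit 40 of limb 2 is bit 128 of the block: set for full blocks,
    # clear for the final partial block, which gets an explicit 0x01 pad byte
    hibit = 0 if final else (1 << 40)
    t0_s = t0 & P0
    t1_s = ((t0 >> 44) | (t1 << 20)) & P1
    t2_s = ((t1 >> 24) & P2) | hibit
    return t0_s, t1_s, t2_s

# the limbs recombine to the block value (+ 2**128 for a full block)
block = bytes(range(16))
a, b, c = split_block(block, final=False)
assert a + (b << 44) + (c << 88) == int.from_bytes(block, "little") + (1 << 128)
```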
you can use sv.dsrd here with some "morphing". notice
how 44+20 == 64, and likewise 24+40 (the 40 coming from
the hibit shift) == 64?
therefore if you do:
* set t2 = 1 (t2 << 40 then supplies hibit)
* set an array of [44,24] in RB
* point RA at t0
* point RT at t1_s
you have created both t1_s and t2_s with one vector instruction.
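The "morphing" can be modelled as a 128-bit funnel shift over adjacent register pairs. This is a hypothetical Python model for illustration, not the Libre-SOC dsrd pseudocode (funnel_shr and sv_dsrd_model are made-up names): element i shifts the pair (regs[i+1]:regs[i]) right by RB[i], so with regs = [t0, t1, t2 = 1] and RB = [44, 24] it yields t1_s and t2_s, exactly what two scalar shifts would give.

```python
MASK64 = (1 << 64) - 1

def funnel_shr(lo: int, hi: int, sh: int) -> int:
    """Low 64 bits of the 128-bit pair (hi:lo) shifted right by sh, 0 <= sh < 64."""
    return (((hi << 64) | lo) >> sh) & MASK64

def sv_dsrd_model(regs, rb):
    """Hypothetical element-wise model: element i shifts the pair (regs[i+1]:regs[i])."""
    return [funnel_shr(regs[i], regs[i + 1], rb[i]) for i in range(len(rb))]

t0, t1 = 0x0123456789ABCDEF, 0xFEDCBA9876543210
t2 = 1                                                     # t2 << 40 supplies hibit
vec = sv_dsrd_model([t0, t1, t2], [44, 24])                # "one vector instruction"
scalar = [funnel_shr(t0, t1, 44), funnel_shr(t1, t2, 24)]  # two scalar instructions
assert vec == scalar
```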
now, of course, it looks pointless: you need one instruction
for setvl=2 and another for sv.dsrd, so you might as well
have done two scalar dsrd instructions!
so try that first, ok?
theeen.... ha ha, you'll like this: try it with *three* dsrd
operations (VL=3), but use an array [0,44,24] in RB!
> 4- setvl 3? 9?
VL=2, starting at t1_s and ending at t2_s. but if you use the
trick above of RB=[0,44,24] and start at t0_s, then you get
"t0_s = t0 >> 0 | something": arrange for "something" to contain
zero and voila, no need for a manual copy (no addi t0_s,t0,0).
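With RB=[0,44,24] all three limbs come from one uniform shift-and-OR formula. Another hypothetical sketch: the lo/hi operand pairing below is an assumption made for illustration, not how SVP64 actually indexes registers, and shift_or/split_three are made-up names. Element 0 shifts by zero with a zero "something", so t0_s comes out as a plain copy of t0. (In this Python model the hi << 64 term would be truncated away anyway; the explicit zero just mirrors the suggestion above.)

```python
MASK64 = (1 << 64) - 1

def shift_or(lo: int, hi: int, sh: int) -> int:
    """(lo >> sh) | (hi << (64 - sh)), truncated to one 64-bit register."""
    return ((lo >> sh) | (hi << (64 - sh))) & MASK64

def split_three(t0: int, t1: int, t2: int):
    rb = [0, 44, 24]
    lo = [t0, t0, t1]   # hypothetical operand pairing, for illustration only
    hi = [0, t1, t2]    # element 0: "something" arranged to be zero
    return [shift_or(lo[i], hi[i], rb[i]) for i in range(3)]

t0, t1 = 0x0123456789ABCDEF, 0xFEDCBA9876543210
t0_s, t1_s, t2_s = split_three(t0, t1, 1)   # t2 = 1: full block, hibit set
```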
> 5- the final run should be something like this:
>
> h_s[x] += t[x]_s & p[x]
t_s[x], but yes
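The masking-and-accumulate step, sketched in Python with the poly1305-donna masks (accumulate and P are illustrative names, not from the thread). The shift step leaves the limbs unmasked and the masks trim them; since p2 keeps 42 bits, the hibit at bit 40 survives the mask.

```python
MASK64 = (1 << 64) - 1
P = [(1 << 44) - 1, (1 << 44) - 1, (1 << 42) - 1]   # p0, p1, p2 as in poly1305-donna

def accumulate(h, t_s):
    """h_s[x] += t_s[x] & p[x] for the three limbs, with 64-bit register wraparound."""
    return [(h[x] + (t_s[x] & P[x])) & MASK64 for x in range(3)]

# unmasked limbs straight from the shift step (t2 = 1, i.e. a full block)
t0, t1 = 0x0123456789ABCDEF, 0xFEDCBA9876543210
t_s = [t0, ((t0 >> 44) | (t1 << 20)) & MASK64, (t1 >> 24) | (1 << 40)]
h = accumulate([0, 0, 0], t_s)
```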
> Which is perfectly doable in two or a few more SVP64 lines if I'm now
> understanding things correctly.
yes it is. if you stick to Horizontal-First then, annoyingly, it
has to be done as two instructions:
    # first HF instruction, sv.and
    for x in range(VL):
        tmpregs[x] = t_s[x] & p[x]
    # 2nd HF instruction, sv.add
    for x in range(VL):
        h_s[x] += tmpregs[x]
which, given the wasted extra vector of tmpregs, is precisely why
Vertical-First was invented, because with VF that becomes:
    for x in range(VL):
        tmpreg = t_s[x] & p[x]
        h_s[x] += tmpreg
note: tmpREG is a scalar rather than the tmpREGS vector. but you
need a loop and an svstep. instruction, which means more instructions,
so only use VF if the regfile is under pressure or you just want to
show off :)
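Plain Python loops contrasting the two orderings. This models only element order and temporary-register usage, not SVP64 encoding or svstep semantics, and the function names are hypothetical. Both produce the same h, but the Horizontal-First version holds a VL-long tmpregs vector while the Vertical-First version needs only one scalar tmpreg.

```python
MASK64 = (1 << 64) - 1

def horizontal_first(h, t_s, p):
    """Two HF passes: the 'and' fills a whole temporary vector, then the 'add' consumes it."""
    VL = len(h)
    h = list(h)
    tmpregs = [0] * VL                 # VL extra registers live at once
    for x in range(VL):                # 1st instruction (sv.and)
        tmpregs[x] = t_s[x] & p[x]
    for x in range(VL):                # 2nd instruction (sv.add)
        h[x] = (h[x] + tmpregs[x]) & MASK64
    return h

def vertical_first(h, t_s, p):
    """VF: both operations run per element, so one scalar temporary suffices."""
    VL = len(h)
    h = list(h)
    for x in range(VL):                # in SVP64 this loop would be driven by svstep.
        tmpreg = t_s[x] & p[x]         # single scalar temporary
        h[x] = (h[x] + tmpreg) & MASK64
    return h
```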