[Libre-soc-bugs] [Bug 865] implement vector bitmanip opcodes
bugzilla-daemon at libre-soc.org
Sat Jun 25 09:37:33 BST 2022
https://bugs.libre-soc.org/show_bug.cgi?id=865
--- Comment #23 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #22)
> > + a1 = RA if mode&1 else ~RA
>
> that's bitwise-not, not neg --
yes. that's directly from the pseudocode expressions, which took me
a while to spot, it's so similar in small fonts.
https://en.m.wikipedia.org/wiki/X86_Bit_manipulation_instruction_set#TBM_(Trailing_Bit_Manipulation)
XOP.LZ.09 01 /1  BLCFILL  Fill from lowest clear bit               x & (x + 1)
XOP.LZ.09 02 /6  BLCI     Isolate lowest clear bit                 x | ~(x + 1)
XOP.LZ.09 01 /5  BLCIC    Isolate lowest clear bit and complement  ~x & (x + 1)
XOP.LZ.09 02 /1  BLCMSK   Mask from lowest clear bit               x ^ (x + 1)
XOP.LZ.09 01 /3  BLCS     Set lowest clear bit                     x | (x + 1)
XOP.LZ.09 01 /2  BLSFILL  Fill from lowest set bit                 x | (x - 1)
XOP.LZ.09 01 /6  BLSIC    Isolate lowest set bit and complement    ~x | (x - 1)
XOP.LZ.09 01 /7  T1MSKC   Inverse mask from trailing ones          ~x | (x + 1)
XOP.LZ.09 01 /4  TZMSK    Mask from trailing zeros                 ~x & (x - 1)
and, further up, BMI1
VEX.LZ.0F38 F3 /3  BLSI    Extract lowest set isolated bit  x & -x
VEX.LZ.0F38 F3 /2  BLSMSK  Get mask up to lowest set bit    x ^ (x - 1)
VEX.LZ.0F38 F3 /1  BLSR    Reset lowest set bit             x & (x - 1)
so this separates out 3 expression groups:
1. x / ~x - this is a1
2. & / ^ / | - this is mode3
3. -x / x-1 / x+1 / ~(x+1) - this is a2
however, on top of that, to get the same set-before-first, set-only-first
and set-including-first effect, an *additional* mask is added.
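As a rough sketch of the three-way split above (ignoring the additional mask), each listed TBM/BMI1 expression can be written as one choice from group 1, one operator from group 2 and one term from group 3. The names `A1`, `OP`, `A2`, `GROUPS` and `bmask_expr` are hypothetical, chosen for illustration only, and are not taken from the bug or from any simulator code.

```python
import operator

MASK64 = (1 << 64) - 1

# group 1: first operand, x or ~x
A1 = {"x": lambda x: x, "~x": lambda x: ~x}
# group 2: the bitwise operator
OP = {"&": operator.and_, "^": operator.xor, "|": operator.or_}
# group 3: second operand
A2 = {
    "-x": lambda x: -x,
    "x-1": lambda x: x - 1,
    "x+1": lambda x: x + 1,
    "~(x+1)": lambda x: ~(x + 1),
}

# (group 1, group 2, group 3) for each instruction in the lists above
GROUPS = {
    "BLCFILL": ("x", "&", "x+1"),
    "BLCI": ("x", "|", "~(x+1)"),
    "BLCIC": ("~x", "&", "x+1"),
    "BLCMSK": ("x", "^", "x+1"),
    "BLCS": ("x", "|", "x+1"),
    "BLSFILL": ("x", "|", "x-1"),
    "BLSIC": ("~x", "|", "x-1"),
    "T1MSKC": ("~x", "|", "x+1"),
    "TZMSK": ("~x", "&", "x-1"),
    "BLSI": ("x", "&", "-x"),
    "BLSMSK": ("x", "^", "x-1"),
    "BLSR": ("x", "&", "x-1"),
}


def bmask_expr(x, name):
    """Evaluate one table entry on a 64-bit value (no extra mask applied)."""
    a1, op, a2 = GROUPS[name]
    return OP[op](A1[a1](x), A2[a2](x)) & MASK64
```

For example, `bmask_expr(0b10100, "BLSI")` isolates the lowest set bit and gives `0b100`.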
> I get you point anyway...
so relieved you can interpret fuzzy-logic :)
> The idea is that, currently add/subf/etc. are basically:
>
> a = ~RA if subtracting else RA
> carry_in = 0
> if subtracting:
>     carry_in = 1
> RT = a + RB + carry_in
(and an output-invert)
if inverted_out:
    RT = ~RT
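A minimal runnable sketch of the add/subf description above, including the output-invert, assuming 64-bit registers. The function name `add_sub` and its keyword arguments are hypothetical and used only for illustration.

```python
MASK64 = (1 << 64) - 1


def add_sub(RA, RB, subtracting=False, inverted_out=False):
    """add / subf as one adder: optionally invert RA and set carry-in."""
    a = ~RA if subtracting else RA
    carry_in = 1 if subtracting else 0
    RT = a + RB + carry_in  # (~RA) + RB + 1 == RB - RA
    if inverted_out:
        RT = ~RT
    return RT & MASK64  # truncate to 64 bits
```

With `subtracting=True` this computes `RB - RA`, because `(~RA) + 1` equals `-RA` in two's complement.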
> bmask would (ignoring mask and bit-reverse and shifting) do:
mask is quite important (critical to include), and also i found
it... difficult to work out (sotto voce, i had to guess, and
eventually found it)
> a = ~RA if imm & 0b1 else RA
> b = 1 if imm & 0b10 else -1 # mode2
> carry_in = 0
> y = a + b + carry_in
ok so this calculates expression (3), is that correct? (using the
equivalence -RA == (~RA)+1 and similar conversions, i believe it is)
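One way to check that the quoted `a + b` step covers all four terms of expression group (3) is to evaluate it for each combination of the two low `imm` bits. This is a sketch under the bit assignment shown in the quoted pseudocode (bit 0 inverts RA, bit 1 selects +1 rather than -1). The function name `a2_term` is hypothetical.

```python
MASK64 = (1 << 64) - 1


def a2_term(RA, imm):
    """The quoted y = a + b + carry_in step, with carry_in fixed at 0."""
    a = ~RA if imm & 0b1 else RA
    b = 1 if imm & 0b10 else -1
    return (a + b) & MASK64


# two's complement identities used below:
#   (~x) + 1 == -x
#   (~x) - 1 == ~(x + 1)
for x in (0, 1, 0b10100, MASK64):
    assert a2_term(x, 0b00) == (x - 1) & MASK64      # x - 1
    assert a2_term(x, 0b10) == (x + 1) & MASK64      # x + 1
    assert a2_term(x, 0b11) == (-x) & MASK64         # -x
    assert a2_term(x, 0b01) == (~(x + 1)) & MASK64   # ~(x + 1)
```

So the two bits appear sufficient to produce `-x`, `x-1`, `x+1` and `~(x+1)`.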
> v00 = 0
> v01 = v10 = bool(imm & 0b100)
> v11 = bool(imm & 0b1000)
ahh, a LUT2... it looks like... it's doing and/or/xor. so that's expression (2)
> # 64x 4-in muxes -- basically a binlog operation:
> # probably saves gates over muxing over and, or, and xor
> table = [v00, v01, v10, v11]
> RT = 0
> for i in range(64):
>     ra_bit = bool(RA & (1 << i))
>     y_bit = bool(y & (1 << i))
>     RT |= table[(ra_bit << 1) | y_bit] << i
and the ra input here is not expression (1) which is where the equivalence
chain falls over for me.
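The quoted LUT2 step can be run as-is to confirm which operators it selects. A sketch, wrapping the quoted loop in a hypothetical function `lut2` that takes `y` as a separate argument so the lookup can be tested on its own:

```python
def lut2(RA, y, imm, width=64):
    """Per-bit 2-input lookup, as in the quoted pseudocode."""
    v00 = False
    v01 = v10 = bool(imm & 0b100)
    v11 = bool(imm & 0b1000)
    table = [v00, v01, v10, v11]
    RT = 0
    for i in range(width):
        ra_bit = bool(RA & (1 << i))
        y_bit = bool(y & (1 << i))
        RT |= table[(ra_bit << 1) | y_bit] << i
    return RT


a, b = 0b1100, 0b1010
assert lut2(a, b, 0b1000) == a & b  # only v11 set: AND
assert lut2(a, b, 0b1100) == a | b  # v01, v10, v11 set: OR
assert lut2(a, b, 0b0100) == a ^ b  # only v01, v10 set: XOR
```

Note that the first LUT input is plain `RA`, not the optionally inverted `a1` of expression (1), which is the gap described above.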
i *suspect* that if an extra bit for output-inversion is included then
that might work
as above:
v00 = 0
v01 = v10 = bool(imm & 0b100)
v11 = bool(imm & 0b1000)
(out-inversion built-in to LUT2?)
v00 ^= bool(imm & 0b10000)
v01 ^= bool(imm & 0b10000)
v10 ^= bool(imm & 0b10000)
v11 ^= bool(imm & 0b10000)
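A sketch for testing that suspicion, assuming the inversion bit is tested with `imm & 0b10000` and that the 5-bit `imm` layout is as commented below. The layout and the name `bmask_sketch` are hypothetical, and the mask, bit-reverse and shifting are still ignored. By De Morgan, the four `~x` expressions from the TBM list can be rewritten with plain `x` as the first input plus an inverted output.

```python
MASK64 = (1 << 64) - 1


def bmask_sketch(RA, imm):
    # hypothetical imm layout, assumed for illustration only:
    #   bit 0: invert RA before the add    bit 1: add +1 (else -1)
    #   bit 2: v01 = v10                   bit 3: v11
    #   bit 4: invert all four LUT2 entries (output inversion)
    RA &= MASK64
    a = ~RA if imm & 0b1 else RA
    b = 1 if imm & 0b10 else -1
    y = (a + b) & MASK64
    inv = bool(imm & 0b10000)
    v00 = inv
    v01 = v10 = bool(imm & 0b100) ^ inv
    v11 = bool(imm & 0b1000) ^ inv
    table = [v00, v01, v10, v11]
    RT = 0
    for i in range(64):
        ra_bit = (RA >> i) & 1
        y_bit = (y >> i) & 1
        RT |= int(table[(ra_bit << 1) | y_bit]) << i
    return RT


for x in (0, 1, 0b1011, 0b10100, 1 << 63, MASK64):
    # BLCIC:  ~x & (x+1) == ~(x | ~(x+1))
    assert bmask_sketch(x, 0b11101) == (~x & (x + 1)) & MASK64
    # T1MSKC: ~x | (x+1) == ~(x & ~(x+1))
    assert bmask_sketch(x, 0b11001) == (~x | (x + 1)) & MASK64
    # BLSIC:  ~x | (x-1) == ~(x & -x)
    assert bmask_sketch(x, 0b11011) == (~x | (x - 1)) & MASK64
    # TZMSK:  ~x & (x-1) == ~(x | -x)
    assert bmask_sketch(x, 0b11111) == (~x & (x - 1)) & MASK64
```

Under these assumptions the four `~x` cases reduce to NAND/NOR of `x` with a group (3) term, so the extra inversion bit looks sufficient for them. Whether that still holds once the mask is included is not checked here.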