[Libre-soc-dev] new svp64 page

Luke Kenneth Casson Leighton lkcl at lkcl.net
Thu Dec 10 18:07:23 GMT 2020


On 12/10/20, Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
> On 12/10/20, Lauri Kasanen <cand at gmx.com> wrote:
>> On Thu, 10 Dec 2020 16:27:33 +0000
>> Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>>
>>> lauri, jacob, what's your thoughts on using 2 bits for clamping mode?
>>> this is *not* the same as elwidth itself, which is the "chop" in VSX
>>> ops pseudocode.
>>>
>>> or: another idea:
>>>
>>> * extsb, extsh, extsw specify one type of width
>>> * twin predication specifies 2 more (src elwidth, dest elwidth)
>>> * 1 bit says "operation is to be clamped" (not to which range, that's
>>> implicit)
>>
>> I can't come up with a use case for having different clamping to dst
>> elwidth. If you want 8-bit unsigned saturation, there's no reason for
>> you to write that to 16-bit elements. So I would take the clamp width
>> from the dst elwidth.
>
> it's not that the elwidth has a reason (or not), it's that add and
> other arith ops *don't* have sign/uns (except for mul and div) and
> they don't have a full range of 8/16/32.
>
> now, if we allow dest elwidth even on 2-src *arithmetic* operations
> (something that was left out of SVP originally because of lack of
> space), then now the one bit "sat" (or 2 bit, one for signed one for
> unsigned) starts to gel.
>
>> I would simply have two bits to enable clamping, unsigned and signed.
>> 16 and 32 bit do need both, not just 8-bit.
>
> i realised belatedly that add does not have add-signed as separate
> from add-unsigned.  nor is there, in Power, an add8 or add16.
>
> i will see if there's space in the 24 bits for dest elwidth and 2 bits
> for sat mode.

there is.

does this look like a reasonable general-purpose algorithm, applicable
to all operations, whether exts*, mr, or 2- or 3-operand arithmetic ops?

* saturation is done on the result at the **source** elwidth
* the saturated result is extended from source to dest elwidth **after**
saturation: sign-extended for signed saturation, zero-extended for unsigned

in pseudocode:

    res = op(src1, ....)                  # 1/2/3 op
    res = clamp(+/-src_wid_NN, res)       # saturate to the source elwidth range
    res = EXTS/Z(res, src_wid, dest_wid)  # sign- or zero-extend to dest elwidth
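
as a rough illustration only, the three steps can be sketched in python.
the names `clamp`, `extend` and `sat_op` are hypothetical, operands are
assumed to be plain python integers already interpreted as signed or
unsigned, and the behaviour when dest elwidth is narrower than source
elwidth (here: plain truncation) is an assumption, not something decided
above.

```python
import operator

def clamp(value, width, signed):
    """Saturate value to the range representable in `width` bits."""
    if signed:
        lo, hi = -(1 << (width - 1)), (1 << (width - 1)) - 1
    else:
        lo, hi = 0, (1 << width) - 1
    return max(lo, min(hi, value))

def extend(value, dest_wid):
    """Return the dest_wid-bit pattern of value.

    Masking a negative integer yields its two's-complement encoding,
    which is sign-extension; a non-negative one is zero-extended.
    """
    return value & ((1 << dest_wid) - 1)

def sat_op(op, srcs, src_wid, dest_wid, signed):
    res = op(*srcs)                     # 1/2/3-operand operation
    res = clamp(res, src_wid, signed)   # saturate at the source elwidth
    return extend(res, dest_wid)        # then extend to the dest elwidth

# signed 8-bit add, written to 16-bit elements
print(hex(sat_op(operator.add, (100, 100), 8, 16, True)))    # 0x7f
print(hex(sat_op(operator.add, (-100, -100), 8, 16, True)))  # 0xff80
# unsigned 8-bit add, written to 16-bit elements
print(hex(sat_op(operator.add, (200, 100), 8, 16, False)))   # 0xff
```

the point of doing the extension last is that a signed result saturated
to -128 at 8 bits shows up as 0xff80 in a 16-bit dest element, not as
0x0080.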

extsb/w/h would be one of the slightly weird ones that would need to
be thought through properly.

needs a full table.

l.


