[Libre-soc-dev] v3.1B prefix

Luke Kenneth Casson Leighton lkcl at lkcl.net
Sat Dec 5 05:48:04 GMT 2020


hiya jacob,

i sent the message to paul, and also went through a mental walkthrough
of the combined 16/48 bit state machine, the ABI implications,
everything.

if it were possible to convey, virtually, thumping a desk repeatedly
in annoyance, that's the reaction to imagine i am having.

what i am trying to say is: OpenPOWER shot itself in the foot by not
building in 16/48 bit instruction possibilities, with the 6 bits of
fixed size Major Opcodes, and i don't think this is a mistake that can
be recovered from, not without full buy-in from *all* OPF Members to
design a modern variant of VLE that is properly signed off, with a
suitable ABI agreed.

the state that has to be maintained is just too much to cross over
when interacting with standard ABI code, and in addition even crossing
over in a VLE mode is still problematic.

consequently, the implication is that, severely limited as it is, the
24 bit v3.1b prefix system is... well... our only realistic
"least bad" option for now.

which also, given that 48 bit is not practical until it has been
fully adopted by OPF Members (especially IBM), leaves SV with a power
consumption penalty, as all SV instructions will be 64 bit.

this i am not in the slightest bit happy about.

the *only* good news in this cluster**** is this:

that if we accept that setvl is the sole method for setting VL and
MVL, then the pressure is off when it comes to trying to jam
everything into what was formerly 11 bits: now there are 24 available.

this in turn means ample space for:

* 2x elwidths, 2x predicates and 2x inverter selectors for twin predication
     2 bits per elwidth, probably 3 per predicate
     totals 10 bits
     1 extra for choosing CR or GPR for predicate input
* SUBVL 2 bits
* 2 or even 3 bits for Vec/scalar extension
     3 bits x 3 (src1, src2, dest) gives 9 bits

that's 22 bits, 2 spare for other purposes.
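
as a sanity check on that tally, here is a rough sketch in python that
just adds up the field widths listed above. the function and field
names are made up for illustration and are not from any spec, and it
assumes the 3 bits per predicate already include the inverter
selector, since that is what makes the predication total come to 10.

```python
PREFIX_BITS = 24  # usable prefix bits, per the discussion above


def prefix_budget(ext_bits=3):
    """Return (bits_used, bits_spare) for the tentative field list.

    ext_bits is the Vec/scalar extension width per register field
    (the list above says "2 or even 3").
    """
    fields = [
        # (hypothetical name, how many, bits each)
        ("elwidth", 2, 2),
        # assumption: 3 bits per predicate covers the inverter selector too
        ("predicate", 2, 3),
        ("CR-or-GPR predicate select", 1, 1),
        ("SUBVL", 1, 2),
        ("Vec/scalar extension (src1, src2, dest)", 3, ext_bits),
    ]
    used = sum(count * bits for _name, count, bits in fields)
    return used, PREFIX_BITS - used


if __name__ == "__main__":
    for ext in (3, 2):
        used, spare = prefix_budget(ext)
        print(f"ext_bits={ext}: {used} bits used, {spare} spare")
```

with 3 bits of extension per register this gives the 22 used / 2 spare
quoted above; dropping to 2 bits per register would free 3 more.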

now, what doesn't fit here is swizzle.  realistically we need 2 sets
(src1 and src2) to avoid massive quantities of f.mvs, and we still
need to specify SUBVL somehow as well.

and applying swizzle src1+src2 *and* predication?  naah.

the reduced space is very irritating, given quite how much could be
compressed if we did not have to fit within these constraints.

at least 24 bits is better than 11.

l.


