[Libre-soc-isa] [Bug 213] SimpleV Standard writeup needed

Tue Oct 20 02:29:30 BST 2020

https://bugs.libre-soc.org/show_bug.cgi?id=213

--- Comment #79 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #76)
> (In reply to Jacob Lifshay from comment #74)
> > (In reply to Jacob Lifshay from comment #73)
> > > (In reply to Luke Kenneth Casson Leighton from comment #71)
> > > > 
> > > > we _could_ conceivably do bit-level DM subdivision onto 64 bit integer regs
> > > > but... no, please, no :)  it makes a mess of the "Register Cache" idea,
> > > > unfortunately.
> > > 
> > > it would totally work, those
> > 
> > oops forgot to finish:
> > 
> > all we need to do is treat those two mask-optimized int regs as separate
> > from the rest -- kinda like CTR is treated differently than the int regs.
> 
> i get it.  they still need bit-level DM tracking, and if they are designated
> as int regs as well it becomes even more hell, at the point where they
> interact with "real" int regs.
> 
> if they are treated as completely separate regs (SPRs in effect) they need
> their own opcodes, and i really don't want to go down that route.

All that's needed is the logic to decode the register field as if it's part of
the opcode field, and only for the few bitwise ops optimized for masks
(and/andc/or) when the output reg and an input reg are mask regs -- otherwise
it's decoded as a normal int operation and just has to wait for all bits to be
ready just like any other op.
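
To make the decode rule concrete, here is a minimal Python sketch. The choice
of r30 and r31 as the two mask-optimized regs and the function name are
assumptions made up for illustration; nothing in this thread has fixed them.

```python
# Hypothetical sketch only: the register numbers and names below are
# illustrative assumptions, not part of any spec.
MASK_REGS = frozenset({30, 31})              # assumed mask-optimized int regs
MASK_OPS = frozenset({"and", "andc", "or"})  # the bitwise ops named above


def uses_mask_fast_path(op, dest, src_a, src_b):
    """Return True if the op can be issued on the bit-level mask path.

    That requires one of the mask-optimized bitwise ops, a mask reg as
    the output, and at least one mask reg as an input. Anything else is
    decoded as a normal integer op that waits for all bits to be ready.
    """
    if op not in MASK_OPS:
        return False
    return dest in MASK_REGS and (src_a in MASK_REGS or src_b in MASK_REGS)
```

For example, `uses_mask_fast_path("and", 30, 31, 5)` is true, while an `add`
on the same registers, or an `or` whose output is a normal int reg, falls back
to the ordinary integer path.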

Everything else can just access them through the port on the mask regs
connected to the integer data path. Dependencies can be handled by setting
multiple dependency bits for the relevant reg when the register number
matches, rather than the 1 dependency bit that would be set for any normal
integer reg.
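
A rough Python sketch of that dependency mapping follows. The regfile size,
the mask reg numbers, and the granularity of one dependency bit per mask bit
are all assumptions for illustration; the real subdivision could be coarser.

```python
# Hypothetical sketch only: sizes, register numbers and the per-bit
# granularity are illustrative assumptions.
NUM_INT_REGS = 32      # assumed number of normal int regs
MASK_REGS = (30, 31)   # assumed mask-optimized int regs
MASK_BITS = 64         # assumed: one dependency bit per mask bit


def dependency_bits(reg):
    """Return the dependency-bit indices to set when `reg` is accessed
    as a whole register through the integer data path."""
    if reg in MASK_REGS:
        # A mask reg owns a block of MASK_BITS bits placed after the
        # normal int regs; a whole-register access sets all of them.
        base = NUM_INT_REGS + MASK_REGS.index(reg) * MASK_BITS
        return set(range(base, base + MASK_BITS))
    # A normal int reg has exactly one dependency bit.
    return {reg}
```

A normal reg such as r5 maps to the single bit `{5}`, while r30 maps to a
block of 64 bits that does not overlap the block for r31.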

> > > > 
> > > > whereas CRs we have the freedom *to* decide how many we want to extend it to.
> > 
> > the set of integer registers optimized for masking can be extended too,
> 
> please understand: it really is too complex to track the dependencies for
> something that specialised.
> 
> > without needing all the mess of CRs.
> 
> think it through, jacob: vector CMPs and standard Rc=1 vectorised operations
> *still require a minimum 64 CRs anyway*

No, they don't: part of the idea I'm proposing is that CRs *aren't* vectorized.
Rc=1 is reinterpreted to mean something different when the operation is
vectorized, probably to write a single CR with a bit indicating whether all
results for the entire vector are zero; the other bits can be decided later.
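
As a minimal Python sketch of that one bit (the function name is hypothetical,
and which bit of the CR it lands in is deliberately left open, as above):

```python
# Hypothetical sketch only: models the single "all results are zero"
# bit, not the full CR field layout, which is still undecided.
def vector_rc1_all_zero(results):
    """Return 1 if every element of the vector result is zero, else 0."""
    return int(all(r == 0 for r in results))
```

So `vector_rc1_all_zero([0, 0, 0])` gives 1 and `vector_rc1_all_zero([0, 3])`
gives 0, regardless of the vector length.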

> or we abandon CRs pretty much entirely for vectors and this is not an option
> i am happy to consider, it will cause havoc for gcc and llvm compiler
> developers.

Trust me, it really won't cause havoc. GCC and LLVM both target architectures
that use 1 bit per element/lane for mask vectors: AVX512 and AMDGPU (and
probably more). In fact, in LLVM IR, vector compare operations return a vector
with 1 bit per element/lane. Anything else has to use platform-specific
intrinsics or have special conversion code added in LLVM's backend.
In LLVM, vector operations are treated as a totally different kind of operation
than scalar operations all the way through the compiler, so having the vector
operations have different semantics (e.g. input/output registers) than the
scalar operations won't cause any problems. In fact, I'd guess that having
vectorized CRs would cause more havoc, since that's more unusual for vectors
(I've never heard of anything else that has them).
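
For illustration, here is a small Python model of a compare that produces 1 bit
per element/lane, packed into a single integer. The function name and the
choice of lane 0 as the least significant bit are assumptions of this sketch.

```python
# Hypothetical sketch only: models a 1-bit-per-lane compare result.
# Putting lane 0 in the least significant bit is an assumption.
def vector_cmp_eq(a, b):
    """Compare lane by lane; bit i of the result is set when a[i] == b[i]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of lanes")
    mask = 0
    for i, (x, y) in enumerate(zip(a, b)):
        if x == y:
            mask |= 1 << i
    return mask
```

With four lanes, `vector_cmp_eq([1, 2, 3, 4], [1, 0, 3, 0])` returns `0b0101`:
lanes 0 and 2 match, so only those two bits are set.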
