[Libre-soc-bugs] [Bug 558] gcc SV intrinsics concept

Fri Jan 15 15:12:33 GMT 2021

https://bugs.libre-soc.org/show_bug.cgi?id=558

--- Comment #66 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Alexandre Oliva from comment #63)
> What Jacob said.  I may have misremembered the double+double long double
> thing as float+float double.  But pairs of (neighboring) 32-bit GPRs for
> 64-bit values in 32-bit mode are definitely a thing, at least ABI-wise.

we're not doing 32 bit backwards compatibility.  i mean, we could, but not now.
aside from anything else, it interacts with "elwidth=default" which is now
32 bit not 64.

32 bit mode can therefore be disregarded as far as Vectorisation is concerned

> A big deal would be having (possibly legacy) opcodes/ABIs that require
> registers, from any register file, to be grouped and used together in a way
> that required them to be contiguous in a certain way, while having other
> (new) opcodes that require them to be grouped in ways that impose different
> contiguity requirements, because it would be impossible to satisfy both at
> the same time.

mtcr throws the spanner in the works, there.

> As for vector CC modes, as long as you use the CC modes as outputs to vector
> compares, and then use them for predication of vectors of the same length,

ohh yeh.  no it would be possible but too complex to change MVL in the middle.

leaving that aside...

> this will be (I believe) no different from existing uses of CRs as outputs
> to scalar compare insns and inputs to conditional moves

that's what i imagine, yes.  isel, setb, fsel.

but, caveat: these (isel setb) are not the full/only way and they are tricks
used (rarely) by scalar ISAs that do not have "real" predication built-in

given that ppc64 does not have predication *at all*, it is x86, SVE2 and others
that may need to be examined for clues as to how this can be done.

(adding predication to VSX has been a constant feature request made to IBM for
some time)

> (and conditional
> branches, but those won't take vectors, I hope ;-)

:)  on the RV ISA list we did laugh at the idea of Vectorised Branches,
effectively this becomes a way to create hyperthreaded coroutines (!)

are we implementing that? eeeehhhno.  love the idea though.

there *will* be a "Reduce Mode" on CRVectors though which will allow:

* V-results to create V-CRs
* Crunch (mapreduce) V-CRs to scalar CR
* Scalar CR can do *Standard* scalar
  branch test

this is "equivalent" to, but much more powerful than, VSX CR6: VL up to 64 can
perform far more parallel work, in fewer instructions, than VSX.

thus it is possible, using Vector-CRops (V-crand, V-crxor), to check the status
of a *batch* of V-results.
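
the three steps listed above can be sketched in plain python. this is only a
behavioural model, not the SV instruction semantics: the CR field layout as a
(LT, GT, EQ, SO) tuple and the names compare_to_cr, vector_compare, cr_and and
reduce_crs are all made up for illustration.

```python
import functools

def compare_to_cr(a, b):
    """Scalar compare producing one CR field as (lt, gt, eq, so)."""
    return (a < b, a > b, a == b, False)

def vector_compare(va, vb):
    """V-results to V-CRs: one CR field per element pair."""
    return [compare_to_cr(a, b) for a, b in zip(va, vb)]

def cr_and(x, y):
    """Bit-by-bit AND of two CR fields (hypothetical stand-in for a crand-style op)."""
    return tuple(p and q for p, q in zip(x, y))

def reduce_crs(vcr, op):
    """Crunch (mapreduce) a non-empty vector of CR fields down to one scalar CR field."""
    return functools.reduce(op, vcr)

va = [1, 2, 3, 4]
vb = [1, 2, 3, 4]
scalar_cr = reduce_crs(vector_compare(va, vb), cr_and)

# a standard scalar branch test would now look at a single bit, here EQ
all_equal = scalar_cr[2]
```

the point of the sketch is that the branch only ever sees one scalar CR field,
however many elements fed into it.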

> GCC's "movecc_<mode>" template insn in rs6000.md covers even moving the
> various CCmodes from and to memory, through GPRs, but it's still the case
> that GCC has no clue about the bit patterns that represent the different
> values represented in these modes.  That's left entirely up to the
> architecture to choose, and implement consistently between CR-setting and
> CR-using insns.

fascinating.

> In order to use vector CRs as predicates with predefined values, rather than
> as ones computed by vector compares, or by vector insns that set vector CRs
> as side effects, the programmers who wish to use such predicates will either
> have to figure out how to get the right bit patterns loaded into the CR
> vectors, or set up other vectors and perform operations between them that
> set the vectors accordingly.

it is fairly normal to write optimised libc6 Vector/SIMD routines in assembler.
the transfers using crweird (INT-CRVec) will be needed for strncpy, memcpy etc
where INT "set-before-first" opcodes are needed.
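
as a rough illustration of why "set-before-first" on an INT matters for
strncpy-style loops, here is a python model. assumptions, none of them from
the spec: bit i of the mask corresponds to element i (LSB0 numbering), a mask
with no bits set selects every element (as RVV's vmsbf does), and the helper
names set_before_first and bounded_copy are hypothetical.

```python
def set_before_first(mask, vl):
    """Return a vl-bit mask with 1s in every position below the lowest set bit."""
    full = (1 << vl) - 1
    mask &= full
    if mask == 0:
        return full            # assumption: no set bit selects all vl elements
    return (mask & -mask) - 1  # isolate lowest set bit, then set all bits below it

def bounded_copy(src, n):
    """Copy bytes of src up to (not including) the first NUL, at most n bytes."""
    chunk = src[:n]
    # build the "is zero" mask: bit i set where chunk[i] is NUL
    zeros = 0
    for i, byte in enumerate(chunk):
        if byte == 0:
            zeros |= 1 << i
    pred = set_before_first(zeros, n)
    # predicated copy: only elements whose predicate bit is set are moved
    return bytes(b for i, b in enumerate(chunk) if (pred >> i) & 1)
```

in real code the "is zero" mask would come out of a vector compare into
CR-Vectors and then across to an INT, which is where the crweird transfers
come in.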

> Since CRs are smaller than words, it is possible that the same anomaly that
> I reported elsewhere about endianness in using bitfields as predicate
> registers will apply to CR vectors, and users will have to use integral
> values that vary depending on endianness to obtain equivalent CR vector
> predication.

reminder: both microwatt and libresoc do *not* internally perform dynamic
endianness conversion on ALUs, or on any regfiles, of any kind.  endianness is
*removed* at memory... period.

this is to preserve the HDL developers' sanity.

[aside: byte-reversal on GPR-GPR interaction is now a property of REMAP that
gives the "illusion" of having LE/BE GPR regfile capability.  but that is
GPR-GPR not GPR-CR.]

transfers between CR regfile and GPR INT regfile are DIRECT and HARD coded to
one and ONLY one endianness.  we may call this LE if it helps.

transfers between CR-Vectors and INTs are **NOT** intended to be done beyond 64
bits, because VL is limited to 64 elements.  therefore AT NO TIME will there be
transfers between CR-Vectors and INT-Vectors (hence the opcode name "crweird").

thus there will neverrrr be a problem, everrr, where endianness is a
complicating factor involving CR-Vector to INT-as-predicate.

however if the developer mistakenly enables REMAP bytereversal when interacting
with the INT *after* transfer out of CR-Vector, that becomes their problem: it
is their incorrect program to sort out.
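
a minimal python model of a fixed-order CR-Vector to INT transfer, to show why
memory endianness never enters into it. this is NOT the crweird encoding or
its actual semantics: the (LT, GT, EQ, SO) tuple layout, the choice of
"element i goes to bit i", and the name crvec_to_int are assumptions made for
the sketch.

```python
EQ = 2  # index of the EQ bit within an (LT, GT, EQ, SO) tuple, for this sketch only

def crvec_to_int(vcr, bit, vl):
    """Pack one chosen bit from each of the first vl CR fields into a single integer.

    The ordering is hard-coded (element i -> bit i), so the result does not
    depend on any byte order: no bytes are ever reinterpreted.
    """
    if vl > 64:
        raise ValueError("VL is limited to 64, the predicate must fit in one 64-bit INT")
    result = 0
    for i in range(vl):
        if vcr[i][bit]:
            result |= 1 << i
    return result

vcr = [
    (False, False, True, False),   # element 0: equal
    (True, False, False, False),   # element 1: less than
    (False, False, True, False),   # element 2: equal
]
predicate = crvec_to_int(vcr, EQ, 3)
```

because the result is a single scalar INT of at most 64 bits, there is no
point at which an INT-Vector (and with it an element ordering question) is
involved.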
