[libre-riscv-dev] [Bug 139] Add LD.X and ST.X? Strided

bugzilla-daemon at libre-riscv.org
Wed Oct 9 09:09:59 BST 2019


http://bugs.libre-riscv.org/show_bug.cgi?id=139

--- Comment #55 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #53)
> read everything, still absorbing.  i see where you're going with the
> "type-safety".
> 
> the immediate problem i can see with it is: predication (single or twin)
> *already* makes the concept of bounds-checking runtime type-safety moot.
> predication is *already* run-time-dependent, and developers (and
> compilers) already have to "put up with" it, and make damn sure that they
> get things right.

The difference is that when the compiler enables predication at all, it
already knows that dest must hold a valid value before the instruction runs.
Swizzle with ignore/unmodified lanes gave no such guarantee.

Also, predication doesn't allow an instruction to write past the end of the
allocated registers, since the compiler always allocates for the full VL,
whereas swizzle with ignore/unmodified could write past the end due to the
compiler allocating for vec2 and the swizzle being vec4 (for example).
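
To make the allocation hazard concrete, here is a rough Python sketch. It
assumes a flat register file with one vector element per register; the
register numbers and the write_lanes helper are made up for illustration and
are not part of any proposed encoding.

```python
# Hypothetical flat register file: one vector element per register.
regfile = ["free"] * 8
regfile[4] = "live_a"   # unrelated live values sitting just past dest
regfile[5] = "live_b"

def write_lanes(regfile, base, values):
    """Write one value per lane, starting at register `base`."""
    for lane, value in enumerate(values):
        regfile[base + lane] = value

DEST_BASE = 2  # the compiler allocated r2..r3 for a vec2 dest

# A vec2 write stays inside the allocation.
write_lanes(regfile, DEST_BASE, ["x", "y"])
in_bounds = list(regfile)

# A vec4 swizzle aimed at the same base runs into r4..r5.
write_lanes(regfile, DEST_BASE, ["x", "y", "z", "w"])
clobbered = [r for r in (4, 5) if regfile[r] != in_bounds[r]]
```

With predication the compiler has already reserved registers for the full VL,
so the second write could not land outside dest's allocation.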

> 
> have to pack, ready for tomorrow, so will be a little distracted.

:)

> i do like [f]swizzle2 (rs1==rs3).  can you demonstrate *exact* equivalence
> i.e. that src *and* dest may have elements arbitrarily "skipped" i.e. not
> destroyed?
> 
> how is the following achieved?
> 
>    rd.yw = rs.zy

I'm assuming the types are both vec4.

That's achieved using fswizzle2:

// variables renamed to src and dest to avoid clashing with the operand names
fswizzle2 rd=dest, rs1=src, rs2=swizzle, rs3=dest

where swizzle selects [rs3.x, rs1.z, rs3.z, rs1.y]

Having a separate instruction to load the swizzle constant is acceptable since
writing to a swizzle is less common than reading from a swizzle.

In this case, fswizzle2 can be combined with a move and/or a previous swizzled
write, assuming the previous write writes rd.xz.
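
A small Python model of that selection, as a sketch only. The (source, lane)
pairs stand in for the swizzle constant held in rs2; the real bit encoding is
not specified here and this representation is an assumption.

```python
LANES = "xyzw"

def fswizzle2(rs1, rs3, selectors):
    """Build the result lane by lane from either rs1 or rs3.

    `selectors` is a list of (source_name, lane_letter) pairs, a
    hypothetical stand-in for the swizzle constant in rs2.
    """
    sources = {"rs1": rs1, "rs3": rs3}
    return [sources[name][LANES.index(lane)] for name, lane in selectors]

src = [10.0, 11.0, 12.0, 13.0]    # rs1, lanes x y z w
dest = [20.0, 21.0, 22.0, 23.0]   # rs3, and also the rd that gets written

# rd.yw = rs.zy  ->  [rs3.x, rs1.z, rs3.z, rs1.y]
swizzle = [("rs3", "x"), ("rs1", "z"), ("rs3", "z"), ("rs1", "y")]
result = fswizzle2(src, dest, swizzle)
```

Lanes x and z of result come from the old dest, so they are effectively left
unmodified, while y and w pick up src.z and src.y.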

> 
> using twin-predication (twin "masks" - however they are called, i don't
> care if they're named "unknown" or not) - it is dead-easy.  dest mask
> equals 0b0011.

Got it, though it would actually be 0b1010, since bits are counted from the
LSB (0bwzyx), so y is bit 1 and w is bit 3.
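
The bit numbering can be checked with a few lines of Python. lane_mask is a
helper invented for this illustration.

```python
LANES = "xyzw"  # x is bit 0 (the LSB), w is bit 3

def lane_mask(lanes):
    """Return the predicate mask with one bit set per named lane."""
    mask = 0
    for lane in lanes:
        mask |= 1 << LANES.index(lane)
    return mask

yw_mask = lane_mask("yw")
xy_mask = lane_mask("xy")
```

Here yw_mask is 0b1010, whereas 0b0011 (xy_mask) would select lanes x and y.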

> 
> ah.
> 
> wait.
> 
> just listing that example, something just occurred to me.
> 
> you don't need twin-predication.  you need *single* predication, on the dest.
> 
> the reason is: the src-index-selection gives "appearance" of
> twin-predication.
> with the dest having predication, all you do is: fill in *only two* src
> swizzle indices.
> 
> *smacks-forehead*...

it all falls into place... :P
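
A sketch of that single-predication form in Python, assuming the dest mask
uses the 0bwzyx bit order and that swizzle indices for masked-out lanes are
simply ignored. The function name and the use of None for "don't care" are
made up for illustration.

```python
def predicated_swizzle(dest, src, indices, dest_mask):
    """Copy src[indices[i]] into lane i of dest where dest_mask bit i is set.

    Lanes whose mask bit is clear keep their old dest value, and their
    swizzle index is never read.
    """
    out = list(dest)
    for lane in range(len(dest)):
        if (dest_mask >> lane) & 1:
            out[lane] = src[indices[lane]]
    return out

src = [10.0, 11.0, 12.0, 13.0]    # lanes x y z w
dest = [20.0, 21.0, 22.0, 23.0]

# rd.yw = rs.zy: only lanes y and w need a source index (z = 2, y = 1).
indices = [None, 2, None, 1]
result = predicated_swizzle(dest, src, indices, 0b1010)
```

This gives the same result as the fswizzle2 form with rs3 = dest, without
needing a second mask on the source side.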

> 
> 
> > 6. Not needing a popcount (even though it's quite small) simplifies
> > instruction decode
> 
> it's not for decode, it's for exception-recovery after a context-switch.
> it is absolutely the case that popcount would *not* be needed when
> *starting* a newly-issued instruction because the offsets start at zero.

Popcount would be needed at decode, since we want to be able to generate more
than one element operation per clock cycle (otherwise our nice 128-bit-wide
ALU will be mostly unused).
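
As a rough illustration of why, here is a Python sketch. It assumes that
twin-predication pairs the k-th active src lane with the k-th active dest
lane; that pairing rule and the helper names are assumptions made for this
example, not a description of the actual decoder.

```python
def popcount(value):
    """Count the set bits in a non-negative integer."""
    return bin(value).count("1")

def rank(mask, lane):
    """Number of active lanes strictly below `lane` in `mask`."""
    return popcount(mask & ((1 << lane) - 1))

def pair_lanes(src_mask, dest_mask, vl):
    """Sequential reference: pair active src and dest lanes in order."""
    src_active = [i for i in range(vl) if (src_mask >> i) & 1]
    dest_active = [i for i in range(vl) if (dest_mask >> i) & 1]
    return list(zip(src_active, dest_active))

SRC_MASK, DEST_MASK = 0b0110, 0b1010
pairs = pair_lanes(SRC_MASK, DEST_MASK, 4)

# Paired lanes always have equal rank, so a decoder issuing several
# elements per cycle could match them up by computing ranks in parallel.
ranks_match = all(rank(SRC_MASK, s) == rank(DEST_MASK, d) for s, d in pairs)
```

Walking the masks one bit at a time needs no popcount, but it also yields only
one element operation per step.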

> 
> only when one of the offsets is not zero (returning from an exception)
> does the other need to be recovered.
> 
> however this i believe is now moot as i realised twin-predication is
> redundant.  can you confirm / check the logic/reasoning above?

Seems good to me.

> will go over the rest (inline) again.
