[Libre-soc-isa] [Bug 560] big-endian little-endian SV regfile layout idea

Tue Jan 5 05:13:52 GMT 2021

https://bugs.libre-soc.org/show_bug.cgi?id=560

--- Comment #71 from Cole Poirier <colepoirier at gmail.com> ---
(In reply to Alexandre Oliva from comment #65)
> now let's look at the implications of byte endianness for vectors of bytes,
> shall we?
> 
> v8qi x = { [0] = 0x21, [1] = 0x22, [2] = 0x23, [3] = 0x24, [4] = 0x25, [5] =
> 0x26, [6] = 0x27, [7] = 0x28 };
> 
> regardless of endianness, ((char*)&x)[0] == 0x21, and ((char*)&x)[7] == 0x28
> same as if we had a char[8]:
> 
> char y[8] = { [0] = 0x21, [1] = 0x22, [2] = 0x23, [3] = 0x24, [4] = 0x25,
> [5] = 0x26, [6] = 0x27, [7] = 0x28 };
> 
> the expectation is that if you memcpy between an array and a vector, and
> vice-versa, elements remain in the same order.  a different memory
> representation for vectors would break this.
> 
> 
> the normal expectation is that, if data is laid out in memory in accordance
> with the selected endianness, it is loaded from memory into registers using
> regular loads, rather than with byte-reversing loads, right?
> 
> 
> if we load the byte vector element-wise, we get regX[0] = 0x21 and regX[7] =
> 0x28, *regardless* of how we lay out the elements within the register.
> 
> this is good, and is the best you can do if data is not more aligned than
> strictly needed, or if vector sizes might be clamped at less than 8
> byte-sized elements.
> 
> 
> but from a performance standpoint, loading the bytes separately is quite
> inefficient.  it would be far more desirable to be able to load dwords
> rather than bytes, since that's 8 bytes per effective instruction, instead
> of just 1.
> 
> so, what does it take to get the iteration within the register to enable
> wide memory loads in the CPU/system-selected endianness?
> 
> in LE, the x above gets loaded by ld as 0x2827262524232221, as in comment 64
> so, in order for the iteration order to match the declaration, element 0 is
> at bits 2^0..2^7, element 1 is at bits 2^8..2^{15}, and so on.
> 
> in BE, however, the x above gets loaded by ld as 0x2122232425262728, as in
> comment 60, so, in order for the iteration order to match the declaration,
> element 0 has to be at bits 2^{56}..2^{63}, element 1 at bits
> 2^{48}..2^{55}, and so on.
> 
> this is not reversing anything, not introducing any computations anywhere,
> it's just making the svp64 loop iterate over multiple elements in the same
> register in the same order you'd go over them if they were in memory.
> 
> that's all am I suggesting.  not any of the other modifications you're
> alluding to and apparently freaking out about.  can you please describe the
> change I'm suggesting, in your own words, so that we can make sure we're not
> talking way past each other any more?

Luke, can you please respond exclusively to Alexandre’s above comment? You are
addressing something that is not what he has written here and until this
comment is addressed in isolation and point by point you will continue to
misunderstand Alexandre.

-- 
You are receiving this mail because:
You are on the CC list for the bug.