[libre-riscv-dev] partitioned signals and Cat

Mon Feb 10 20:13:40 GMT 2020

On Mon, Feb 10, 2020 at 7:19 PM Jacob Lifshay <programmerjake at gmail.com> wrote:

> On Sat, Feb 8, 2020, 15:15 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> wrote:
>
> > well, dang, i just realised that with Cat and changing the size of Signals
> > in general by fixed amounts, partitioning got *seriously* complicated.
> >
> > take for example the seemingly innocuous task of extending a Signal by 2
> > bits using Cat(x, 0, 0)
> >
> > this is really straightforward to assign to another Signal that is 2 bits
> > longer.  length += 2.
> >
> > for a PartitionedSignal that goes out the window, because if the partitions
> > are open, it's a 64 + 2 bit Signal, and if the partitions are closed, it's
> > 8x 8+2 bit Signals.
> >
> > some serious thought is going to have to go into this.  possibly to the
> > extent of mirroring the nmigen ArrayProxy infrastructure.
> >
> > any ideas?
> >
>
> Require signals combined using Cat to have the same partition order
> (partition point enable flags, but not partition sizes). it would work like
> this:
>
> numbers in [ ] are partition sizes, letters are single bits:
> all partitions split:
> Cat([4, 4, 8], [1, 1, 1]) => [5, 5, 9]
> bits (msb to lsb): Cat(abcdefgh_ijkl_mnop, q_r_s) => qabcdefgh_rijkl_smnop
>
> first two partitions combined:
> Cat([4 + 4, 8], [1 + 1, 1]) => [10, 9]
> bits (msb to lsb): Cat(abcdefgh_ijklmnop, q_rs) => qabcdefgh_rsijklmnop
>
> all partitions combined:
> Cat([4 + 4 + 8], [1 + 1 + 1]) => [19]
> bits (msb to lsb): Cat(abcdefghijklmnop, qrs) => qrsabcdefghijklmnop
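
a rough sketch of the scheme quoted above, in plain python rather than nmigen, just to check the three examples. the function names and the "joined" flag list are made up for illustration (joined[i] True means the boundary between partition i and partition i+1 is removed). partition sizes are listed lsb-first and bit strings are written msb-first, as in the examples.

```python
def split_lanes(bits, sizes, joined):
    """Split an msb-first bit string into lanes, returned lsb-first.

    sizes  -- partition sizes, lsb-first (hypothetical representation)
    joined -- joined[i] is True when partitions i and i+1 are merged
    """
    lanes, current, pos = [], "", len(bits)
    for i, size in enumerate(sizes):
        # take the next partition up from the lsb end
        current = bits[pos - size:pos] + current
        pos -= size
        # close the lane at the top partition or at an active boundary
        if i == len(sizes) - 1 or not joined[i]:
            lanes.append(current)
            current = ""
    return lanes


def partitioned_cat(a_bits, a_sizes, b_bits, b_sizes, joined):
    """Per-lane Cat(a, b): b's bits land above a's bits in every lane."""
    a_lanes = split_lanes(a_bits, a_sizes, joined)
    b_lanes = split_lanes(b_bits, b_sizes, joined)
    out = [b + a for a, b in zip(a_lanes, b_lanes)]
    widths = [len(lane) for lane in out]        # lsb-first, like [5, 5, 9]
    return "_".join(reversed(out)), widths      # string is msb-first


a, b = "abcdefghijklmnop", "qrs"
print(partitioned_cat(a, [4, 4, 8], b, [1, 1, 1], [False, False]))
# ('qabcdefgh_rijkl_smnop', [5, 5, 9])
print(partitioned_cat(a, [4, 4, 8], b, [1, 1, 1], [True, False]))
# ('qabcdefgh_rsijklmnop', [10, 9])
print(partitioned_cat(a, [4, 4, 8], b, [1, 1, 1], [True, True]))
# ('qrsabcdefghijklmnop', [19])
```

note that this only works because both operands share the same boundary flags, which is exactly the restriction being proposed.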

uh-uh.  take a fixed number and Cat it to a dynamic-partitioned one.

Cat(x, 0, 0) will result in:

Cat(x[31..24], 0, 0) and
Cat(x[23..16], 0, 0) and
Cat(x[15..8], 0, 0) and
Cat(x[7..0], 0, 0)

when all of the 4-gate partitions on a 32-bit number are open and

Cat(x[31..16], 0, 0) and
Cat(x[15..0], 0, 0)

when only the middle one of the 4-gate partitions on a 32-bit number is open and

... you get the idea.

in other words the output length *varies* dependent on the partition
setting at the time.
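
to put numbers on that, a small plain-python sketch (not nmigen, and not taken from the codebase) that works out the lanes and the total output width of Cat(x, 0, 0) for each partition setting. the function name and the "split" flags are hypothetical: split[i] True means the boundary above 8-bit chunk i is active, so the lanes either side are treated as separate numbers.

```python
def cat_const_lanes(width, lane, split, extra=2):
    """Return [(hi, lo, out_width), ...] lsb-first for a per-lane Cat(x, 0, 0).

    width -- total width of x (32 in the example)
    lane  -- smallest partition size (8 in the example)
    split -- one flag per partition point, lsb-first (assumed encoding)
    extra -- number of constant bits appended to each lane
    """
    lanes, lo = [], 0
    for point in range(width // lane - 1):
        hi = (point + 1) * lane - 1
        if split[point]:
            lanes.append((hi, lo, hi - lo + 1 + extra))
            lo = hi + 1
    # the top lane always ends at the msb of x
    lanes.append((width - 1, lo, width - lo + extra))
    return lanes


for split in ([True, True, True], [False, True, False], [False, False, False]):
    lanes = cat_const_lanes(32, 8, split)
    print(split, lanes, "total bits:", sum(w for _, _, w in lanes))
# total bits: 40, then 36, then 34
```

so the same expression is anywhere from 34 to 40 bits wide on a 32-bit input, depending on a value that is only known at runtime.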

it would be necessary to compute the maximum possible size *in
advance*, allocate the bits required (statically) then start routing
around that in some absolutely awful ways that would generate vaaast
amounts of multiplexer gates.
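
a back-of-envelope sketch of why the routing gets ugly, again in plain python. it assumes (purely for illustration) that the per-lane results are packed densely from the lsb upwards into one statically-allocated output, then counts how many different sources each output bit can be driven from across all partition settings. the packing scheme and all names here are assumptions, not how the real code works.

```python
from itertools import product

WIDTH, LANE, EXTRA = 32, 8, 2
NPOINTS = WIDTH // LANE - 1          # 3 partition points for 4 lanes


def layout(split):
    """Source of every output bit (lsb-first) for one partition setting.

    An int is an input bit index of x, "zero" is one of the appended 0s.
    """
    out = []
    for bit in range(WIDTH):
        out.append(bit)
        if (bit + 1) % LANE == 0:
            point = (bit + 1) // LANE - 1
            # append the constant bits at the top of x or at an active split
            if bit == WIDTH - 1 or split[point]:
                out.extend(["zero"] * EXTRA)
    return out


def mux_inputs_per_output_bit():
    layouts = [layout(s) for s in product([False, True], repeat=NPOINTS)]
    max_width = max(len(l) for l in layouts)     # bits to allocate statically
    sources = [set() for _ in range(max_width)]
    for l in layouts:
        for pos, src in enumerate(l):
            sources[pos].add(src)
    return max_width, [len(s) for s in sources]


max_width, counts = mux_inputs_per_output_bit()
print("static width:", max_width)                # 40
print("sources per output bit:", counts)
```

under this packing, output bit 26 alone can come from x[26], x[24], x[22] or a constant 0, i.e. a 4-input mux for one bit, and this is with only 3 partition points and a 2-bit extension.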

this hadn't occurred to me - at all - that this would happen.

even what you describe above is awful enough, and that's a completely
different case which i hadn't considered.

as if the auto-generated multiplexing wasn't bad enough, because we
have isolated the stages into separate classes that are connected by
registers, it becomes difficult to pass that multiplexing information
from one stage to another (without some really awful complex code or a
massive rewrite).

at the end of which, my instincts are telling me that even with that,
Cat() is going to increase the amount of gates so much that we are
wasting our time even considering trying to support Cat(), and we
might as well just do separate SIMD FP ALUs.

that is not to say that Dynamic partitioning is *itself* a
complete waste of time: it would work perfectly well for INT
dynamic-SIMD... because that doesn't involve Cat() anywhere.

it's the partial-selection of the mantissa and exponent, followed by
"bit-expansion" (lengthening the mantissa by 3-4 bits and lengthening
the exponent likewise), where those are *dynamically-variable-length*,
that is going to get us into a world of pain.

the alternative: hard-code Cat() through some rather awful "if
partition_bits == 0b111111 elif partition_bits == 0b000010000 elif
elif elif" chain.  given the extent to which Cat() is used
(fpcommon.py, roundz.py, normalise.py) i really, really don't think
that's a good idea.

simplest option: do SIMD FP ALUs.  2xFP32, 4xFP16.

l.