[libre-riscv-dev] [Bug 132] SIMD-like nmigen signal for partitioning

bugzilla-daemon at libre-riscv.org
Wed Aug 14 12:23:00 BST 2019


--- Comment #5 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #3)
> (In reply to Jacob Lifshay from comment #2)
> > See also:
> > https://salsa.debian.org/Kazan-team/simple-barrel-processor/blob/master/src/
> > multiply.py
> yehyeh, that's the one.  if we can create operators >=, add, sub, etc.
> then there's no need to explicitly confuse the entire design of a pipeline
> stage.
> what will be needed in the __init__ constructors is, instead of just
> "self.a = Signal(width)" it will be:
> self.a = PartitionedSignal(self.ctx, pspec, width)
> the reason is: the context contains (will contain) not just the SIMD
> partitioning information, it will contain the "cancellation mask"
> which will be used to cancel *subsets* of the SIMD operation rather
> than all of it.
> the only thing is: multiply is *not* going to be easy, because it's
> multi-stage.  so i think there, we would have to do something quite
> complex inside the multiply, like:
> class PartitionedSignal:
>    def __mul__(self, value):
>        if self.ctx.operator == FIRST_STAGE_MULTIPLY:
>            # do wallace multiply step 1
>        elif self.ctx.operator == SECOND_STAGE_MULTIPLY:
>            # do wallace multiply step 2
> or something like that.  needs thought.

for the simple ops, I used a fully general partitioning: see PartitionPoints's

multiply is complex enough that it only supports partitioning on byte
boundaries with the partitions being naturally-aligned power-of-2-sized

Note that I also wrote a partitioned adder. All partitioned bitwise ops
(and/or/xor/not, but not shift/rotate/anything that communicates between bits)
are identical to the non-partitioned ops.

The partitioned adder should be fairly easy to extend into an adder/subtracter.
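
As a rough illustration of what a partitioned adder/subtracter computes, here is
a behavioural model on plain Python integers. It is not the nmigen code from
multiply.py; the function name and arguments are hypothetical. The idea is only
that carries (and the +1 of a two's-complement subtract) stay inside each
partition:

```python
def partitioned_addsub(a, b, width, partition_points, subtract=False):
    """Behavioural model only (hypothetical helper, not the real adder).

    partition_points is a collection of bit indices where the carry chain
    is broken; an empty collection gives one ordinary width-bit operation.
    """
    result = 0
    start = 0
    for end in sorted(partition_points) + [width]:
        mask = (1 << (end - start)) - 1
        a_lane = (a >> start) & mask
        b_lane = (b >> start) & mask
        if subtract:
            # two's complement within the lane: a - b == a + ~b + 1
            b_lane = ~b_lane & mask
        carry_in = 1 if subtract else 0
        # masking drops the carry out, so it never reaches the next lane
        lane = (a_lane + b_lane + carry_in) & mask
        result |= lane << start
        start = end
    return result


# one 16-bit add: the carry crosses bit 8
print(hex(partitioned_addsub(0x00FF, 0x0001, 16, [])))
# two 8-bit adds: the carry is cut at bit 8
print(hex(partitioned_addsub(0x00FF, 0x0001, 16, [8])))
# two 8-bit subtracts: the borrow is cut at bit 8
print(hex(partitioned_addsub(0x0100, 0x0001, 16, [8], subtract=True)))
```

The bitwise ops need no such treatment because no bit depends on its
neighbours, which is why they are the same with or without partitioning.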

I really think we should have a separate control pipeline that handles all the
tracking of which instruction is in which partition; it just tells the data
pipeline where the data is partitioned and that's it. Otherwise, we will end up
having to add more complexity to the already very complex multiplier, which
would change it from barely understandable to incomprehensible for anyone who
didn't actually write it. The multiplier is designed so that it acts like a
simple pipeline where every input comes out exactly N stages later, where N is
configured by the instantiating code.
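
To make the "every input comes out exactly N stages later" behaviour concrete,
here is a small software model of a fixed-latency pipeline. It is only a sketch
in plain Python, not nmigen; the class name and the clock() method are
hypothetical, and the real multiplier does its work spread across the stages
rather than at the output as this model does:

```python
from collections import deque


class FixedLatencyPipeline:
    """Model of a pipeline whose result appears exactly n_stages clocks
    after the operands went in (hypothetical class, for illustration)."""

    def __init__(self, n_stages, fn):
        assert n_stages >= 1
        self.fn = fn
        # one slot per stage; None marks an empty (bubble) slot
        self.stages = deque([None] * n_stages, maxlen=n_stages)

    def clock(self, operands):
        """Advance one clock: take new operands, return the oldest result."""
        oldest = self.stages[-1]
        # the deque is full, so appendleft drops the oldest entry on the right
        self.stages.appendleft(operands)
        return None if oldest is None else self.fn(*oldest)


pipe = FixedLatencyPipeline(3, lambda a, b: a * b)
outputs = [pipe.clock((i, i + 1)) for i in range(5)]
print(outputs)
```

With a fixed latency like this, a control pipeline of the same depth can carry
the partition and instruction-tracking information alongside, without the data
path needing to know about it.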

To be able to handle mul-add, the multiplier will need to expose the
_intermediate_output value, since that's twice as wide as the inputs and
contains the full multiply result.
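
A quick arithmetic sketch of why mul-add needs the double-width value, again in
plain Python with hypothetical function names (this is not how
_intermediate_output is computed in the multiplier, just the widths involved):

```python
def mul_add_full(a, b, c, width):
    """a*b + c using the full 2*width-bit product."""
    mask2 = (1 << (2 * width)) - 1
    intermediate = (a * b) & mask2  # twice as wide as the inputs
    return (intermediate + c) & mask2


def mul_add_truncated(a, b, c, width):
    """a*b + c if only the low width bits of the product were available."""
    mask = (1 << width) - 1
    mask2 = (1 << (2 * width)) - 1
    return (((a * b) & mask) + c) & mask2


# 8-bit inputs: 0xFF * 0xFF = 0xFE01, which does not fit in 8 bits
print(hex(mul_add_full(0xFF, 0xFF, 0x0100, 8)))
print(hex(mul_add_truncated(0xFF, 0xFF, 0x0100, 8)))
```

The two results differ because the truncated version has already thrown away
the high half of the product before the add.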
