[libre-riscv-dev] [Bug 60] N-stage 64-bit multiplier pipeline needed (signed/unsigned)

Fri May 17 09:22:19 BST 2019

http://bugs.libre-riscv.org/show_bug.cgi?id=60

--- Comment #12 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #7)
> I implemented the multiplier, it passes all tests.

 fantastic.

> I don't have any tests for having pipeline registers in the multiplier yet.

 ok, so this is for multi-stage multiply?

> I also didn't yet add the adder input for FMA, that should be relatively
> trivial.

 yes.

> I also need to add additional documentation.
> 
> The multiplier supports every combination of aligned 8/16/32/64 that fits in
> 64-bits. For each part, it supports all 4 RISC-V multiply ops: mul, mulh,
> mulhsu, and mulhu.

 excellent.  it likely qualifies as "completed milestone".  we can move
 more extensive unit tests to a separate milestone.

> Review and comments welcome.

 the formatting is a little... weird.

 it's generally good practice to use vertical alignment to make it easier
 for the reader to spot errors and differences vertically:

        lanes = [SIMDMulLane(True,
                             True,
                             8,
                             True),
                 SIMDMulLane(False,
                             False,
                             8,
                             True),
                 SIMDMulLane(True,
                             True,
                             16,
                             False),
                 SIMDMulLane(True,
                             False,
                             32,
                             True)]

 is almost impossible to see what the differences are between the 4 cases.

        lanes = [SIMDMulLane(True, True, 8, True),
                 SIMDMulLane(False, False, 8, True),
                 SIMDMulLane(True, True, 16, False),
                 SIMDMulLane(True, False, 32, True)]

 is trivial to see what the differences are

        lanes = [SIMDMulLane(True,  True,  8,  True),
                 SIMDMulLane(False, False, 8,  True),
                 SIMDMulLane(True,  True,  16, False),
                 SIMDMulLane(True,  False, 32, True)]

 is extremely clear what the differences between the four cases are.

----

            def async_process() -> AsyncProcessGenerator:
                for a_signed in False, True:

 is so heavily indented (due to it being a nested function) that again,
 readability is compromised.

                yield from self.subtest_mul8_16_32_64([SIMDMulLane(False,
                                                                   False,
                                                                   32,
                                                                   True),

 moving it to a global function would reduce the indentation by a whopping
 3 levels, allowing the code size to be hugely reduced in the process

----

 all of these:

        self._intermediate_output = Signal(128)
        self._part_8 = [Signal(name=f"_part_8_{i}") for i in range(8)]
        self._part_16 = [Signal(name=f"_part_16_{i}") for i in range(4)]
        self._part_32 = [Signal(name=f"_part_32_{i}") for i in range(2)]
        self._part_64 = Signal()
        self._output_64 = Signal(64)
        self._output_32 = Signal(64)
        self._output_16 = Signal(64)
        self._output_8 = Signal(64)
        self._a_signed = [Signal(name=f"_a_signed_{i}") for i in range(8)]
        self._b_signed = [Signal(name=f"_b_signed_{i}") for i in range(8)]
        self._not_a_term_8 = Signal(128)
        self._neg_lsb_a_term_8 = Signal(128)
        self._not_b_term_8 = Signal(128)
        self._neg_lsb_b_term_8 = Signal(128)
        self._not_a_term_16 = Signal(128)
        self._neg_lsb_a_term_16 = Signal(128)
        self._not_b_term_16 = Signal(128)
        self._neg_lsb_b_term_16 = Signal(128)
        self._not_a_term_32 = Signal(128)
        self._neg_lsb_a_term_32 = Signal(128)
        self._not_b_term_32 = Signal(128)
        self._neg_lsb_b_term_32 = Signal(128)
        self._not_a_term_64 = Signal(128)
        self._neg_lsb_a_term_64 = Signal(128)
        self._not_b_term_64 = Signal(128)
        self._neg_lsb_b_term_64 = Signal(128)

 can be moved *into* the elaborate function, the self. removed *and* the
 underscore.

 they're entirely local to the function named elaborate: therefore there
 is absolutely no need to expose them (at all) to the public API.

 that they're in the __init__ means that it is almost impossible to
 work out what the inputs and outputs of the function are.

 unless there is an intention to derive from Mul8_16_32_64 at some point
 in the future?

 commenting and grouping the inputs and outputs is pretty essential.

        # are these inputs, are they outputs? or intermediaries?
        # it's impossible to tell
        self.part_pts = PartitionPoints()
        for i in range(8, 64, 8):
            self.part_pts[i] = Signal(name=f"part_pts_{i}")
        self.part_op = [Signal(2, name=f"part_op_{i}") for i in range(8)]

        # input operand
        self.a = Signal(64)
        self.b = Signal(64)

        # output operand
        self.output = Signal(64)

----

 add_term is small enough (in this case) to keep as a locally-scoped function.
 the use of typing however is still interfering with readability, by forcing
 the unnecessary single-line-per-function-argument that most python developers
 are used to.

-----

    def __init__(self, partition_points: PartitionPointsIn = {}):
        super().__init__()

 no!  it is *absolutely essential* never to declare a dictionary as an
 argument to a function!

 this is the pattern for creating a *singleton*!

 any modification of that argument, partition_points, will result in *all*
 uses of that class seeing the *modified* partition_points!

 this is down to dictionaries being modifiable.  if it was a tuple () it
 would be ok, because tuples are not modifiable after creation.

 replace with this:

    def __init__(self, partition_points: PartitionPointsIn = None):
        super().__init__()
        if partition_points is None:
            partition_points = {}

----

        for not_a_term, \
            neg_lsb_a_term, \
            not_b_term, \
            neg_lsb_b_term, \
            parts in [
                (self._not_a_term_8, 
                 self._neg_lsb_a_term_8,
                 self._not_b_term_8,
                 self._neg_lsb_b_term_8,
                 self._part_8),
                (self._not_a_term_16,
                 self._neg_lsb_a_term_16,
                 self._not_b_term_16,
                 self._neg_lsb_b_term_16,
                 self._part_16),
                (self._not_a_term_32,
                 self._neg_lsb_a_term_32,
                 self._not_b_term_32,
                 self._neg_lsb_b_term_32,
                 self._part_32),
                (self._not_a_term_64,
                 self._neg_lsb_a_term_64,
                 self._not_b_term_64,
                 self._neg_lsb_b_term_64,
                 [self._part_64]),
                ]:

 yuck :)

 and unfortunately, the names of those variables are too long (even when
 self._ is removed) to include on one line.

 reducing them to "notaterm_8"  or removing the word "term" - not_a_8,
 neg_lsb_a_32, etc. would allow that to happen.

 this would improve readability greatly and reduce code length (again,
 increasing readability)

---- (continuing from above)

 the fact that they are "terms" may be commented above the block
 (which has a comment missing explaining what the for-loop is for)

        # something about this for-loop, which is impossible to
        # determine from the contents of the loop itself as it's
        # too code-dense, which deliberately includes the word "term"
        # to make it clear that the 4 things are "terms"

        for not_a, neg_lsb_a, not_b, neg_lsb_b, parts in [
           (not_a_8, neg_lsb_a_8, not_b_8, neg_lsb_b_8, part_8),
           ....
           ....

----

 generally, there are comments missing which explain each code block
 and allow the reader to understand what is going on.  in particular,
 any code-heavily-dense block it is essential to have an introductory
 explanation as to what that block is for.

----

        groups = self.full_adder_groups()
        if len(groups) == 0:
            if len(self.inputs) == 0:
                m.d.comb += self.output.eq(0)
            elif len(self.inputs) == 1:
                m.d.comb += self.output.eq(self._resized_inputs[0])
            else:
                assert len(self.inputs) == 2
                adder = PartitionedAdder(len(self.output),
                                         self._reg_partition_points)
                m.submodules.final_adder = adder
                m.d.comb += adder.a.eq(self._resized_inputs[0])
                m.d.comb += adder.b.eq(self._resized_inputs[1])
                m.d.comb += self.output.eq(adder.output)
            return m

 a comment is needed to divide this block from the code above it.
 also, why does the module return early if len(groups) == 0?

-- 
You are receiving this mail because:
You are on the CC list for the bug.