[libre-riscv-dev] [Bug 60] N-stage 64-bit multiplier pipeline needed (signed/unsigned)
    bugzilla-daemon at libre-riscv.org 
    bugzilla-daemon at libre-riscv.org
       
    Fri May 17 09:22:19 BST 2019
    
    
  
http://bugs.libre-riscv.org/show_bug.cgi?id=60
--- Comment #12 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #7)
> I implemented the multiplier, it passes all tests.
 fantastic.
> I don't have any tests for having pipeline registers in the multiplier yet.
 ok, so this is for multi-stage multiply?
> I also didn't yet add the adder input for FMA, that should be relatively
> trivial.
 yes.
> I also need to add additional documentation.
> 
> The multiplier supports every combination of aligned 8/16/32/64 that fits in
> 64-bits. For each part, it supports all 4 RISC-V multiply ops: mul, mulh,
> mulhsu, and mulhu.
 excellent.  it likely qualifies as "completed milestone".  we can move
 more extensive unit tests to a separate milestone.
> Review and comments welcome.
 the formatting is a little... weird.
 it's generally good practice to use vertical alignment to make it easier
 for the reader to spot errors and differences vertically:
        lanes = [SIMDMulLane(True,
                             True,
                             8,
                             True),
                 SIMDMulLane(False,
                             False,
                             8,
                             True),
                 SIMDMulLane(True,
                             True,
                             16,
                             False),
                 SIMDMulLane(True,
                             False,
                             32,
                             True)]
 is almost impossible to see what the differences are between the 4 cases.
        lanes = [SIMDMulLane(True, True, 8, True),
                 SIMDMulLane(False, False, 8, True),
                 SIMDMulLane(True, True, 16, False),
                 SIMDMulLane(True, False, 32, True)]
 is trivial to see what the differences are
        lanes = [SIMDMulLane(True,  True,  8,  True),
                 SIMDMulLane(False, False, 8,  True),
                 SIMDMulLane(True,  True,  16, False),
                 SIMDMulLane(True,  False, 32, True)]
 is extremely clear what the differences between the four cases are.
----
            def async_process() -> AsyncProcessGenerator:
                for a_signed in False, True:
 is so heavily indented (due to it being a nested function) that again,
 readability is compromised.
                yield from self.subtest_mul8_16_32_64([SIMDMulLane(False,
                                                                   False,
                                                                   32,
                                                                   True),
 moving it to a global function would reduce the indentation by a whopping
 3 levels, allowing the code size to be hugely reduced in the process
----
 all of these:
        self._intermediate_output = Signal(128)
        self._part_8 = [Signal(name=f"_part_8_{i}") for i in range(8)]
        self._part_16 = [Signal(name=f"_part_16_{i}") for i in range(4)]
        self._part_32 = [Signal(name=f"_part_32_{i}") for i in range(2)]
        self._part_64 = Signal()
        self._output_64 = Signal(64)
        self._output_32 = Signal(64)
        self._output_16 = Signal(64)
        self._output_8 = Signal(64)
        self._a_signed = [Signal(name=f"_a_signed_{i}") for i in range(8)]
        self._b_signed = [Signal(name=f"_b_signed_{i}") for i in range(8)]
        self._not_a_term_8 = Signal(128)
        self._neg_lsb_a_term_8 = Signal(128)
        self._not_b_term_8 = Signal(128)
        self._neg_lsb_b_term_8 = Signal(128)
        self._not_a_term_16 = Signal(128)
        self._neg_lsb_a_term_16 = Signal(128)
        self._not_b_term_16 = Signal(128)
        self._neg_lsb_b_term_16 = Signal(128)
        self._not_a_term_32 = Signal(128)
        self._neg_lsb_a_term_32 = Signal(128)
        self._not_b_term_32 = Signal(128)
        self._neg_lsb_b_term_32 = Signal(128)
        self._not_a_term_64 = Signal(128)
        self._neg_lsb_a_term_64 = Signal(128)
        self._not_b_term_64 = Signal(128)
        self._neg_lsb_b_term_64 = Signal(128)
 can be moved *into* the elaborate function, the self. removed *and* the
 underscore.
 they're entirely local to the function named elaborate: therefore there
 is absolutely no need to expose them (at all) to the public API.
 that they're in the __init__ means that it is almost impossible to
 work out what the inputs and outputs of the function are.
 unless there is an intention to derive from Mul8_16_32_64 at some point
 in the future?
 commenting and grouping the inputs and outputs is pretty essential.
        # are these inputs, are they outputs? or intermediaries?
        # it's impossible to tell
        self.part_pts = PartitionPoints()
        for i in range(8, 64, 8):
            self.part_pts[i] = Signal(name=f"part_pts_{i}")
        self.part_op = [Signal(2, name=f"part_op_{i}") for i in range(8)]
        # input operand
        self.a = Signal(64)
        self.b = Signal(64)
        # output operand
        self.output = Signal(64)
----
 add_term is small enough (in this case) to keep as a locally-scoped function.
 the use of typing however is still interfering with readability, by forcing
 the unnecessary single-line-per-function-argument that most python developers
 are used to.
-----
    def __init__(self, partition_points: PartitionPointsIn = {}):
        super().__init__()
 no!  it is *absolutely essential* never to declare a dictionary as an
 argument to a function!
 this is the pattern for creating a *singleton*!
 any modification of that argument, partition_points, will result in *all*
 uses of that class seeing the *modified* partition_points!
 this is down to dictionaries being modifiable.  if it was a tuple () it
 would be ok, because tuples are not modifiable after creation.
 replace with this:
    def __init__(self, partition_points: PartitionPointsIn = None):
        super().__init__()
        if partition_points is None:
            partition_points = {}
----
        for not_a_term, \
            neg_lsb_a_term, \
            not_b_term, \
            neg_lsb_b_term, \
            parts in [
                (self._not_a_term_8, 
                 self._neg_lsb_a_term_8,
                 self._not_b_term_8,
                 self._neg_lsb_b_term_8,
                 self._part_8),
                (self._not_a_term_16,
                 self._neg_lsb_a_term_16,
                 self._not_b_term_16,
                 self._neg_lsb_b_term_16,
                 self._part_16),
                (self._not_a_term_32,
                 self._neg_lsb_a_term_32,
                 self._not_b_term_32,
                 self._neg_lsb_b_term_32,
                 self._part_32),
                (self._not_a_term_64,
                 self._neg_lsb_a_term_64,
                 self._not_b_term_64,
                 self._neg_lsb_b_term_64,
                 [self._part_64]),
                ]:
 yuck :)
 and unfortunately, the names of those variables are too long (even when
 self._ is removed) to include on one line.
 reducing them to "notaterm_8"  or removing the word "term" - not_a_8,
 neg_lsb_a_32, etc. would allow that to happen.
 this would improve readability greatly and reduce code length (again,
 increasing readability)
---- (continuing from above)
 the fact that they are "terms" may be commented above the block
 (which has a comment missing explaining what the for-loop is for)
        # something about this for-loop, which is impossible to
        # determine from the contents of the loop itself as it's
        # too code-dense, which deliberately includes the word "term"
        # to make it clear that the 4 things are "terms"
        for not_a, neg_lsb_a, not_b, neg_lsb_b, parts in [
           (not_a_8, neg_lsb_a_8, not_b_8, neg_lsb_b_8, part_8),
           ....
           ....
----
 generally, there are comments missing which explain each code block
 and allow the reader to understand what is going on.  in particular,
 any code-heavily-dense block it is essential to have an introductory
 explanation as to what that block is for.
----
        groups = self.full_adder_groups()
        if len(groups) == 0:
            if len(self.inputs) == 0:
                m.d.comb += self.output.eq(0)
            elif len(self.inputs) == 1:
                m.d.comb += self.output.eq(self._resized_inputs[0])
            else:
                assert len(self.inputs) == 2
                adder = PartitionedAdder(len(self.output),
                                         self._reg_partition_points)
                m.submodules.final_adder = adder
                m.d.comb += adder.a.eq(self._resized_inputs[0])
                m.d.comb += adder.b.eq(self._resized_inputs[1])
                m.d.comb += self.output.eq(adder.output)
            return m
 a comment is needed to divide this block from the code above it.
 also, why does the module return early if len(groups) == 0?
-- 
You are receiving this mail because:
You are on the CC list for the bug.
    
    
More information about the libre-riscv-dev
mailing list