[libre-riscv-dev] [Bug 60] N-stage 64-bit multiplier pipeline needed (signed/unsigned)
bugzilla-daemon at libre-riscv.org
bugzilla-daemon at libre-riscv.org
Sat May 18 03:35:50 BST 2019
http://bugs.libre-riscv.org/show_bug.cgi?id=60
--- Comment #14 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #12)
> (In reply to Jacob Lifshay from comment #7)
> > I don't have any tests for having pipeline registers in the multiplier yet.
>
> ok, so this is for multi-stage multiply?
it has an input that allows you to specify which levels of the carry-save adder
tree have registers and which levels are wired straight through.
>
> > I also didn't yet add the adder input for FMA, that should be relatively
> > trivial.
>
> yes.
>
> > I also need to add additional documentation.
> >
> > The multiplier supports every combination of aligned 8/16/32/64 that fits in
> > 64-bits. For each part, it supports all 4 RISC-V multiply ops: mul, mulh,
> > mulhsu, and mulhu.
>
> excellent. it likely qualifies as "completed milestone". we can move
> more extensive unit tests to a separate milestone.
>
>
> > Review and comments welcome.
>
> the formatting is a little... weird.
>
> it's generally good practice to use vertical alignment to make it easier
> for the reader to spot errors and differences vertically:
>
> lanes = [SIMDMulLane(True,
> True,
> 8,
> True),
> SIMDMulLane(False,
> False,
> 8,
> True),
> SIMDMulLane(True,
> True,
> 16,
> False),
> SIMDMulLane(True,
> False,
> 32,
> True)]
>
> is almost impossible to see what the differences are between the 4 cases.
>
> lanes = [SIMDMulLane(True, True, 8, True),
> SIMDMulLane(False, False, 8, True),
> SIMDMulLane(True, True, 16, False),
> SIMDMulLane(True, False, 32, True)]
>
> is trivial to see what the differences are
>
> lanes = [SIMDMulLane(True, True, 8, True),
> SIMDMulLane(False, False, 8, True),
> SIMDMulLane(True, True, 16, False),
> SIMDMulLane(True, False, 32, True)]
>
> is extremely clear what the differences between the four cases are.
yeah, I'll fix that.
>
>
> ----
>
> def async_process() -> AsyncProcessGenerator:
> for a_signed in False, True:
>
> is so heavily indented (due to it being a nested function) that again,
> readability is compromised.
>
> yield from self.subtest_mul8_16_32_64([SIMDMulLane(False,
> False,
> 32,
> True),
>
> moving it to a global function would reduce the indentation by a whopping
> 3 levels, allowing the code size to be hugely reduced in the process
unfortunately it can't be a global function since it uses self from the
surrounding function but it can't take any arguments
>
>
> ----
>
> all of these:
>
> self._intermediate_output = Signal(128)
> self._part_8 = [Signal(name=f"_part_8_{i}") for i in range(8)]
...
> self._neg_lsb_b_term_64 = Signal(128)
>
> can be moved *into* the elaborate function, the self. removed *and* the
> underscore.
>
> they're entirely local to the function named elaborate: therefore there
> is absolutely no need to expose them (at all) to the public API.
They're exposed so the test framework can access them, otherwise they are
private.
>
> that they're in the __init__ means that it is almost impossible to
> work out what the inputs and outputs of the function are.
>
> unless there is an intention to derive from Mul8_16_32_64 at some point
> in the future?
>
> commenting and grouping the inputs and outputs is pretty essential.
>
> # are these inputs, are they outputs? or intermediaries?
> # it's impossible to tell
> self.part_pts = PartitionPoints()
> for i in range(8, 64, 8):
> self.part_pts[i] = Signal(name=f"part_pts_{i}")
> self.part_op = [Signal(2, name=f"part_op_{i}") for i in range(8)]
>
> # input operand
> self.a = Signal(64)
> self.b = Signal(64)
>
> # output operand
> self.output = Signal(64)
all of part_pts, part_op, a, and b are inputs, output is an output.
>
> ----
>
>
> add_term is small enough (in this case) to keep as a locally-scoped
> function.
> the use of typing however is still interfering with readability, by forcing
> the unnecessary single-line-per-function-argument that most python
> developers
> are used to.
>
>
> -----
>
> def __init__(self, partition_points: PartitionPointsIn = {}):
> super().__init__()
>
> no! it is *absolutely essential* never to declare a dictionary as an
> argument to a function!
>
> this is the pattern for creating a *singleton*!
>
I get that, however, partition_points is not modified or stored in an
attribute, so having a singleton default argument doesn't hurt and actually
might save memory, since an extra dict doesn't need to be allocated each time.
> ----
>
> for not_a_term, \
> neg_lsb_a_term, \
> not_b_term, \
> neg_lsb_b_term, \
> parts in [
> (self._not_a_term_8,
> self._neg_lsb_a_term_8,
> self._not_b_term_8,
> self._neg_lsb_b_term_8,
> self._part_8),
> (self._not_a_term_16,
> self._neg_lsb_a_term_16,
> self._not_b_term_16,
> self._neg_lsb_b_term_16,
> self._part_16),
> (self._not_a_term_32,
> self._neg_lsb_a_term_32,
> self._not_b_term_32,
> self._neg_lsb_b_term_32,
> self._part_32),
> (self._not_a_term_64,
> self._neg_lsb_a_term_64,
> self._not_b_term_64,
> self._neg_lsb_b_term_64,
> [self._part_64]),
> ]:
>
>
> yuck :)
I should probably change them to arrays or dicts, so there would be a single
self._not_a_term initialized to {8: Signal(...), 16: Signal(...), ...}.
similarly for _neg_lsb_a_term, _not_b_term, _neg_lsb_b_term, and _part (which I
might name parts)
>
> and unfortunately, the names of those variables are too long (even when
> self._ is removed) to include on one line.
>
> reducing them to "notaterm_8" or removing the word "term" - not_a_8,
> neg_lsb_a_32, etc. would allow that to happen.
>
> this would improve readability greatly and reduce code length (again,
> increasing readability)
>
>
> ---- (continuing from above)
>
> the fact that they are "terms" may be commented above the block
> (which has a comment missing explaining what the for-loop is for)
>
> # something about this for-loop, which is impossible to
> # determine from the contents of the loop itself as it's
> # too code-dense, which deliberately includes the word "term"
> # to make it clear that the 4 things are "terms"
>
> for not_a, neg_lsb_a, not_b, neg_lsb_b, parts in [
> (not_a_8, neg_lsb_a_8, not_b_8, neg_lsb_b_8, part_8),
> ....
> ....
>
> ----
>
> generally, there are comments missing which explain each code block
> and allow the reader to understand what is going on. in particular,
> any code-heavily-dense block it is essential to have an introductory
> explanation as to what that block is for.
>
yeah, I was just writing code to get it to pass the tests, rather than writing
lots of comments. I'll be fixing that.
>
> ----
>
> groups = self.full_adder_groups()
> if len(groups) == 0:
> if len(self.inputs) == 0:
> m.d.comb += self.output.eq(0)
> elif len(self.inputs) == 1:
> m.d.comb += self.output.eq(self._resized_inputs[0])
> else:
> assert len(self.inputs) == 2
> adder = PartitionedAdder(len(self.output),
> self._reg_partition_points)
> m.submodules.final_adder = adder
> m.d.comb += adder.a.eq(self._resized_inputs[0])
> m.d.comb += adder.b.eq(self._resized_inputs[1])
> m.d.comb += self.output.eq(adder.output)
> return m
>
> a comment is needed to divide this block from the code above it.
> also, why does the module return early if len(groups) == 0?
it returns early because this is the base case in a recursive module
instantiation. if it didn't, there would be a stack overflow at elaborate-time.
basically, each invocation converts the input terms to 2 * (len(inputs) // 3) +
len(inputs) % 3 terms for the next recursive call using len(inputs) // 3
carry-save adders, where len(groups) is len(inputs) // 3.
Once len(inputs) < 3, then it executes the base case, which adds the last 2
inputs together using a traditional adder.
I'll be adding that or similar to a comment.
--
You are receiving this mail because:
You are on the CC list for the bug.
More information about the libre-riscv-dev
mailing list