[libre-riscv-dev] [Bug 60] N-stage 64-bit multiplier pipeline needed (signed/unsigned)
bugzilla-daemon at libre-riscv.org
bugzilla-daemon at libre-riscv.org
Fri May 17 09:22:19 BST 2019
http://bugs.libre-riscv.org/show_bug.cgi?id=60
--- Comment #12 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #7)
> I implemented the multiplier, it passes all tests.
fantastic.
> I don't have any tests for having pipeline registers in the multiplier yet.
ok, so this is for multi-stage multiply?
> I also didn't yet add the adder input for FMA, that should be relatively
> trivial.
yes.
> I also need to add additional documentation.
>
> The multiplier supports every combination of aligned 8/16/32/64 that fits in
> 64-bits. For each part, it supports all 4 RISC-V multiply ops: mul, mulh,
> mulhsu, and mulhu.
excellent. it likely qualifies as "completed milestone". we can move
more extensive unit tests to a separate milestone.
> Review and comments welcome.
the formatting is a little... weird.
it's generally good practice to use vertical alignment to make it easier
for the reader to spot errors and differences vertically:
lanes = [SIMDMulLane(True,
True,
8,
True),
SIMDMulLane(False,
False,
8,
True),
SIMDMulLane(True,
True,
16,
False),
SIMDMulLane(True,
False,
32,
True)]
is almost impossible to see what the differences are between the 4 cases.
lanes = [SIMDMulLane(True, True, 8, True),
SIMDMulLane(False, False, 8, True),
SIMDMulLane(True, True, 16, False),
SIMDMulLane(True, False, 32, True)]
is trivial to see what the differences are
lanes = [SIMDMulLane(True, True, 8, True),
SIMDMulLane(False, False, 8, True),
SIMDMulLane(True, True, 16, False),
SIMDMulLane(True, False, 32, True)]
is extremely clear what the differences between the four cases are.
----
def async_process() -> AsyncProcessGenerator:
for a_signed in False, True:
is so heavily indented (due to it being a nested function) that again,
readability is compromised.
yield from self.subtest_mul8_16_32_64([SIMDMulLane(False,
False,
32,
True),
moving it to a global function would reduce the indentation by a whopping
3 levels, allowing the code size to be hugely reduced in the process
----
all of these:
self._intermediate_output = Signal(128)
self._part_8 = [Signal(name=f"_part_8_{i}") for i in range(8)]
self._part_16 = [Signal(name=f"_part_16_{i}") for i in range(4)]
self._part_32 = [Signal(name=f"_part_32_{i}") for i in range(2)]
self._part_64 = Signal()
self._output_64 = Signal(64)
self._output_32 = Signal(64)
self._output_16 = Signal(64)
self._output_8 = Signal(64)
self._a_signed = [Signal(name=f"_a_signed_{i}") for i in range(8)]
self._b_signed = [Signal(name=f"_b_signed_{i}") for i in range(8)]
self._not_a_term_8 = Signal(128)
self._neg_lsb_a_term_8 = Signal(128)
self._not_b_term_8 = Signal(128)
self._neg_lsb_b_term_8 = Signal(128)
self._not_a_term_16 = Signal(128)
self._neg_lsb_a_term_16 = Signal(128)
self._not_b_term_16 = Signal(128)
self._neg_lsb_b_term_16 = Signal(128)
self._not_a_term_32 = Signal(128)
self._neg_lsb_a_term_32 = Signal(128)
self._not_b_term_32 = Signal(128)
self._neg_lsb_b_term_32 = Signal(128)
self._not_a_term_64 = Signal(128)
self._neg_lsb_a_term_64 = Signal(128)
self._not_b_term_64 = Signal(128)
self._neg_lsb_b_term_64 = Signal(128)
can be moved *into* the elaborate function, the self. removed *and* the
underscore.
they're entirely local to the function named elaborate: therefore there
is absolutely no need to expose them (at all) to the public API.
that they're in the __init__ means that it is almost impossible to
work out what the inputs and outputs of the function are.
unless there is an intention to derive from Mul8_16_32_64 at some point
in the future?
commenting and grouping the inputs and outputs is pretty essential.
# are these inputs, are they outputs? or intermediaries?
# it's impossible to tell
self.part_pts = PartitionPoints()
for i in range(8, 64, 8):
self.part_pts[i] = Signal(name=f"part_pts_{i}")
self.part_op = [Signal(2, name=f"part_op_{i}") for i in range(8)]
# input operand
self.a = Signal(64)
self.b = Signal(64)
# output operand
self.output = Signal(64)
----
add_term is small enough (in this case) to keep as a locally-scoped function.
the use of typing however is still interfering with readability, by forcing
the unnecessary single-line-per-function-argument that most python developers
are used to.
-----
def __init__(self, partition_points: PartitionPointsIn = {}):
super().__init__()
no! it is *absolutely essential* never to declare a dictionary as an
argument to a function!
this is the pattern for creating a *singleton*!
any modification of that argument, partition_points, will result in *all*
uses of that class seeing the *modified* partition_points!
this is down to dictionaries being modifiable. if it was a tuple () it
would be ok, because tuples are not modifiable after creation.
replace with this:
def __init__(self, partition_points: PartitionPointsIn = None):
super().__init__()
if partition_points is None:
partition_points = {}
----
for not_a_term, \
neg_lsb_a_term, \
not_b_term, \
neg_lsb_b_term, \
parts in [
(self._not_a_term_8,
self._neg_lsb_a_term_8,
self._not_b_term_8,
self._neg_lsb_b_term_8,
self._part_8),
(self._not_a_term_16,
self._neg_lsb_a_term_16,
self._not_b_term_16,
self._neg_lsb_b_term_16,
self._part_16),
(self._not_a_term_32,
self._neg_lsb_a_term_32,
self._not_b_term_32,
self._neg_lsb_b_term_32,
self._part_32),
(self._not_a_term_64,
self._neg_lsb_a_term_64,
self._not_b_term_64,
self._neg_lsb_b_term_64,
[self._part_64]),
]:
yuck :)
and unfortunately, the names of those variables are too long (even when
self._ is removed) to include on one line.
reducing them to "notaterm_8" or removing the word "term" - not_a_8,
neg_lsb_a_32, etc. would allow that to happen.
this would improve readability greatly and reduce code length (again,
increasing readability)
---- (continuing from above)
the fact that they are "terms" may be commented above the block
(which has a comment missing explaining what the for-loop is for)
# something about this for-loop, which is impossible to
# determine from the contents of the loop itself as it's
# too code-dense, which deliberately includes the word "term"
# to make it clear that the 4 things are "terms"
for not_a, neg_lsb_a, not_b, neg_lsb_b, parts in [
(not_a_8, neg_lsb_a_8, not_b_8, neg_lsb_b_8, part_8),
....
....
----
generally, there are comments missing which explain each code block
and allow the reader to understand what is going on. in particular,
any code-heavily-dense block it is essential to have an introductory
explanation as to what that block is for.
----
groups = self.full_adder_groups()
if len(groups) == 0:
if len(self.inputs) == 0:
m.d.comb += self.output.eq(0)
elif len(self.inputs) == 1:
m.d.comb += self.output.eq(self._resized_inputs[0])
else:
assert len(self.inputs) == 2
adder = PartitionedAdder(len(self.output),
self._reg_partition_points)
m.submodules.final_adder = adder
m.d.comb += adder.a.eq(self._resized_inputs[0])
m.d.comb += adder.b.eq(self._resized_inputs[1])
m.d.comb += self.output.eq(adder.output)
return m
a comment is needed to divide this block from the code above it.
also, why does the module return early if len(groups) == 0?
--
You are receiving this mail because:
You are on the CC list for the bug.
More information about the libre-riscv-dev
mailing list