[libre-riscv-dev] [Bug 333] investigate why CR pipeline code took 100% CPU and locked up generating ILANG

Wed May 20 19:16:31 BST 2020

https://bugs.libre-soc.org/show_bug.cgi?id=333

Michael Nolan <mtnolan2640 at gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mtnolan2640 at gmail.com

--- Comment #1 from Michael Nolan <mtnolan2640 at gmail.com> ---
It seems we both stumbled on this issue at about the same time and arrived at
similar solutions. I narrowed the regression down to this commit:

commit 7638005f8577ba6545a65a0c6e31c5419f1f684e
Author: Luke Kenneth Casson Leighton <lkcl at lkcl.net>
Date:   Sun May 17 17:15:58 2020 +0100

    use slightly more elegant way to access CR lookup table

diff --git a/src/soc/cr/main_stage.py b/src/soc/cr/main_stage.py
index e29d5ee..23eba32 100644
--- a/src/soc/cr/main_stage.py
+++ b/src/soc/cr/main_stage.py
@@ -67,9 +67,10 @@ class CRMainStage(PipeModBase):
         # the CR to determine what the resulting bit should be.

         # Grab the lookup table for cr_op type instructions
-        lut = Signal(4, reset_less=True)
+        lut = Array([Signal(name=f"lut{i}") for i in range(4)])
         # There's no field, just have to grab it directly from the insn
-        comb += lut.eq(self.i.ctx.op.insn[6:10])
+        for i in range(4):
+            comb += lut[i].eq(self.i.ctx.op.insn[6+i])

         # Generate the mask for mtcrf, mtocrf, and mfocrf
         fxm = Signal(xfx_fields['FXM'][0:-1].shape())
@@ -108,28 +109,15 @@ class CRMainStage(PipeModBase):
                 comb += ba.eq(xl_fields['BA'][0:-1])
                 comb += bb.eq(xl_fields['BB'][0:-1])

-                # Extract the two input bits from the CR
-                bit_a = Signal(reset_less=True)
-                bit_b = Signal(reset_less=True)
-                comb += bit_a.eq(cr_arr[ba])
-                comb += bit_b.eq(cr_arr[bb])
-
-                # Use the two input bits to look up the result in the
-                # lookup table
-                bit_out = Signal(reset_less=True)
-                comb += bit_out.eq(Mux(bit_b,
-                                       Mux(bit_a, lut[3], lut[1]),
-                                       Mux(bit_a, lut[2], lut[0])))
-                # Set the output to the result above
-                comb += cr_out_arr[bt].eq(bit_out)
+                # Use the two input bits to look up the result in the LUT
+                comb += cr_out_arr[bt].eq(lut[Cat(cr_arr[bb], cr_arr[ba])])

             ##### mtcrf #####
             with m.Case(InternalOp.OP_MTCRF):
                 # mtocrf and mtcrf are essentially identical
                 # put input (RA) - mask-selected - into output CR, leave
                 # rest of CR alone.
-                comb += cr_o.eq((self.i.a[0:32] & mask) |
-                                     (self.i.cr & ~mask))
+                comb += cr_o.eq((self.i.a[0:32] & mask) | (self.i.cr & ~mask))

             ##### mfcr #####
             with m.Case(InternalOp.OP_MFCR):


I think what happens is that nmigen generates a very large expression for
each arr[index]. In the new code the result of indexing cr_arr is used to
index lut, and that result is in turn assigned to cr_out_arr[bt], so the
expansions nest and the final expression becomes gigantic. Assigning the
intermediate expressions to signals, as the code did before this commit,
avoids the nesting. This is just a guess.
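
As a rough illustration of that guess, here is a toy cost model in plain
Python. It is not nmigen's actual lowering: the helper lookup_cost, the rule
that an inline index expression multiplies the number of cases, and the sizes
(32 entries for cr_arr and cr_out_arr, which the diff does not show, and 4 for
lut) are all assumptions made for this sketch.

```python
# Toy cost model (an assumption, NOT nmigen's real lowering):
# selecting one of n elements costs one case per element, and if the index
# is itself an inline expression, its cost multiplies into the total.

def lookup_cost(n_elements, index_cost=1):
    """Hypothetical number of cases needed for arr[index]."""
    return n_elements * index_cost

CR_BITS = 32   # assumed length of cr_arr and cr_out_arr
LUT_BITS = 4   # lut has 4 entries in the diff

# Inline form: cr_out_arr[bt].eq(lut[Cat(cr_arr[bb], cr_arr[ba])])
bit_a = lookup_cost(CR_BITS)
bit_b = lookup_cost(CR_BITS)
lut_inline = lookup_cost(LUT_BITS, index_cost=bit_a * bit_b)
inline_total = lookup_cost(CR_BITS, index_cost=lut_inline)

# Staged form: every lookup is first assigned to its own Signal,
# so the costs add up instead of multiplying.
staged_total = (lookup_cost(CR_BITS)      # bit_a
                + lookup_cost(CR_BITS)    # bit_b
                + lookup_cost(LUT_BITS)   # bit_out
                + lookup_cost(CR_BITS))   # cr_out_arr[bt]

print("inline:", inline_total)
print("staged:", staged_total)
```

Under these assumptions the inline form needs 131072 cases and the staged form
100. The real numbers inside nmigen may well differ, but growth of that kind
would be consistent with the lockup seen while generating ILANG.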
