[Libre-soc-bugs] [Bug 713] PartitionedSignal enhancement to add partition-context-aware lengths

Wed Oct 6 12:16:54 BST 2021

https://bugs.libre-soc.org/show_bug.cgi?id=713

--- Comment #12 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #11)
> (In reply to Luke Kenneth Casson Leighton from comment #8)
> > (In reply to Jacob Lifshay from comment #6)
> > > I already wrote code to fully determine the layout needed for partitioned
> > > signals including padding (SimdLayout),
> > 
> > ok good, because this moves out of the area originally for which PS
> > was designed.
> > 
> > > All that is needed is to adapt code so SimdLayout and SimdPartMode work with
> > > PartitionedSignal.
> > 
> > excellent, that eould be great.
> > 
> > i am having a bit of a hard time understanding SimdLayout,
> > etc., can you cut the type information, all "casting", they are
> > extraneous lines, and add some docstrings explaining the
> > purpose of the main functions?
> > 
> > my feeling is, it should be possible to return an object
> > from the standard override of Value.shape() and use that.
> > most cases shape() is used to determine the signedness.
> > if that doesn't pan out it can be adapted.
> 
> I think we should use:
> class PartitionedSignal(...):
>     ...
>     def __init__(self, layout=1, *, ...):
>         self.layout = SimdLayout.cast(layout)
>         ...
>     shape = None
> 
> since shape is limited by nmigen to just Shape(width,signedness) and nothing
> else.
> > 
> > a way to fit with PartitionPoints is needed: an adapter
> > that creates the dict that goes straight into Partition
> > Points yet also can be used to return the list from
> > which the case statements can be constrcted, will find
> > an example
> 
> well, unfortunately, PartitionPoints just can't really express the required
> padding info:
> assuming a layout of:
> SimdLayout({
>     1: unsigned(10),
>     2: unsigned(12),
>     4: unsigned(15),
> })
> which ends up with a 40-bit signal with lanes like so:
> bit #: 30v       20v       10v         0
> [---u10--][---u10--][---u10--][---u10--]
> [ppppppp----u12----][ppppppp----u12----]
> [pppppppppppppppppppppppp------u15-----]
> 
> There are 2 possible approaches for emulating PartitionPoints, neither good:
> 1. if we have the padding as separate partitions, 

yes. this is the simplest option that does not involve throwing away tested
code.  it is an incremental nondisruptive approach.

>then the created circuits
> will be unnecessarily complex due to excessive partitions with partition
> points at bits:
> 10, 12, 15, 20, 30, and 32. This gets significantly worse for 8-part layouts.

not if the combinations (assuming a switch) are limited, and not if an
additional "assignment" mask is added which, *incrementally and one at a time*,
each of the submodules is evaluated and adapted.

"inefficient but functional" is a crucial thing right now because this needs
to be done FAST because it's holding up further development (critical path).

yet at the same time over-engineering is going to bite us.

> 2. if we have padding included in their partitions, which gives partition
> points of 10, 20, and 30, then many of the existing operators will calculate
> the wrong results since they failed to account for padding.

that will only happen if the starting points are not set, as a requirement,
to be on an "aligned" boundary (exactly like byte/word alignment
for software).

i covered this in comment #2

> > but first please can you take out type info and add docstrings
> > so i can see what is going on with SimdLane
> 
> yeah, i'll add docs. the type info *is* part of the docs, just put in a
> standardized spot where tools can read it. 

honestly they're a nuisance.  massive over-engineering, resulting
in so much cruft to "manage the existence of the type" it at least
triples the amount of code, end result *harder* to read, not easier.

the example there *actually* has 7 breakpoints, can use PartitionPoints
to do it, but simply needs an "adapter" function (pair of adapter
functions, total about 60 lines to do it.

the adapter function takes just the 3 options shown as inputs (which
will go into the switch) and outputs *only* 3 possible cases setting
the right combination of 7 partition bits.

the "unused" bits in between although they calculate results those
results are *ignored* or not wired in or out, but even if not
ignored they do no harm.

*LATER* we zero them out, and even later we add individual clock
gating on some of them if they're partly used.  but we can't do clock
gating yet because we don't have a clock gating cell in the cell library.

it's fully compatible with the existing code, nondisruptive, and has the
advantage that same sized same spec signals can be straight-wired, no
messing about with partition-aware muxing just to connect a dann signal.

> after all, someone writing code
> needs to be able to quickly know what types a function expects, not have to
> look through 10(!) different files following a chain of nested function
> calls just to figure out how to call a particular function. That actually
> happened to me several times when I was trying to write the div pipe,
> whereas if they had docs that specified what types they expected, i would
> have saved a bunch of time (10s 
> instead of 10min).

chains like that is pretty standard fare for a big project.

tips:  keep the files open, on-screen,
in multiple terminals, persistently, as a working set.

i use a 6x4 virtual desktop and move to a different view, leaving the
persistent xterms open.  last record was 300 days (power cut),  3,500
vim sessions in 100 xterms on 15 virtual destkops.

-- 
You are receiving this mail because:
You are on the CC list for the bug.