[Libre-soc-bugs] [Bug 713] PartitionedSignal enhancement to add partition-context-aware lengths
bugzilla-daemon at libre-soc.org
bugzilla-daemon at libre-soc.org
Wed Oct 6 12:16:54 BST 2021
https://bugs.libre-soc.org/show_bug.cgi?id=713
--- Comment #12 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jacob Lifshay from comment #11)
> (In reply to Luke Kenneth Casson Leighton from comment #8)
> > (In reply to Jacob Lifshay from comment #6)
> > > I already wrote code to fully determine the layout needed for partitioned
> > > signals including padding (SimdLayout),
> >
> > ok good, because this moves out of the area originally for which PS
> > was designed.
> >
> > > All that is needed is to adapt code so SimdLayout and SimdPartMode work with
> > > PartitionedSignal.
> >
> > excellent, that eould be great.
> >
> > i am having a bit of a hard time understanding SimdLayout,
> > etc., can you cut the type information, all "casting", they are
> > extraneous lines, and add some docstrings explaining the
> > purpose of the main functions?
> >
> > my feeling is, it should be possible to return an object
> > from the standard override of Value.shape() and use that.
> > most cases shape() is used to determine the signedness.
> > if that doesn't pan out it can be adapted.
>
> I think we should use:
> class PartitionedSignal(...):
> ...
> def __init__(self, layout=1, *, ...):
> self.layout = SimdLayout.cast(layout)
> ...
> shape = None
>
> since shape is limited by nmigen to just Shape(width,signedness) and nothing
> else.
> >
> > a way to fit with PartitionPoints is needed: an adapter
> > that creates the dict that goes straight into Partition
> > Points yet also can be used to return the list from
> > which the case statements can be constrcted, will find
> > an example
>
> well, unfortunately, PartitionPoints just can't really express the required
> padding info:
> assuming a layout of:
> SimdLayout({
> 1: unsigned(10),
> 2: unsigned(12),
> 4: unsigned(15),
> })
> which ends up with a 40-bit signal with lanes like so:
> bit #: 30v 20v 10v 0
> [---u10--][---u10--][---u10--][---u10--]
> [ppppppp----u12----][ppppppp----u12----]
> [pppppppppppppppppppppppp------u15-----]
>
> There are 2 possible approaches for emulating PartitionPoints, neither good:
> 1. if we have the padding as separate partitions,
yes. this is the simplest option that does not involve throwing away tested
code. it is an incremental nondisruptive approach.
>then the created circuits
> will be unnecessarily complex due to excessive partitions with partition
> points at bits:
> 10, 12, 15, 20, 30, and 32. This gets significantly worse for 8-part layouts.
not if the combinations (assuming a switch) are limited, and not if an
additional "assignment" mask is added which, *incrementally and one at a time*,
each of the submodules is evaluated and adapted.
"inefficient but functional" is a crucial thing right now because this needs
to be done FAST because it's holding up further development (critical path).
yet at the same time over-engineering is going to bite us.
> 2. if we have padding included in their partitions, which gives partition
> points of 10, 20, and 30, then many of the existing operators will calculate
> the wrong results since they failed to account for padding.
that will only happen if the starting points are not set, as a requirement,
to be on an "aligned" boundary (exactly like byte/word alignment
for software).
i covered this in comment #2
> > but first please can you take out type info and add docstrings
> > so i can see what is going on with SimdLane
>
> yeah, i'll add docs. the type info *is* part of the docs, just put in a
> standardized spot where tools can read it.
honestly they're a nuisance. massive over-engineering, resulting
in so much cruft to "manage the existence of the type" it at least
triples the amount of code, end result *harder* to read, not easier.
the example there *actually* has 7 breakpoints, can use PartitionPoints
to do it, but simply needs an "adapter" function (pair of adapter
functions, total about 60 lines to do it.
the adapter function takes just the 3 options shown as inputs (which
will go into the switch) and outputs *only* 3 possible cases setting
the right combination of 7 partition bits.
the "unused" bits in between although they calculate results those
results are *ignored* or not wired in or out, but even if not
ignored they do no harm.
*LATER* we zero them out, and even later we add individual clock
gating on some of them if they're partly used. but we can't do clock
gating yet because we don't have a clock gating cell in the cell library.
it's fully compatible with the existing code, nondisruptive, and has the
advantage that same sized same spec signals can be straight-wired, no
messing about with partition-aware muxing just to connect a dann signal.
> after all, someone writing code
> needs to be able to quickly know what types a function expects, not have to
> look through 10(!) different files following a chain of nested function
> calls just to figure out how to call a particular function. That actually
> happened to me several times when I was trying to write the div pipe,
> whereas if they had docs that specified what types they expected, i would
> have saved a bunch of time (10s
> instead of 10min).
chains like that is pretty standard fare for a big project.
tips: keep the files open, on-screen,
in multiple terminals, persistently, as a working set.
i use a 6x4 virtual desktop and move to a different view, leaving the
persistent xterms open. last record was 300 days (power cut), 3,500
vim sessions in 100 xterms on 15 virtual destkops.
--
You are receiving this mail because:
You are on the CC list for the bug.
More information about the libre-soc-bugs
mailing list