[Libre-soc-bugs] [Bug 713] PartitionedSignal enhancement to add partition-context-aware lengths
bugzilla-daemon at libre-soc.org
Wed Oct 6 19:32:52 BST 2021
https://bugs.libre-soc.org/show_bug.cgi?id=713
--- Comment #16 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #14)
> (In reply to Jacob Lifshay from comment #10)
> > (In reply to Luke Kenneth Casson Leighton from comment #9)
>
> > > this suggests a dict as the spec. mantissa:
> > >
> > > { 0b00 : (64, 53), # FP64
> > >   0b01 : (32, 23), # FP32x2
> > >   0b10 : (16, 10), # FP16
> > >   0b11 : (16, 5), # BF16
> > > }
> > >
> > > no types, no classes.
> >
> > Oh, what I have as types that can be SimdLayout.cast (like nmigen
> > Shape.cast) are 1 of 3 options:
>
> 3 options when one will do and can be covered by a dict
> is massive complication and overengineering.
Internally, it always uses a dict mapping abstract-lane-sizes (pow2 integers)
to nmigen Shape instances. The other options (accepted only as inputs to
SimdLayout.cast/.__init__) are there because they save a bunch of code on the
caller's side, making it much easier to read, and because accepting a Shape
directly (option 3) instead of always requiring a dict makes it way easier to
use all our existing non-simd code mostly unmodified.
>
> the *actual* needs come from a 2-bit elwidth, where at each
> elwidth i would have said even the power-2 is implicitly
> understood, if it wasn't for the fact that in FP we specify
> BF16 as one of the options.
elwidth ends up filling the SimdPartMode.part_starts bits (like original
PartitionPoints, except indexing abstract parts rather than bits), SimdLayout
uses those Signals to select which possible lanes are enabled, based on the
internal dict mapping abstract-lane-sizes to nmigen Shape instances.
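
A minimal sketch of the "part_starts" idea, for illustration only: one flag
per abstract part, set where a lane begins. The function name
part_starts_for and the list-of-bools encoding are my assumptions, not the
actual SimdPartMode fields (the real code may, for example, leave out the
always-set bit for part 0).

```python
from typing import List


def part_starts_for(lane_sizes: List[int]) -> List[bool]:
    """Return one flag per abstract part: True where a lane begins."""
    starts: List[bool] = []
    for size in lane_sizes:
        # lane sizes are powers of two, measured in abstract parts
        assert size > 0 and size & (size - 1) == 0, "lane size must be pow2"
        # each lane must start at a multiple of its own size
        assert len(starts) % size == 0, "lane must be aligned to its size"
        starts.append(True)
        starts.extend([False] * (size - 1))
    return starts


# four 1-part lanes followed by one 4-part lane
# (4xu8 next to 1xu32, if each abstract part is 8 bits wide)
mixed = part_starts_for([1, 1, 1, 1, 4])
# four 2-part lanes (4xu16 under the same assumption)
uniform = part_starts_for([2, 2, 2, 2])
```

Here mixed has flags set at parts 0 to 4 and clear at 5 to 7, so a single
value describes lanes of two different sizes at once.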
The current SimdPartMode/SimdLayout design still supports doing stuff like 4xu8
and 1xu32 simultaneously, because, iirc, we are still planning on supporting
packing ops into simd alus at the 32-bit level, and we could have a 4xu8 op
followed by a 1xu32 op. I initially thought about having an enum as the "mask"
signal (like you're proposing here) instead of abstracted partition-points, but
I rejected that idea because of the sheer number of enumerants needed to express
the desired 4xu8 with 1xu32 combinations.
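
To put a rough number on the enumerant count, here is a sketch that counts
the ways to tile N abstract parts with aligned power-of-2 lanes. The
restriction to aligned pow2 lanes and the name count_layouts are my
assumptions for the sake of the illustration.

```python
def count_layouts(parts: int) -> int:
    """Count ways to tile `parts` abstract parts with aligned pow2 lanes."""
    assert parts > 0 and parts & (parts - 1) == 0, "parts must be pow2"
    if parts == 1:
        return 1
    half = count_layouts(parts // 2)
    # either one lane spans everything, or each half is tiled independently
    return 1 + half * half


layout_counts = {n: count_layouts(n) for n in (1, 2, 4, 8)}
```

Under those assumptions an 8-part ALU already has 26 distinct lane layouts,
before multiplying in signedness or FP format, whereas partition-point style
signals need only one bit per part.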
> int elwidths:
>
> 0b00 64
> 0b01 32
> 0b10 16
> 0b11 8
>
> FP:
>
> 0b00 FP64
> 0b01 FP32
> 0b10 FP16
> 0b11 BF16 16 bit not 8
FP16/BF16 -- didn't think of that, will require either signalling outside of
SimdLayout or redesigning SimdLayout to cope.
>
> that puts the "aligned power 2" width (or the number of power2 partitions)
> on the requirements, and it can be the first of the tuple under each key.
>
> the second requirement is the *useful* width at each elwidth, nonpow2
> sized.
>
> there are no other requirements, because supporting different signed/unsigned
> is out of the question.
I partially disagree: uniform signed/unsigned is needed, and SimdLayout supports
it by having input lanes' types be nmigen Shape (or castable via Shape.cast).
Non-uniform signedness is unnecessarily complicated, and SimdLayout will raise
AssertionError for that case.
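
A sketch of what such a uniformity check could look like. LaneShape is a
hypothetical stand-in for nmigen's Shape (so the snippet runs without
nmigen), and check_uniform_signedness is an illustrative name, not the
actual SimdLayout code.

```python
from typing import Dict, NamedTuple


class LaneShape(NamedTuple):
    """Hypothetical stand-in for nmigen's Shape: width plus signedness."""
    width: int
    signed: bool


def check_uniform_signedness(lane_shapes: Dict[int, LaneShape]) -> bool:
    """Return the shared signedness, or fail if the lanes disagree."""
    signs = {shape.signed for shape in lane_shapes.values()}
    assert len(signs) <= 1, "all lanes must share one signedness"
    # an empty layout is treated as unsigned in this sketch
    return signs.pop() if signs else False


all_signed = check_uniform_signedness(
    {1: LaneShape(1, True), 2: LaneShape(3, True)})
```

Passing a dict that mixes LaneShape(5, False) with LaneShape(3, True) trips
the assertion.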
> > 1. dict-like types that map lane-size-in-abstract-parts to Shape-castable
> > values:
> > This (after Shape.cast-ing all dict values) is the canonical internal
> > representation of a layout (stored in self.lane_shapes).
> > example:
> > { # keys are always powers of 2
> >     1: 5, # 1-part lanes are u5
> >     2: unsigned(3), # 2-part lanes are u3
> >     4: range(3, 25), # 4-part lanes are u5 since that fits 24
> >     8: MyEnum, # 8-part lanes are u10, assuming MyEnum fits in u10
> > }
>
> we need neither Enums nor range, and *definitely* signed/unsigned is
> out of the question.
all of those are converted to Shape instances by nmigen's Shape.cast (called by
SimdLayout's constructor). once constructed, the internal fields use only
nmigen Shape instances.
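
To make the conversions in the examples concrete, here is a stdlib-only
approximation of the width/signedness rules as I understand them for ints,
ranges and Enum classes. cast_like, span_shape and bits_for are simplified
stand-ins written for this illustration, not nmigen's implementation, and
MyEnum's values are made up so that it fits in u10.

```python
from enum import Enum
from typing import Tuple


def bits_for(value: int, signed: bool) -> int:
    """Bits needed to hold `value`, with a sign bit if `signed`."""
    if value < 0:
        return (-value - 1).bit_length() + 1
    return value.bit_length() + (1 if signed else 0)


def span_shape(lo: int, hi: int) -> Tuple[int, bool]:
    """(width, signed) that holds every value from lo to hi."""
    signed = lo < 0 or hi < 0
    return max(bits_for(lo, signed), bits_for(hi, signed)), signed


def cast_like(obj) -> Tuple[int, bool]:
    """Approximate Shape.cast: returns a (width, signed) tuple."""
    if isinstance(obj, tuple):  # already a (width, signed) pair
        return obj
    if isinstance(obj, int):  # a bare int means an unsigned width
        return (obj, False)
    if isinstance(obj, range):
        if len(obj) == 0:
            return (0, False)
        return span_shape(min(obj[0], obj[-1]), max(obj[0], obj[-1]))
    if isinstance(obj, type) and issubclass(obj, Enum):
        values = [member.value for member in obj]
        return span_shape(min(values), max(values))
    raise TypeError(f"cannot cast {obj!r}")


class MyEnum(Enum):  # hypothetical enum whose largest value needs 10 bits
    A = 0
    B = 1000


lane_shapes = {
    1: cast_like(5),
    2: cast_like((3, False)),  # stands in for unsigned(3)
    4: cast_like(range(3, 25)),
    8: cast_like(MyEnum),
}
signed_range = cast_like(range(-30, 25))
```

The results agree with the comments in the quoted examples: range(3, 25)
comes out as u5, MyEnum as u10, and range(-30, 25) as i6.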
Signed/Unsigned is needed because we need to support signed/unsigned multiply,
signed/unsigned compare, signed/unsigned divide (as a SIMD ALU, not as a
PartitionedSignal op), signed/unsigned right shift, etc.
Oh, I just realized ALU-level signed/unsigned (separate from lane-level
signedness) is another thing that needs to go in the SimdPartMode key, along
with F16/BF16/etc.
So the key would be something like:
class MyIntKey(Enum):
    U8 = ...
    I8 = ...
    U16 = ...
    I16 = ...
    U32 = ...
    I32 = ...
    U64 = ...
    I64 = ...

INT_WIDTH_IN_PARTS = {
    MyIntKey.U8: 1,
    MyIntKey.I8: 1,
    MyIntKey.U16: 2,
    MyIntKey.I16: 2,
    MyIntKey.U32: 4,
    MyIntKey.I32: 4,
    MyIntKey.U64: 8,
    MyIntKey.I64: 8,
}

class MyFpKey(Enum):
    F16 = ...
    BF16 = ...
    F32 = ...
    F64 = ...

FP_WIDTH_IN_PARTS = {
    MyFpKey.F16: 1,
    MyFpKey.BF16: 1,
    MyFpKey.F32: 2,
    MyFpKey.F64: 4,
}
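
A runnable variant of the integer key, as a sketch. The "..." values above
are placeholders; if left as literal Ellipsis, every member would share one
value and Python's Enum would treat them all as aliases of the first, so
this version uses enum.auto(). IntKey, WIDTH_IN_PARTS, describe and the
8-part ALU width are assumptions for illustration.

```python
from enum import Enum, auto


class IntKey(Enum):
    U8 = auto()
    I8 = auto()
    U16 = auto()
    I16 = auto()
    U32 = auto()
    I32 = auto()
    U64 = auto()
    I64 = auto()


# same table as INT_WIDTH_IN_PARTS, derived from the member names,
# assuming each abstract part is 8 bits wide
WIDTH_IN_PARTS = {key: int(key.name[1:]) // 8 for key in IntKey}


def describe(key: IntKey, alu_parts: int = 8) -> dict:
    """Lane size, lane count and ALU-level signedness for one key."""
    width = WIDTH_IN_PARTS[key]
    return {
        "parts_per_lane": width,
        "lanes": alu_parts // width,
        "signed": key.name.startswith("I"),
    }


i32_info = describe(IntKey.I32)
```

With an 8-part ALU, I32 gives 4 parts per lane, 2 lanes, and signed
operation.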
> even if using this type of specification, how does it relate to
> elwidths?
elwidths determine which lane-size-in-abstract-parts is used.
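
As a sketch of that mapping, using the elwidth tables quoted earlier in this
thread: the part widths (8 bits for int, 16 bits for FP) and the helper name
lane_size_in_parts are my assumptions.

```python
# 2-bit elwidth -> (name, element width in bits), from the quoted tables
INT_ELWIDTHS = {0b00: ("i64", 64), 0b01: ("i32", 32),
                0b10: ("i16", 16), 0b11: ("i8", 8)}
FP_ELWIDTHS = {0b00: ("FP64", 64), 0b01: ("FP32", 32),
               0b10: ("FP16", 16), 0b11: ("BF16", 16)}


def lane_size_in_parts(table: dict, elwidth: int, part_bits: int) -> int:
    """Convert an elwidth code to a lane size in abstract parts."""
    _name, bits = table[elwidth]
    assert bits % part_bits == 0, "element must be a whole number of parts"
    return bits // part_bits


int_sizes = {ew: lane_size_in_parts(INT_ELWIDTHS, ew, 8) for ew in INT_ELWIDTHS}
fp_sizes = {ew: lane_size_in_parts(FP_ELWIDTHS, ew, 16) for ew in FP_ELWIDTHS}
```

Note that FP16 and BF16 land on the same lane size, so the lane size alone
cannot tell them apart; that is the extra information the key has to carry.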
>
> > or:
> > { # keys are always powers of 2
> >     1: signed(1), # 1-part lanes are i1
> >     2: signed(3), # 2-part lanes are i3
> >     4: range(-30, 25), # 4-part lanes are i6 since that fits -30
> >     8: MySignedBoolEnum, # 8-part lanes are i1
> >     16: signed(0), # 16-part lanes are i0, zero-bit shapes are supported
> > }
>
> no. range enum and signed... arrgh, these are *subtypes*?
no, they're just types that Shape.cast supports converting to Shape.
>
> no, absolutely not. no way. this is far too advanced, far too complicated.
it's easy, we let Shape.cast do all the hard work, SimdLayout just passes the
inputs to nmigen Shape.cast.