[libre-riscv-dev] [Bug 132] SIMD-like nmigen signal for partitioning

bugzilla-daemon at libre-riscv.org bugzilla-daemon at libre-riscv.org
Thu Aug 15 04:10:00 BST 2019


http://bugs.libre-riscv.org/show_bug.cgi?id=132

--- Comment #22 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #19)
> (In reply to Jacob Lifshay from comment #18)
> > (In reply to Luke Kenneth Casson Leighton from comment #17)
> > > (In reply to Jacob Lifshay from comment #14)
> > > > (In reply to Luke Kenneth Casson Leighton from comment #12)
> > > > > (In reply to Jacob Lifshay from comment #11)
> > > > > 
> > > > > > > 
> > > > > > > If the code is not doing single cycle results we cannot use it.
> > > > > > 
> > > > > > yes we can, we just need to tell the pipeline API "this takes 3 stages
> > > > > > instead of one, so insert extra registers on the control signals"
> > > > > 
> > > > > Which still does not take care of cancellation.
> > > > 
> > > > it's a simple data pipe, if a particular element is canceled, that pipeline
> > > > slot will just be empty, just like divpipecore. the control pipeline can
> > > > keep track of which elements have valid data and which have been canceled.
> > > > 
> > > > > 
> > > > > The multiplier code will now need to implement cancellation, which is a
> > > > > global mask (not a register-propagated signal).
> > > > the surrounding control hardware will just set the associated control
> > > > signals such that the canceled/unused data elements are ignored.
> > > > 
> > > 
> > > 
> > > Which has the knock-on ramifications of underutilised hardware (stages that
> > > run empty) which either decreases the IPC count or requires more RSs to
> > > compensate.
> > 
> > it decreases IPC, which is what happens anytime an instruction is canceled,
> > the partially completed instruction used (before it was known that it was to
> > be canceled) hardware that could have been used to run other instructions
> > had it known. 
> 
> That is correct... however by leaving the slots empty there is *yet more*
> penalty added.
> 
> The reason is that the stop mask is a unary representation of the binary
> Reservation Station Index.
> 
> If the index cannot be cleared because there are three extra clock delays
> until it clears out the end of the pipeline, that is three Reservation
> Stations that cannot be used.

The index can be cleared right away and the Reservation Stations reused. All
that is needed is for a muxid of 0 (or some other reserved value, or a separate
muxid_valid signal) to mean that there is no instruction in that slot, so its
output should be ignored.
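
As a rough behavioural sketch of that idea (plain Python rather than nmigen;
the names TaggedPipeline, Slot and NO_INSTRUCTION are made up for illustration
and are not taken from the existing code), cancelling just clears the muxid of
the in-flight slot, after which the same muxid can be issued again:

```python
from dataclasses import dataclass
from typing import List, Optional

NO_INSTRUCTION = 0  # assumed convention: muxid 0 marks an empty slot


@dataclass
class Slot:
    muxid: int = NO_INSTRUCTION
    data: int = 0


class TaggedPipeline:
    """Behavioural model of a fixed-latency data pipe with one muxid per slot."""

    def __init__(self, num_stages: int):
        self.slots: List[Slot] = [Slot() for _ in range(num_stages)]

    def cancel(self, muxid: int) -> None:
        # Clear the tag of every in-flight slot carrying this muxid. The data
        # keeps moving through the pipe, but nothing will claim it at the end.
        for slot in self.slots:
            if slot.muxid == muxid:
                slot.muxid = NO_INSTRUCTION

    def step(self, muxid: int = NO_INSTRUCTION, data: int = 0) -> Optional[Slot]:
        """Advance one clock; return the slot leaving the pipe if it is valid."""
        out = self.slots[-1]
        self.slots = [Slot(muxid, data)] + self.slots[:-1]
        return out if out.muxid != NO_INSTRUCTION else None


pipe = TaggedPipeline(3)
pipe.step(muxid=1, data=10)
pipe.step(muxid=2, data=20)
pipe.cancel(1)                 # RS 1 is free from this point on
pipe.step(muxid=1, data=99)    # RS 1 reused while the cancelled slot drains
results = [pipe.step() for _ in range(3)]
# results[0] is None (the cancelled slot); the other two carry muxid 2 and 1
```

Only the cancelled slot itself runs empty; no Reservation Station has to wait
for it to reach the end of the pipe.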

Since the Reservation Stations can be reused right away, the only part that is
underutilized is the canceled pipeline slots, which is unavoidable, since those
slots can't be reused due to instructions needing to go through the pipeline
stages in sequence.

> The multiply code needs the same code structure as div_core.

I agree, however, I think that the div_core is unnecessarily complicated by all
the extra wrapping for each stage and that it should be more like the
multiplier and have the data registers internally.

To allow generic registers, the constructor could take a function that creates
the register for a particular stage. That would allow things like passing in a
function that creates registers only for every other stage (to handle 2
computational stages per pipeline stage), or one that creates the muxes and
registers needed for inserting registers only when the processor is in
high-speed mode (as discussed earlier).
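
A minimal sketch of what I mean, as a plain Python model rather than real
nmigen (DataPipeline, Register, STAGES and the three factory functions are
hypothetical names for illustration only). The computation is a list of
combinatorial stages, and the caller's function decides which stage outputs
get a register:

```python
from typing import Callable, List, Optional


class Register:
    """One clocked storage element: its output lags its input by one step."""

    def __init__(self, reset: int = 0):
        self.value = reset


# Called with a stage index; returns a Register to place after that stage,
# or None to leave the stage output purely combinatorial.
RegisterFactory = Callable[[int], Optional[Register]]


class DataPipeline:
    def __init__(self, stages: List[Callable[[int], int]],
                 make_register: RegisterFactory):
        self.stages = stages
        self.registers = [make_register(i) for i in range(len(stages))]

    def step(self, value: int) -> int:
        """Simulate one clock cycle and return the value at the output."""
        pending = []
        for stage, reg in zip(self.stages, self.registers):
            value = stage(value)
            if reg is not None:
                pending.append((reg, value))  # latched at the clock edge
                value = reg.value             # downstream sees the old contents
        for reg, new_value in pending:
            reg.value = new_value
        return value


def no_registers(index: int) -> Optional[Register]:
    return None


def every_stage(index: int) -> Optional[Register]:
    return Register()


def every_other_stage(index: int) -> Optional[Register]:
    return Register() if index % 2 == 1 else None


# Stand-in computation: ((x + 1) * 2 - 3) ** 2
STAGES = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * x]

comb = DataPipeline(STAGES, no_registers)
assert comb.step(5) == 81             # result in the same cycle

half = DataPipeline(STAGES, every_other_stage)
outputs = [half.step(v) for v in (5, 0, 0)]
assert outputs[2] == 81               # two registers, so two cycles later
```

The stage functions never mention registers or control signals; the latency is
decided entirely by whichever factory the instantiating code passes in.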

The whole idea is that the computation code should be simple and easy to read
without all the control signals being required to be in the same Module.

That might also have benefits if the synthesizer lays out logic according to
which module it's in, allowing all the control pipeline to be moved away from
the data pipeline when the data pipeline doesn't need to know what the control
pipeline is doing at every stage.

Note that the multiplier won't specify how many pipeline stages it can take,
since that number can vary anywhere from 1 (entirely combinatorial) to about 18
and it is totally up to the instantiating module.
