[libre-riscv-dev] [Bug 216] LOAD STORE buffer needed

Wed Mar 11 23:08:39 GMT 2020

http://bugs.libre-riscv.org/show_bug.cgi?id=216

Jacob Lifshay <programmerjake at gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |programmerjake at gmail.com

--- Comment #2 from Jacob Lifshay <programmerjake at gmail.com> ---
(In reply to Luke Kenneth Casson Leighton from comment #1)
> (In reply to Jacob Lifshay from
> http://bugs.libre-riscv.org/show_bug.cgi?id=215#c5)
> > For a scheduling algorithm for loads that are ready to run (6600-style
> > scheduler sent to load/store unit for execution, no conflicting stores
> > in-front, no memory fences in-front), we can have a queue of memory ops and
> > each cycle we pick the load at the head of the queue and then search from
> > the head to tail for additional loads that target the same cache line
> > stopping at the first memory fence, conflicting store, etc. Once those loads
> > are selected, they are removed from the queue (probably by marking them as
> > removed) and sent thru the execution pipeline.
> > 
> > We can use a similar algorithm for stores.
> 
> right. ok. thanks to Mitch Alsup, i have a Memory Dependency matrix that
> takes care of discerning and preserving the loads and stores into batches. 
> we could, if we wanted to, do not only TSO: it can also handle cross-SMP in a
> way that makes atomic memory either entirely moot or dead trivial.  this i need
> a little more research on.
> 
> anyway the point is: LOADs as a batch are already identified and hold up any
> STOREs, and vice-versa.

Ok. The algorithm I proposed can still be used inside each batch to schedule
loads/stores and to decide which cache line to access each cycle. As far as I
can tell, the memory dependency matrix currently decides which ops to execute
each cycle with a simple but slow FIFO algorithm. The queue in the algorithm I
proposed can be the same HW as the memory dependency matrix, so the result
would be a hybrid of the two.
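
To make the per-cycle selection concrete, here is a rough software model of
the load case. It is only a sketch, not the HW design: the names (MemOp, Kind,
select_loads), the 64-byte line size, and treating "conflicting store" as "a
store to the same cache line" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum, auto

CACHE_LINE_BYTES = 64  # assumed line size, not taken from the design


class Kind(Enum):
    LOAD = auto()
    STORE = auto()
    FENCE = auto()


@dataclass
class MemOp:
    kind: Kind
    addr: int = 0
    removed: bool = False  # ops are "removed" by marking, not by shifting

    @property
    def line(self):
        # Cache line index this op targets.
        return self.addr // CACHE_LINE_BYTES


def select_loads(queue):
    """Return the loads to issue this cycle and mark them as removed.

    Takes the load at the head of the queue, then scans head to tail for
    more loads on the same cache line, stopping at the first fence or
    conflicting store (assumed here: a store to that same line).
    """
    live = [op for op in queue if not op.removed]
    if not live or live[0].kind is not Kind.LOAD:
        return []  # this sketch only models the load case
    head_line = live[0].line
    batch = []
    for op in live:
        if op.kind is Kind.FENCE:
            break
        if op.kind is Kind.STORE:
            if op.line == head_line:
                break  # conflicting store: nothing behind it may pass
            continue   # store to another line does not block these loads
        if op.line == head_line:
            batch.append(op)
    for op in batch:
        op.removed = True
    return batch
```

With a queue of LOAD 0x100, LOAD 0x108, LOAD 0x200, STORE 0x110, LOAD 0x120,
the first call selects the loads at 0x100 and 0x108 (same line), and the
second selects the load at 0x200. The load at 0x120 stays behind the store to
its line. In HW the scan would be a parallel match over the queue entries
rather than a loop.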

The recursive summary mechanism inspired by carry lookahead will still be
useful.
