[libre-riscv-dev] LD/ST address matcher

Luke Kenneth Casson Leighton lkcl at lkcl.net
Tue Jun 4 02:02:20 BST 2019


On Tuesday, June 4, 2019, Jacob Lifshay <programmerjake at gmail.com> wrote:

> On Mon, Jun 3, 2019, 17:22 Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>
> > On Tuesday, June 4, 2019, Jacob Lifshay <programmerjake at gmail.com> wrote:
> >
> > > On Mon, Jun 3, 2019, 16:58 Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
> > >
> > > > On Tuesday, June 4, 2019, Jacob Lifshay <programmerjake at gmail.com> wrote:
> > > >
> > > > >
> > > > > another case we can't get around
> > > >
> > > >
> > > > The double negative is doing my head in :)
> > > >
> > > >
> > > > >  by allowing multiple simultaneous loads is drawing to multiple
> > > > > output textures simultaneously, a technique that allows expensive
> > > > > lighting calculations to be done once per visible pixel instead of
> > > > > once per rendered pixel.
> > > >
> > > >
> > > > And those are again aligned on regular boundaries, of what size? Hmmm,
> > > > it would be the screen (framebuffer) size, wouldn't it, by that point?
> > > >
> > > > There is actually another potential trick: take some bits of the
> > > > PHYSICAL address to check against, as well as the virtual one.
> > > >
> > > > I believe I saw this trick described somewhere in the RISCV manual.
> > > >
> > > > With the physical pages being spread out pretty much randomly (assuming
> > > > gigapages aren't used), the extra PHYS bits effectively constitute the
> > > > equivalent of a hash.
> > > >
> > > > However gigapages could well be used here, so, hmmmm
> > > >
> > > > We need something that covers bits up to say 4MiB.  That's 9 extra bits -
> > > > 12 to 21. 2^20 = 1M, 2^22 = 4M.
> > > >
> > > > So a VERY simple hash of 9 bits is viable.  Bear in mind its gate count
> > > > will be multiplied by 248(!) for a 4LD 4ST AGEN clash detection matrix.
> > > >
> > > the gate count would be multiplied by either 4 (the enqueue width; if we
> > > store the hashes in the ld/st queue) or by the length of the queue (8, I
> > > think). we would have one hash for each address to compare, not one for
> > > each address comparator. the address comparators can be (for 8x8 with
> > > 16-bit hashes) (n*(n-1))/2 16-bit comparators, which each take 16 xor
> > > gates, 4 4-input nand gates, and 1 4-input nor gate, which altogether is
> > > 28*(16*3+4*4+4)=1904 inverter equivalents (assuming 3/4/4 inverters for
> > > xor/4-in nand/4-in nor respectively).
> >
> >
> > Yes got it. 256 x 8 to give the hashes, then 7 6 5 4 3 2 1 compares @
> > bitwidth.
> >
> > That would be more sane.
> >
> > 16 bit hashes is still a bit high.  16 AND gates, times 7 6 5 4 3 2 1 is I
> > think 16 x 8x4 AND gates, which is what 16k ANDs so x4 for transistor
> > count, 128k.
> >
> assuming 4-input gates can be used for and-reduction, I had calculated the
> count above to be 1904 inverter equivalents for the 8x8 16-bit comparator
> grid (3808 transistors)


That's tolerable. Missed it.
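
For anyone wanting to re-check the quoted figure, here is a rough Python
sketch of the arithmetic. The per-gate costs (3/4/4 inverter equivalents)
and the 8-entry, 16-bit grid are taken from the quoted estimate; the factor
of 2 transistors per inverter is only inferred from the 1904 -> 3808 figures,
and the function names are made up for illustration.

```python
# Inverter-equivalent costs assumed in the quoted estimate.
XOR_COST = 3
NAND4_COST = 4
NOR4_COST = 4

# Assumption: 2 transistors per inverter, inferred from 1904 -> 3808.
TRANSISTORS_PER_INVERTER = 2


def comparator_cost(hash_bits=16):
    """Inverter equivalents for one equality comparator.

    One XOR per bit, a first reduction level of 4-input NANDs and a
    single 4-input NOR on top. The single NOR is only enough when
    hash_bits is a multiple of 4 and at most 16.
    """
    assert hash_bits % 4 == 0 and hash_bits <= 16
    xor_gates = hash_bits
    nand4_gates = hash_bits // 4
    nor4_gates = 1
    return (xor_gates * XOR_COST
            + nand4_gates * NAND4_COST
            + nor4_gates * NOR4_COST)


def grid_cost(entries=8, hash_bits=16):
    """Inverter equivalents for comparing every pair of queue entries."""
    pairs = entries * (entries - 1) // 2  # n*(n-1)/2 comparators
    return pairs * comparator_cost(hash_bits)


inverters = grid_cost()
transistors = inverters * TRANSISTORS_PER_INVERTER
print(inverters, transistors)
```

With the defaults this gives 28 comparators at 68 inverter equivalents each.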

Would still feel more comfortable with a hybrid approach, as the hash alone
will definitely miss certain entries that a straight addr[4..11] would
definitely detect.

Mitch picked 4..11 as you can see from his earlier reply because it
corresponds with cache lines.
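
To make the hybrid idea concrete, a small Python sketch follows. It is only
an illustration under assumptions: the XOR-fold hash, the 9-bit width applied
to the bits above bit 11, and all function names are hypothetical and not
something settled in this thread. It compares addr[4..11] directly and uses
the hash only on the upper bits.

```python
def line_bits(addr):
    """Bits 4..11 of the address (8 bits), compared directly."""
    return (addr >> 4) & 0xFF


def xor_fold(value, width=9):
    """Hypothetical 'very simple' hash: XOR-fold value into width bits."""
    mask = (1 << width) - 1
    h = 0
    while value:
        h ^= value & mask
        value >>= width
    return h


def may_clash(a, b):
    """Hybrid check: direct compare on bits 4..11 plus a hash of bits 12 up."""
    return (line_bits(a) == line_bits(b)
            and xor_fold(a >> 12) == xor_fold(b >> 12))


# Two accesses 8 bytes apart share bits 4..11, so the hybrid check flags
# them, while a hash taken over the whole address sees different values.
print(may_clash(0x1000, 0x1008), xor_fold(0x1000) == xor_fold(0x1008))

# Same bits 4..11 but different pages: a plain addr[4..11] compare would
# report a clash, the hash on the upper bits filters it out.
print(line_bits(0x1000) == line_bits(0x2000), may_clash(0x1000, 0x2000))
```

The direct compare keeps the guarantee for nearby addresses, and the hash is
only there to cut down false positives from the bits above the page offset.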

L.



-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68

