[libre-riscv-dev] LD/ST address matcher

Jacob Lifshay programmerjake at gmail.com
Tue Jun 4 01:41:02 BST 2019


On Mon, Jun 3, 2019, 17:22 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On Tuesday, June 4, 2019, Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> > On Mon, Jun 3, 2019, 16:58 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> > wrote:
> >
> > > On Tuesday, June 4, 2019, Jacob Lifshay <programmerjake at gmail.com>
> > wrote:
> > >
> > > >
> > > > another case we can't get around
> > >
> > >
> > > The double negative is doing my head in :)
> > >
> > >
> > > >  by allowing multiple simultaneous loads is
> > > > drawing to multiple output textures simultaneously, a technique that
> > > > allows expensive lighting calculations to be done once per visible
> > > > pixel instead of once per rendered pixel.
> > >
> > >
> > > And those are again aligned on regular boundaries, of what size?  Hmmm,
> > > it would be the screen (framebuffer) size, wouldn't it, by that point?
> > >
> > > There is actually another potential trick: take some bits of the
> > > PHYSICAL address to check against, as well as the virtual one.
> > >
> > > I believe I saw this trick described somewhere in the RISCV manual.
> > >
> > > With the physical pages being spread out pretty much randomly (assuming
> > > gigapages aren't used), the extra PHYS bits effectively constitute the
> > > equivalent of a hash.
> > >
> > > However gigapages could well be used here, so, hmmmm
> > >
> > > We need something that covers bits up to say 4MiB.  That's 9 extra
> > > bits - 12 to 21. 2^20 = 1M, 2^22 = 4M.
> > >
> > > So a VERY simple hash of 9 bits is viable.  Bear in mind its gate count
> > > will be multiplied by 248(!) for a 4LD 4ST AGEN clash detection matrix.
> > >
> > The gate count would be multiplied by either 4 (the enqueue width, if we
> > store the hashes in the ld/st queue) or by the length of the queue (8, I
> > think). We would have one hash for each address to compare, not one for
> > each address comparator. The address comparators can be (for 8x8 with
> > 16-bit hashes) n*(n-1)/2 = 28 16-bit comparators, each of which takes 16
> > xor gates, 4 4-input nand gates, and 1 4-input nor gate, which altogether
> > is 28*(16*3+4*4+4)=1904 inverter equivalents (assuming 3/4/4 inverters for
> > xor/4-in nand/4-in nor respectively).
>
>
> Yes got it. 256 x 8 to give the hashes, then 7 6 5 4 3 2 1 compares @
> bitwidth.
>
> That would be more sane.
>
> 16 bit hashes is still a bit high.  16 AND gates, times 7654321 is i think
> 16 x 8x4 AND gates, which is what 16k ANDs so x4 for transistor count,
> 128k.
>
Assuming 4-input gates can be used for the AND-reduction, I had calculated the
count above to be 1904 inverter equivalents for the 8x8 16-bit comparator
grid (3808 transistors).
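
The arithmetic behind that figure can be checked with a short Python sketch.
The function names are hypothetical, the per-gate costs are the 3/4/4
inverter-equivalent assumption quoted above, and the factor of 2 transistors
per inverter equivalent is inferred from the 1904 -> 3808 step rather than
stated anywhere in the thread.

```python
# Per-gate costs in inverter equivalents (the 3/4/4 assumption quoted above).
XOR_COST, NAND4_COST, NOR4_COST = 3, 4, 4
# Assumption: 2 transistors per inverter equivalent, inferred from 1904 -> 3808.
TRANSISTORS_PER_INVERTER = 2


def comparator_pairs(queue_len):
    """Distinct pairs among queue_len entries: n*(n-1)/2."""
    return queue_len * (queue_len - 1) // 2


def comparator_cost(hash_bits):
    """Inverter equivalents for one equality comparator.

    One XOR per bit, one 4-input NAND per group of 4 bits, and a single
    4-input NOR combining the groups, so only widths of 4..16 bits in
    multiples of 4 fit this structure.
    """
    if hash_bits % 4 != 0 or not 4 <= hash_bits <= 16:
        raise ValueError("this sketch only models 4, 8, 12 or 16-bit hashes")
    return hash_bits * XOR_COST + (hash_bits // 4) * NAND4_COST + NOR4_COST


def grid_cost(queue_len, hash_bits):
    """Inverter equivalents for all pairwise comparators in the queue."""
    return comparator_pairs(queue_len) * comparator_cost(hash_bits)


inverters = grid_cost(8, 16)
print(inverters, inverters * TRANSISTORS_PER_INVERTER)  # 1904 3808
```

Under the same assumptions, dropping to 8-bit hashes gives grid_cost(8, 8),
which is 28*(8*3+2*4+4) = 1008 inverter equivalents.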

>
> Still a bit mad numbers.  Getting that down to 32k would still be
> alarmingly high but tolerable.
>
> L.
>
>
>

