[libre-riscv-dev] LD/ST address matcher
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Tue Jun 4 01:21:35 BST 2019
On Tuesday, June 4, 2019, Jacob Lifshay <programmerjake at gmail.com> wrote:
> On Mon, Jun 3, 2019, 16:58 Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
>
> > On Tuesday, June 4, 2019, Jacob Lifshay <programmerjake at gmail.com> wrote:
> >
> > >
> > > another case we can't get around
> >
> >
> > The double negative is doing my head in :)
> >
> >
> > > by allowing multiple simultaneous loads is
> > > drawing to multiple output textures simultaneously, a technique that
> > > allows expensive lighting calculations to be done once per visible pixel
> > > instead of once per rendered pixel.
> >
> >
> > And those are again aligned on regular boundaries, of what size? Hmmm it
> > would be the screen (framebuffer) size, wouldn't it, by that point?
> >
> > There is actually another potential trick: take some bits of the PHYSICAL
> > address to check against, as well as the virtual one.
> >
> > I believe I saw this trick described somewhere in the RISCV manual.
> >
> > With the physical pages being spread out pretty much randomly (assuming
> > gigapages aren't used), the extra PHYS bits effectively constitute the
> > equivalent of a hash.
> >
> > However gigapages could well be used here, so, hmmmm
> >
> > We need something that covers bits up to, say, 4MiB. That's 10 extra bits,
> > 12 to 21: 2^20 = 1M, 2^22 = 4M.
> >
> > So a VERY simple hash of 10 bits is viable. Bear in mind its gate count
> > will be multiplied by 248(!) for a 4LD 4ST AGEN clash detection matrix.
> >
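A minimal Python sketch of the kind of "VERY simple hash" described above, assuming 4 KiB pages and a hash whose lowest chunk is exactly address bits 12 to 21. The XOR-fold of the higher page-number bits, and the names `page_hash` and `may_clash`, are illustrative assumptions, not anything settled in this thread. It hashes whichever address it is given; whether that is the virtual address, the physical one, or both is left open here, as it is above.

```python
PAGE_SHIFT = 12                      # assumed 4 KiB pages: bits 0..11 are the page offset
HASH_BITS = 10                       # bits 12..21 inclusive, i.e. up to the 4 MiB boundary
HASH_MASK = (1 << HASH_BITS) - 1
OFFSET_MASK = (1 << PAGE_SHIFT) - 1


def page_hash(addr: int) -> int:
    """XOR-fold the page number (addr >> 12) into a 10-bit hash (hypothetical).

    The lowest chunk is exactly address bits 12..21, so two addresses in the
    same 4 MiB-aligned region hash differently whenever their page numbers
    differ. Higher bits are folded in, so distant addresses can collide
    (a false positive), but equal addresses always hash equally.
    """
    page = addr >> PAGE_SHIFT
    h = 0
    while page:
        h ^= page & HASH_MASK        # fold in the next 10-bit chunk
        page >>= HASH_BITS
    return h


def may_clash(a: int, b: int) -> bool:
    """Conservative clash check: exact page offset plus hashed page number."""
    same_offset = (a & OFFSET_MASK) == (b & OFFSET_MASK)
    return same_offset and page_hash(a) == page_hash(b)


a = 0x80001234
print(may_clash(a, a))                            # True
print(may_clash(a, a ^ (1 << 12)))                # False: differs only in bit 12
print(may_clash(a, a ^ (1 << 12) ^ (1 << 22)))    # True: a hash collision
```

Equal addresses always give equal hashes, so this kind of check can only err towards a false clash (the last case, where bits 12 and 22 both differ), which would presumably just cost an unnecessary stall.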
> the gate count would be multiplied either by 4 (the enqueue width, if we
> store the hashes in the ld/st queue) or by the length of the queue (8, I
> think). we would have one hash for each address to compare, not one for
> each address comparator. the address comparators can be (for 8x8 with
> 16-bit hashes) (n*(n-1))/2 = 28 16-bit comparators, each of which takes 16
> xor gates, 4 4-input nand gates, and 1 4-input nor gate, which altogether is
> 28*(16*3+4*4+4)=1904 inverter equivalents (assuming 3/4/4 inverters for
> xor/4-in nand/4-in nor respectively).
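A gate-level sketch of one 16-bit comparator using the gate mix listed above, written in Python purely to check the logic, not timing or layout. One assumption: for the output to be 1 on a match with 4-input NANDs followed by a 4-input NOR, the per-bit gates need to be XNORs (1 where the bits agree). With plain XOR gates the same gate count works with the roles swapped (4-input NORs, then one 4-input NAND), giving an active-low match signal instead. The helper names are hypothetical.

```python
def xnor(a: int, b: int) -> int:
    """1 when two single bits agree."""
    return 1 - (a ^ b)


def nand(*xs: int) -> int:
    """N-input NAND: 0 only when every input is 1."""
    return 0 if all(xs) else 1


def nor(*xs: int) -> int:
    """N-input NOR: 1 only when every input is 0."""
    return 0 if any(xs) else 1


def eq16(a: int, b: int) -> int:
    """16-bit equality from 16 XNOR, 4 four-input NAND and 1 four-input NOR."""
    agree = [xnor((a >> i) & 1, (b >> i) & 1) for i in range(16)]   # 16 XNORs
    nibble_miss = [nand(*agree[i:i + 4]) for i in range(0, 16, 4)]  # 0 if a nibble fully agrees
    return nor(*nibble_miss)                                        # 1 only if all 4 nibbles agree


print(eq16(0xBEEF, 0xBEEF))  # 1
print(eq16(0xBEEF, 0xBEEE))  # 0
```

Flipping any single bit drives exactly one NAND high, which forces the NOR low, so the tree behaves as an equality test.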
Yes, got it: 256 x 8 to give the hashes, then 7+6+5+4+3+2+1 compares at the
hash bitwidth.
That would be more sane.
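The comparator arithmetic quoted above can be reproduced with a small cost model. The 3/4/4 inverter-equivalent costs are the assumptions stated in the thread; the function names are hypothetical, and extending the model to widths that are not a multiple of 4 (treating a partly filled nibble as a full 4-input NAND with unused inputs tied high) is an extra simplification.

```python
def comparator_count(n: int) -> int:
    """Pairwise compares among n queued addresses: (n-1) + ... + 1 = n*(n-1)/2."""
    return n * (n - 1) // 2


def comparator_cost(bits: int, xor_cost: int = 3,
                    nand4_cost: int = 4, nor4_cost: int = 4) -> int:
    """Inverter equivalents for one equality comparator built as
    per-bit XOR/XNOR -> one 4-input NAND per nibble -> one 4-input NOR.

    Only valid up to 16 bits, where a single 4-input NOR can combine the nibbles.
    """
    if not 1 <= bits <= 16:
        raise ValueError("a two-level 4-input tree only covers 1..16 bits")
    nibbles = -(-bits // 4)  # ceiling division
    return bits * xor_cost + nibbles * nand4_cost + nor4_cost


def matrix_cost(n: int, bits: int) -> int:
    """Total comparator cost for an n-entry all-pairs match matrix."""
    return comparator_count(n) * comparator_cost(bits)


print(comparator_count(8))   # 28
print(matrix_cost(8, 16))    # 1904
print(matrix_cost(8, 10))    # 1288
```

`comparator_count(8)` is the same 7+6+5+4+3+2+1 series, while only 8 hashes are needed, one per queued address, which is why the hash logic is not multiplied by the comparator count.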
16-bit hashes are still a bit high. 16 AND gates, times 7+6+5+4+3+2+1, is I
think 16 x 8x4 AND gates, which is what, 16k ANDs, so x4 for transistor count,
128k. Those are still slightly mad numbers. Getting that down to 32k would
still be alarmingly high, but tolerable.
L.
--
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68