[libre-riscv-dev] LD/ST address matcher

Tue Jun 4 00:59:18 BST 2019

I think the hash technique would be less expensive than your estimate:
an and gate is 6 transistors (4 for nand then 2 for inverter)
an xor gate can be 6 transistors: http://tinyurl.com/yyygzebc

On Mon, Jun 3, 2019, 16:39 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On Tuesday, June 4, 2019, Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> > On Mon, Jun 3, 2019, 15:51 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> > wrote:
> >
> > >  can you please confirm: will the Vulkan API produce batches of LDs in
> > > quick succession that are *exactly* on a 1MiB boundary?  if so, i'm
> > > curious to know why, and if it can be avoided.
> > >
> > yes, a series of texture reads to apply multiple different textures to a
> > surface. the accesses are all to the same location in each of the
> textures,
> > and the memory for textures is aligned to a large power of 2 (at least a
> > page, but quite likely a megapage or larger).
>
>
> Drat. Ok.
>
> So, hm, how about a direct CMP on bits 4-11, bitmap on bits 0-3 and a very
> VERY simple non-expensive hash on SOME of the upper bits? Enough to cover
> up to 4MiB or so.  We would actually be safe to do the bitmap @ bits 1-3 or
> even just 2-3
>
> The compare is NxN way so is EXPENSIVE.
>
> If there are 4 LDs and 4 STs that is an 8x8 address matching matrix, giving
> a whopping TWO HUNDRED AND FORTY EIGHT comparisons needed to be done, EVERY
> clock cycle, in parallel.
>
> 250 XOR gates is something like 4 gates or sonething per XOR, so... 256 x
> 256 x 4 = a massive 256k gates just to do AGEN hit/miss checking.
>
> Bits 4 to 11 checking would just be AND gates, so that is about.... 8x2=16
> gates per address, times 256 = 4k gates for the entire table.
>
> Bits 0 to 3 if done as a bitmap would be 8 variable width shifters (not so
> costly) then 16 bits that need checking, so around 8k gates.  Total around
> 12k.
>
> Reducing the bitmap to bits 1-3 would bring the bitmap size down to 1k
> gates, and still be able to match against 2 byte LD/ST operations,
> including misaligned 32 and 64 bit LD/STs.  Byte level LD/STs would be
> treated as single issue only if they had all but the LSB the same.
>
> Plus a bit more, however it is still over an order of magnitude less than a
> full 64 bit hash approach.
>
> Remember we do not need a full match, it is quite odd, this is a way to
> seek optimisation (parallelism) opportunities, NOT a way to fully AVOID
> LD/ST clashes, as that is the LDST Dep Matrixes job, which it has already
> done by detecting the order.
>
> Otherwise yes we would definitely need to do a full 64 bit compare which
> would be insane.
>
> L.
>
>
>
> --
> ---
> crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68
> _______________________________________________
> libre-riscv-dev mailing list
> libre-riscv-dev at lists.libre-riscv.org
> http://lists.libre-riscv.org/mailman/listinfo/libre-riscv-dev
>