[libre-riscv-dev] Vulkanizing

Jacob Lifshay programmerjake at gmail.com
Wed Feb 19 07:38:25 GMT 2020


On Tue, Feb 18, 2020, 23:20 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On Wednesday, February 19, 2020, Jacob Lifshay <programmerjake at gmail.com>
> wrote:
>
> > On Tue, Feb 18, 2020, 22:49 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> > wrote:
> >
> > > On Wednesday, February 19, 2020, Jacob Lifshay <programmerjake at gmail.com>
> > > wrote:
> > >
> > > > That is 8 flops/core/cycle of fp32, 16 for fp16, and, depending
> > > > on how we implement it, either 2 or 4 flops/core/cycle of fp64.
> > >
> > > 2 because ... no 4 if you count FMAC as 2, and we can do 2 per clock @
> > > 64 bit.
> > >
> > > the odd ALU will do 2FMAC FLOPS @ 64 bit, the even likewise.
> > >
> >
> > The idea was that we could have the 128-bit ALU do 2xfp64 or, since fp64
> > is much less important and takes lots of area, just 1xfp64.
>

It could work by having just one half support fp64; the other half could
still run 2xfp32 or other combinations.
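
As a quick sanity check on the numbers in this thread, here is a minimal
Python sketch of the arithmetic. It assumes 128 bits of FP datapath per core
per cycle and counts each FMAC as 2 flops; the function name and the
max_lanes parameter are made up for illustration and are not from any
actual design file.

```python
FLOPS_PER_FMAC = 2  # one fused multiply-add counts as a multiply plus an add


def flops_per_cycle(datapath_bits, elem_bits, max_lanes=None):
    """Peak flops per cycle for a SIMD datapath doing one FMAC per lane.

    max_lanes (hypothetical knob) caps the lane count, e.g. when only
    one fp64 unit is built to save area.
    """
    lanes = datapath_bits // elem_bits
    if max_lanes is not None:
        lanes = min(lanes, max_lanes)
    return lanes * FLOPS_PER_FMAC


fp32 = flops_per_cycle(128, 32)                     # 4 lanes
fp16 = flops_per_cycle(128, 16)                     # 8 lanes
fp64_full = flops_per_cycle(128, 64)                # 2xfp64
fp64_single = flops_per_cycle(128, 64, max_lanes=1) # 1xfp64
# Mixed case: one 64-bit half doing fp64, the other half doing 2xfp32.
mixed = flops_per_cycle(64, 64) + flops_per_cycle(64, 32)

print(fp32, fp16, fp64_full, fp64_single, mixed)
```

Under those assumptions this reproduces the 8 / 16 / "2 or 4" figures, and
the mixed fp64 + 2xfp32 case comes out at 6 flops/core/cycle.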

>
>
>  Dependency Matrix "protection" is costly.  50,000 gates for one matrix.
>
> therefore it goes on 32 bit boundaries.
>
> to cover a 64 bit op you use *two* 32 bit entries (reservations).
>
> 128 bit is unwise to attempt.
>
> it would need 4x 32 bit entries
>
> so it is a 64 bit ALU. however, because of the partitioning, you can
> literally throw 32 bit SIMD data into one half and throw 32 bit data into
> the other, and it just doesn't care in the slightest that the data is going
> to different destination registers (or parts thereof)
>
> only in the 64 bit ops case do you actually care.
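
To make the quoted points concrete, here is a rough Python model of the two
ideas: Dependency Matrix entries allocated on 32-bit boundaries, and a 64-bit
ALU that can be partitioned into two independent 32-bit halves. This is only
a sketch under those assumptions. The names entries_needed and
partitioned_add are hypothetical, and an integer add stands in for whatever
operation the real ALU would perform.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def entries_needed(op_bits, entry_bits=32):
    """Number of 32-bit Dependency Matrix entries an operation reserves."""
    return -(-op_bits // entry_bits)  # ceiling division


def partitioned_add(a, b, split):
    """64-bit add that optionally behaves as two independent 32-bit adds.

    split=True:  no carry crosses bit 32, so each half can belong to a
                 different destination register (or part of one).
    split=False: a single 64-bit operation; the carry propagates, which is
                 the only case where both halves must be treated together.
    """
    if not split:
        return (a + b) & MASK64
    lo = ((a & MASK32) + (b & MASK32)) & MASK32
    hi = (((a >> 32) & MASK32) + ((b >> 32) & MASK32)) & MASK32
    return (hi << 32) | lo


print(entries_needed(32), entries_needed(64), entries_needed(128))
print(hex(partitioned_add(0x1FFFFFFFF, 0x1, split=True)))
print(hex(partitioned_add(0x1FFFFFFFF, 0x1, split=False)))
```

The first line of output shows 1, 2 and 4 entries for 32-, 64- and 128-bit
operations. The last two lines show the same inputs giving different results
depending on whether the carry is allowed to cross the 32-bit boundary.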


Jacob

