[libre-riscv-dev] Vulkanizing
Luke Kenneth Casson Leighton
lkcl at lkcl.net
Wed Feb 19 07:58:18 GMT 2020
On Wednesday, February 19, 2020, Jacob Lifshay <programmerjake at gmail.com>
wrote:
> On Tue, Feb 18, 2020, 23:20 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> wrote:
>
> > On Wednesday, February 19, 2020, Jacob Lifshay <programmerjake at gmail.com>
> > wrote:
> >
> > > On Tue, Feb 18, 2020, 22:49 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
> > > wrote:
> > >
> > > > On Wednesday, February 19, 2020, Jacob Lifshay <programmerjake at gmail.com>
> > > > wrote:
> > > >
> > > > > That is 8 flops/core/cycle of fp32, 16 for fp16, and, depending
> > > > > on how we implement it, either 2 or 4 flops/core/cycle of fp64.
> > > >
> > > > 2 because ... no 4 if you count FMAC as 2, and we can do 2 per clock @ 64
> > > > bit.
> > > >
> > > > the odd ALU will do 2FMAC FLOPS @ 64 bit, the even likewise.
> > > >
> > >
> > > The idea was that we could have the 128-bit ALU do 2xfp64 or, since fp64 is
> > > much less important and takes lots of area, just 1xfp64.
> >
>
> it could work by having just one half support fp64, the other half could
> still run 2xf32 or other combinations.
that gets messy because the Function Units are designed to hold 32 bit data
ready for 32 bit operations.
the way 64 bit works is to have 2 neighbouring 32 bit Function Units
"temporarily synchronise" to handle a 64 bit op.
one FU holds the HI 32 bits
the other holds the LO 32 bits.
the partition gate is opened to make a 64 bit op, and job done.
32 HI result goes into HI regfile lane, 32 LO result goes into LO regfile
lane.
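a minimal sketch of that pairing, as a plain Python behavioural model rather than actual HDL. it uses an integer add as a stand-in for the real FP pipeline (an assumption made purely to keep the example short), and the names split64, partitioned_add and gate_open are hypothetical, not taken from the codebase:

```python
MASK32 = 0xFFFF_FFFF

def split64(value):
    """Split a 64 bit value into its (HI, LO) 32 bit halves."""
    return (value >> 32) & MASK32, value & MASK32

def partitioned_add(a_hi, a_lo, b_hi, b_lo, gate_open):
    """Behavioural model of two neighbouring 32 bit adder partitions.

    gate_open=False: the carry is blocked, giving two independent 32 bit adds.
    gate_open=True:  the carry out of LO feeds HI, giving one 64 bit add.
    Returns (hi_result, lo_result), one per regfile lane.
    """
    lo_sum = a_lo + b_lo
    carry = (lo_sum >> 32) & 1 if gate_open else 0
    hi_sum = a_hi + b_hi + carry
    return hi_sum & MASK32, lo_sum & MASK32

a_hi, a_lo = split64(0x0000_0001_FFFF_FFFF)
b_hi, b_lo = split64(0x0000_0000_0000_0001)

# gate open: one 64 bit op, HI result to the HI lane, LO result to the LO lane
hi, lo = partitioned_add(a_hi, a_lo, b_hi, b_lo, gate_open=True)

# gate closed: the same two FUs behave as two separate 32 bit ops
hi32, lo32 = partitioned_add(a_hi, a_lo, b_hi, b_lo, gate_open=False)
```

the point of the model is that no operand ever leaves its own lane: the only thing crossing the HI/LO boundary is the partition gate signal.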
to do 64 bit on a 32 bit partition will be a frickin nuisance.
what will be needed is:
* add a fan-in system so that the top 32 bits of the input operands cross
over from their HI regfile lane into the target FU's 32 bit lane
* double the number of 32 bit latches per Function Unit so as to be able to
store both the HI and LO 32 bit data. this is effectively 2x the number of
operands into the ALU
* approximately double the size of the Dependency Matrices because now you
are tracking effectively double the number of operands
* bear in mind that this is for ALL FUs intended to be used for 64 bit
* shove the twin 32 bit stuff into the 32 bit wide ALU partition in a
microcoded multi cycle FSM fashion
* collate the two 32 bit results at the end
* fan out one of the results into the correct HI LO 32 bit regfile result
lane.
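the steps above can be sketched the same way. this is a hypothetical behavioural model, again using an integer add in place of FP, and the class name Narrow64Adder and its fields are invented for illustration only. it shows the doubled operand latches, the two-cycle FSM through a single 32 bit partition, and the collate step at the end:

```python
MASK32 = 0xFFFF_FFFF

class Narrow64Adder:
    """One 32 bit ALU partition doing a 64 bit add over two cycles."""

    def __init__(self, a, b):
        # doubled latches: this one FU now holds HI *and* LO of each operand,
        # so HI halves had to be fanned in from the other regfile lane
        self.latches = {
            "a_lo": a & MASK32, "a_hi": (a >> 32) & MASK32,
            "b_lo": b & MASK32, "b_hi": (b >> 32) & MASK32,
        }
        self.carry = 0      # extra state carried between the two cycles
        self.results = []   # 32 bit partial results, LO first
        self.cycles = 0

    def step(self):
        """One FSM cycle: push one 32 bit half through the 32 bit ALU."""
        half = "lo" if self.cycles == 0 else "hi"
        total = self.latches["a_" + half] + self.latches["b_" + half] + self.carry
        self.carry = total >> 32
        self.results.append(total & MASK32)
        self.cycles += 1

    def run(self):
        while self.cycles < 2:
            self.step()
        lo, hi = self.results
        # collate: hi still has to be fanned out to the HI regfile lane
        return hi, lo

fu = Narrow64Adder(0x0000_0001_FFFF_FFFF, 0x0000_0000_0000_0001)
hi, lo = fu.run()
```

even in this toy form there are four operand latches instead of two, a carry latch, a cycle counter and two cross-lane moves, none of which the paired-FU arrangement needs.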
which looks simpler, and uses fewer gates as a side effect?
:)
--
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68