Luke Kenneth Casson Leighton
lkcl at lkcl.net
Wed Feb 19 07:19:58 GMT 2020
On Wednesday, February 19, 2020, Jacob Lifshay <programmerjake at gmail.com> wrote:
> On Tue, Feb 18, 2020, 22:49 Luke Kenneth Casson Leighton <lkcl at lkcl.net> wrote:
> > On Wednesday, February 19, 2020, Jacob Lifshay <programmerjake at gmail.com> wrote:
> > > That is 8 flops/core/cycle of fp32, 16 for fp16, and, depending
> > > on how we implement it, either 2 or 4 flops/core/cycle of fp64.
> > 2 because... no, 4 if you count an FMAC as 2 FLOPs, and we can do 2 per
> > clock @ 64 bit.
> > the odd ALU will do 2 FLOPs (one FMAC) @ 64 bit, the even one likewise.
> The idea was that we could have the 128-bit ALU do 2xfp64 or, since fp64 is
> much less important and takes lots of area, just 1xfp64.
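The per-core figures quoted above follow from simple lane arithmetic. The sketch below assumes (this is not stated in the thread) a 128-bit-wide FP datapath per core, one fused multiply-add per lane per cycle, and an FMA counted as 2 FLOPs; the function name and the `max_lanes` cap are hypothetical, used only to model the "just 1xfp64" option.

```python
from typing import Optional

# Assumed model (not from the thread): each cycle a core's FP datapath is
# `datapath_bits` wide, every lane issues one FMA, and an FMA counts as 2 FLOPs.
FLOPS_PER_FMA = 2


def flops_per_core_per_cycle(datapath_bits: int, elem_bits: int,
                             max_lanes: Optional[int] = None) -> int:
    """Peak FLOPs/core/cycle for one element width.

    max_lanes optionally caps the lane count, e.g. to model an ALU that
    implements only one fp64 unit to save area.
    """
    lanes = datapath_bits // elem_bits
    if max_lanes is not None:
        lanes = min(lanes, max_lanes)
    return lanes * FLOPS_PER_FMA


if __name__ == "__main__":
    for name, bits in (("fp16", 16), ("fp32", 32), ("fp64", 64)):
        print(name, flops_per_core_per_cycle(128, bits))
    print("fp64, single unit", flops_per_core_per_cycle(128, 64, max_lanes=1))
```

Under these assumptions it prints 16 for fp16, 8 for fp32, 4 for fp64, and 2 for fp64 with a single unit, i.e. the "either 2 or 4" range depends only on whether fp64 gets one lane or two.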
Dependency Matrix "protection" is costly: 50,000 gates for one matrix.
Therefore it goes on 32-bit boundaries: to cover a 64-bit op you use *two*
32-bit entries (reservations).
128-bit is unwise to attempt, because it would need 4x 32-bit entries.
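A minimal sketch of what 32-bit granularity means for reservations, assuming 64-bit registers each split into two 32-bit partitions with one Dependency Matrix entry per partition. The register width, the `reserved_entries`/`conflicts` names and the spill-into-next-register rule for 128-bit ops are all hypothetical illustrations, not the project's actual design.

```python
# Hypothetical model of 32-bit Dependency Matrix granularity: every 64-bit
# register is split into two 32-bit partitions, each with its own entry.
PARTITION_BITS = 32
REG_BITS = 64  # assumed register width
PARTS_PER_REG = REG_BITS // PARTITION_BITS


def reserved_entries(reg: int, width_bits: int, offset_bits: int = 0) -> set[tuple[int, int]]:
    """Return the (register, partition) entries an op of width_bits must reserve."""
    if width_bits <= 0 or width_bits % PARTITION_BITS:
        raise ValueError("width must be a positive multiple of 32 bits")
    if offset_bits < 0 or offset_bits % PARTITION_BITS:
        raise ValueError("offset must be a non-negative multiple of 32 bits")
    first = offset_bits // PARTITION_BITS
    count = width_bits // PARTITION_BITS
    # Partitions beyond the first register spill into the following register(s).
    return {(reg + i // PARTS_PER_REG, i % PARTS_PER_REG)
            for i in range(first, first + count)}


def conflicts(a: set[tuple[int, int]], b: set[tuple[int, int]]) -> bool:
    """Two ops must be ordered only if they reserve a common entry."""
    return bool(a & b)


if __name__ == "__main__":
    for width in (32, 64, 128):
        print(width, "bits ->", len(reserved_entries(0, width)), "entries")
```

In this model a 32-bit op takes 1 entry, a 64-bit op 2 and a 128-bit op 4, so every doubling of op width doubles the entries that must be tracked together.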
So it is a 64-bit ALU. However, because of the partitioning, you can literally
throw 32-bit SIMD data into one half and 32-bit data into the other,
and it doesn't care in the slightest that the data is going to
different destination registers (or parts thereof).
Only in the 64-bit ops case do you actually care.
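The partitioning idea can be illustrated with an adder whose carry chain is broken at bit 32 when the ALU is in 32-bit mode. This is a behavioural sketch only, assuming integer addition stands in for the ALU operation; `partitioned_add` and `split_halves` are hypothetical names, not the project's HDL.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF


def partitioned_add(a: int, b: int, split32: bool) -> int:
    """Add two 64-bit values.

    With split32 set, the carry out of bit 31 is blocked, so the low and
    high halves behave as two independent 32-bit lanes.
    """
    if split32:
        lo = ((a & MASK32) + (b & MASK32)) & MASK32
        hi = (((a >> 32) & MASK32) + ((b >> 32) & MASK32)) & MASK32
        return (hi << 32) | lo
    return (a + b) & MASK64


def split_halves(result: int) -> tuple[int, int]:
    """Return (low, high) 32-bit halves, which could go to different destinations."""
    return result & MASK32, (result >> 32) & MASK32


if __name__ == "__main__":
    # Two unrelated 32-bit adds packed into one 64-bit operation: 5+2 and 7+1.
    a = (7 << 32) | 5
    b = (1 << 32) | 2
    print(split_halves(partitioned_add(a, b, split32=True)))
```

This prints `(7, 8)`: each half is computed without reference to the other, so each result can be written to a different destination register. Only in 64-bit mode does the carry cross the boundary, which is why a 64-bit op needs both 32-bit reservations.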
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68