[Libre-soc-dev] GPR-to-FPR and FPR-to-GPR move operations
    Luke Kenneth Casson Leighton 
    lkcl at lkcl.net
       
    Sat May 29 10:04:58 BST 2021
    
    
  
links:
https://bugs.libre-soc.org/show_bug.cgi?id=230#c71
https://libre-soc.org/openpower/sv/int_fp_mv/
Lauri is kindly investigating MP3 in SVP64 assembler and it's turning out to
be a good test of what opcodes are needed.  in the bi-weekly meeting last
week, Paul, we mentioned briefly the need for GPR-to-FPR and FPR-to-GPR
mv operations (straight bit-wise) given that VSX/SIMD will not be added to
Libre-SOC as a GPU / VPU.
Jeff Bush's Nyuzi paper makes it clear that transferring GPU-style
workloads through L1/L2 cache is hugely expensive, and describes the
efforts he went to in order to reduce power consumption:
https://www.researchgate.net/publication/282269512_Nyami_A_synthesizable_GPU_architectural_model_for_general-purpose_and_graphics-specific_workloads
additionally, Lauri points out that just getting zero into an FPR is
also costly: it requires a LD operation, which takes up data segment
space and unnecessarily activates memory as well as the L2 and L1 data
cache paths, when compared to a MV-from-GPR operation.
in addition to that, in an Out-of-Order system the cycle latency of the
path through L1 cache will be much higher than that of a straight MV
operation (which in some micro-architectures may be a macro-op-fused
operation).
* this in turn requires a larger number of "in-flight" operations
* this in turn increases the number of Reservation Stations
* this in turn increases the size of the Dependency Matrices, which grows as O(N^2)
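to put rough numbers on that chain, here is a minimal sketch. it assumes Little's law (operations in flight = issue rate x latency) and a square N x N matrix with one cell per pair of in-flight operations. the latencies and issue rate are hypothetical figures picked for illustration, not measurements of any design, and the function names are made up for this example.

```python
def in_flight_ops(issue_per_cycle: int, latency_cycles: int) -> int:
    """Little's law: operations in flight = issue rate * latency."""
    return issue_per_cycle * latency_cycles


def dependency_matrix_cells(n_in_flight: int) -> int:
    """An N x N matrix has N*N cells, one per pair of in-flight operations."""
    return n_in_flight * n_in_flight


# Hypothetical figures, for illustration only (not measurements).
MV_LATENCY = 1   # assumed cycles for a straight register-to-register move
LD_LATENCY = 4   # assumed cycles for a load that hits in L1 cache
ISSUE_RATE = 2   # assumed operations issued per cycle

for name, latency in (("MV", MV_LATENCY), ("LD", LD_LATENCY)):
    n = in_flight_ops(ISSUE_RATE, latency)
    print(name, "in-flight:", n, "matrix cells:", dependency_matrix_cells(n))
```

under these assumed figures a 4x increase in latency needs 4x the in-flight operations and 16x the matrix cells.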
using the LD-ST path is therefore extremely costly, all of which
points to a straight bit-copy between GPR and FPR being necessary.
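as a sketch of what "straight bit-copy" means here: the 64-bit pattern is carried across unchanged, with no integer-to-float conversion. the Python below models that with the standard struct module. the function names are hypothetical and are not proposed mnemonics.

```python
import struct

MASK64 = 0xFFFFFFFFFFFFFFFF


def gpr_to_fpr_bits(gpr_value: int) -> float:
    """Reinterpret a 64-bit integer bit pattern as an IEEE754 double."""
    return struct.unpack("<d", struct.pack("<Q", gpr_value & MASK64))[0]


def fpr_to_gpr_bits(fpr_value: float) -> int:
    """Reinterpret an IEEE754 double as its 64-bit integer bit pattern."""
    return struct.unpack("<Q", struct.pack("<d", fpr_value))[0]


# An all-zeros GPR copied bit-wise gives +0.0: no load from memory needed.
print(gpr_to_fpr_bits(0))
# The bit pattern of 1.0, as it would appear in a GPR.
print(hex(fpr_to_gpr_bits(1.0)))
```

note that this is not a conversion: copying the integer 5 bit-wise gives a tiny subnormal number, not 5.0.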
in some micro-architectures the MV may end up being a macro-op-fused
operation: it may end up actually being removed entirely from
the pipelines, instead being used to mark the source or destination
of INT or FP operations as targeting the *other* regfile:
     fmv2int  fp5, r3
     addi r3, 0x5
becomes (macro-fused):
     addi fp5, 0x5
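the rewrite above can be mimicked with a toy peephole pass. this is purely a textual sketch of the example as written, not a model of any real decoder: it assumes instructions are (mnemonic, operands) tuples, that fmv2int takes (fpr, gpr) in that order, and that fusion applies only when the very next instruction names that gpr as its first operand. fuse_moves is a hypothetical name.

```python
from typing import List, Tuple

Insn = Tuple[str, Tuple[str, ...]]  # (mnemonic, operands)


def fuse_moves(insns: List[Insn]) -> List[Insn]:
    """Drop an fmv2int and retarget the following op at the FPR."""
    out: List[Insn] = []
    i = 0
    while i < len(insns):
        op, args = insns[i]
        nxt = insns[i + 1] if i + 1 < len(insns) else None
        fusable = (
            op == "fmv2int"
            and len(args) == 2
            and nxt is not None
            and len(nxt[1]) > 0
            and nxt[1][0] == args[1]  # next op uses the moved-to GPR
        )
        if fusable:
            fpr = args[0]
            # The move disappears; the next op now names the FPR directly.
            out.append((nxt[0], (fpr,) + nxt[1][1:]))
            i += 2
        else:
            out.append((op, args))
            i += 1
    return out


program = [("fmv2int", ("fp5", "r3")), ("addi", ("r3", "0x5"))]
print(fuse_moves(program))
```

a sequence where the following instruction uses a different register is left untouched.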
it should be clear that once bitmanip operations are added as well, the
possibilities expand to include performing bit manipulation on FPRs.
l.
    
    