[Libre-soc-bugs] [Bug 230] Video opcode development and discussion
    bugzilla-daemon at libre-soc.org 
       
    Fri Dec 11 17:10:38 GMT 2020
    
    
  
https://bugs.libre-soc.org/show_bug.cgi?id=230
--- Comment #15 from Jacob Lifshay <programmerjake at gmail.com> ---
If you're trying to do a giant sum-reduction and don't care that much about the
exact order of add ops, the code I've seen that should be most efficient is:
// the number of add ops per inner loop that the hw needs to keep the
// pipeline full
const size_t HW_PAR = 32;
using vec = sv_vec<float, HW_PAR>;
float reduce_add(float *in, size_t in_size) {
    // optionally do a single shorter horizontal_add for in_size < HW_PAR
    vec accumulator = splat(0.0f, VL=HW_PAR);
    while(in_size != 0) {
        size_t vl = min(HW_PAR, in_size);
        vec v = load(in, VL=vl);
        // elements with index >= vl are unmodified
        accumulator = add(accumulator, v, VL=vl);
        in += vl;
        in_size -= vl;
    }
    return horizontal_add(accumulator, VL=HW_PAR);
}
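The pseudocode above can be emulated in plain C++ to check its behaviour on an ordinary machine. This is only a rough sketch under stated assumptions: the std::array stands in for sv_vec, the inner scalar loop stands in for a vector add with VL=vl, and the pairwise tree at the end is just one possible ordering for horizontal_add (the post does not say which order the hardware would use).

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Assumption: 32 lanes, matching HW_PAR in the pseudocode above.
constexpr std::size_t HW_PAR = 32;
static_assert((HW_PAR & (HW_PAR - 1)) == 0,
              "the tree reduction below assumes a power-of-two lane count");

float reduce_add(const float *in, std::size_t in_size) {
    assert(in != nullptr || in_size == 0);

    // Emulates: vec accumulator = splat(0.0f, VL=HW_PAR);
    std::array<float, HW_PAR> accumulator{};

    while (in_size != 0) {
        const std::size_t vl = std::min(HW_PAR, in_size);
        // Emulates load + add with VL=vl: lanes with index >= vl are
        // left unmodified, so a short final chunk is handled correctly.
        for (std::size_t i = 0; i < vl; ++i) {
            accumulator[i] += in[i];
        }
        in += vl;
        in_size -= vl;
    }

    // Stand-in for horizontal_add(accumulator, VL=HW_PAR): a pairwise
    // tree reduction. The real instruction may sum in a different order.
    for (std::size_t width = HW_PAR / 2; width != 0; width /= 2) {
        for (std::size_t i = 0; i < width; ++i) {
            accumulator[i] += accumulator[i + width];
        }
    }
    return accumulator[0];
}
```

Each lane accumulates every HW_PAR-th input element, so the additions happen in a different order than a simple left-to-right loop. For general floating-point data the result can therefore differ slightly in rounding from a sequential sum, which is the "don't care about the exact order" trade-off mentioned above.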