[Libre-soc-bugs] [Bug 230] Video opcode development and discussion

bugzilla-daemon at libre-soc.org bugzilla-daemon at libre-soc.org
Fri Dec 11 17:10:38 GMT 2020


https://bugs.libre-soc.org/show_bug.cgi?id=230

--- Comment #15 from Jacob Lifshay <programmerjake at gmail.com> ---
If you're trying to do a giant sum-reduction and don't care much about the
exact order of the add ops, the code I've seen that should be most efficient is:

// the number of add ops per inner loop that the hw needs to keep the
// pipeline full
const size_t HW_PAR = 32;

using vec = sv_vec<float, HW_PAR>;

float reduce_add(float *in, size_t in_size) {
    // optionally do a single shorter horizontal_add for in_size < HW_PAR
    vec accumulator = splat(0.0f, VL=HW_PAR);
    while(in_size != 0) {
        size_t vl = min(HW_PAR, in_size);
        vec v = load(in, VL=vl);
        // elements with index >= vl are unmodified
        accumulator = add(accumulator, v, VL=vl);
        in += vl;
        in_size -= vl;
    }
    return horizontal_add(accumulator, VL=HW_PAR);
}
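
For anyone who wants to try the idea on an ordinary compiler, here is a rough scalar emulation of the loop above in plain C++. It is only a sketch: sv_vec, splat, load, add and horizontal_add are replaced by a std::array and explicit loops, the name reduce_add_lanes is made up for this example, and HW_PAR = 32 is just the example value from above, not a measured figure for any hardware.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Assumed lane count, copied from the example value in the pseudocode.
constexpr std::size_t HW_PAR = 32;

// Hypothetical scalar stand-in for the vector loop: lane i accumulates
// in[i], in[i + HW_PAR], in[i + 2 * HW_PAR], ... so the adds inside one
// pass over the lanes do not depend on each other.
float reduce_add_lanes(const float *in, std::size_t in_size) {
    assert(in != nullptr || in_size == 0);
    std::array<float, HW_PAR> accumulator{}; // stands in for splat(0.0f)
    while (in_size != 0) {
        std::size_t vl = std::min(HW_PAR, in_size);
        // lanes with index >= vl are left unmodified
        for (std::size_t i = 0; i < vl; ++i) {
            accumulator[i] += in[i];
        }
        in += vl;
        in_size -= vl;
    }
    // stands in for horizontal_add over all HW_PAR lanes
    float total = 0.0f;
    for (float lane : accumulator) {
        total += lane;
    }
    return total;
}
```

Since the summation order differs from a plain left-to-right loop, the result can differ in the last bits for general float inputs; that is the "don't care about the exact order" trade-off mentioned above.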
