[libre-riscv-dev] [Bug 208] implement CORDIC in a general way sufficient to do transcendentals

Tue May 5 18:02:11 BST 2020

https://bugs.libre-soc.org/show_bug.cgi?id=208

--- Comment #53 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Michael Nolan from comment #52)
> (In reply to Luke Kenneth Casson Leighton from comment #50)
> > btw 12 stages is too many, because we need a minimum of one FunctionUnit
> > "managing" (matching up) source operands and results.
> > 
> > each extra Function Unit added creates an Order N^2 increase in the overall
> > size of the Dependency Matrices.
> > 
> > so - and i don't mean right now - if this can be either:
> > 
> > * cut into two separate pipelines (partial data fed back as a micro-op)
> > * number of combinatorial blocks increased to 5 or 6
> > 
> > we stand a chance of keeping the FU count down (8 is ok).
> > 
> > have to think that through carefully, how to do micro-ops.
> 
> Sure. I don't know about splitting it up into multiple uops,

neither do i at the moment!  i think i might have got confused about
sharing the MUL unit with FPMUL.

>  but reducing
> the number of stages is pretty easy. I added a parameter to the module
> called rounds_per_stage which governs how many cordic rounds go in each
> pipeline stage. Increasing that number will decrease the number of pipeline
> stages.

fantastic, that's exactly the kind of parameterisation _we_ need,
however as a general-purpose IEEE754 FP library, it's exactly
the kind of parameterisation that other _users_ need :)

> 
> (In reply to Luke Kenneth Casson Leighton from comment #51)
> > the other option:
> > 
> > keep the long pipeline lengths: we *know* that there's more stages than
> > there are FUs, and that's just "tough luck".  we know that with only
> > 8 FUs and 12 pipeline stages, 4 of those at any one time will run
> > empty, and we just... live with that.
> 
> I was going to ask if we could do something like this. Since I don't think
> the cordic will be used *that* often, this seems like a reasonable thing to
> do.

yehyeh.

ok what else can we throw in here?  LOG1P is probably a good thing to try

this is a great explanation:
http://www.mclenegan.com/public/thesis.pdf

section 4.4.1 explains "rotation mode"

however we also need *vector* mode (4.4.2)

i know - somewhere - i've seen a python implementation of vector mode.
it also specified, as you can see, the three other modes: circular,
linear, hyperbolic.

ah!  i know why we want to keep the two answers: see p39 of that
thesis: you can use them (with some post-processing) to do
tan.

also, see circular-vectoring mode, it ends up computing the
normal distance and... and the angle?  ohh, that's for... darn
what was it... it's for arcs.  there's a name for this... sorry
i forget what it's called right now.

https://www.ijert.org/vhdl-implementation-of-cordic-algorithm-for-wireless-lan

this version, p14, explains much clearer that "rotate" mode goes
by angle and computes x,y

"vector" mode the vector is rotated to be flat against the x-axis,
recording the angle (Z) needed to do so.

this one is a survey of CORDIC algorithms:
http://www.ee.ic.ac.uk/pcheung/teaching/ee3_DSD/crdcsrvy.pdf

it shows - *briefly* - how to use them to do different things, but
it also, by being a survey, has some really useful insights into
various optimisations.

so can i suggest, first adding the infrastructure back in to allow
cancellation and operations (ctx), then see if you can add one extra
"mode" (the m=-1, m=0, m=1 thing seems trivial which will give the
circular, linear and hyperbolic modes, respectively).

"circular" mode is covered with the output being SIN/COS.

ah ha!  here's a useful suite of implementations (annoyingly being
in jupiter)
https://github.com/suyashmahar/cordic-algorithm-python/blob/master/cordic_implementation.ipynb

extractign the two modes:

circular = 1
linear = 0
hyperbolic = -1

def ROM_lookup(iteration, coordinate):
    if (coordinate == circular):
        return math.degrees(math.atan(2**(-1*iteration)))
    elif (coordinate == linear):
        return 2**(-1*iteration)
    elif (coordinate == hyperbolic):
        return (math.atanh(2**(-1*iteration)))

def rotation_mode(x, y, z, coordinate, iterations):
    a = 0.607252935;   # = 1/K

    x_val_list = []
    y_val_list = []
    z_val_list = []
    iterations_list = []

    i = 0;                  # Keeps count on number of iterations

    current_x = x         # Value of X on ith iteration 
    current_y = y         # Value of Y on ith iteration
    current_z = z         # Value of Z on ith iteration

    di = 0

    if (coordinate == hyperbolic):
        i = 1
    else:
        i = 0

    flag = 0

    if (iterations > 0):
        while (i < iterations):
            if (current_z < 0):
                di = -1
            else:
                di = +1
            next_z = current_z - di * ROM_lookup(i, coordinate)
            next_x = current_x - coordinate * di * current_y * (2**(-1*i))
            next_y = current_y + di * current_x * 2**(-1*i)

            current_x = next_x
            current_y = next_y
            current_z = next_z

            x_val_list.append(current_x)
            y_val_list.append(current_y)
            z_val_list.append(current_z)

            iterations_list.append(i)

            if (coordinate == hyperbolic):
                if ((i != 4) & (i != 13) & (i!=40)):
                    i = i+1
                elif (flag == 0):
                    flag = 1
                elif (flag == 1):
                    flag = 0
                    i = i+1
            else:
                i = i+1
    return { 'x':x_val_list, 'y':y_val_list, 'z':z_val_list,
'iteration':iterations_list, }

ahhh, i recognise this code :)  i've seen it before.  
def vector_mode(x, y, z, coordinate, iterations):
    a = 1.2075;   # = 1/K

    x_val_list = []
    y_val_list = []
    z_val_list = []
    iterations_list = []

    i = 0;                  # Keeps count on number of iterations

    current_x = x         # Value of X on ith iteration 
    current_y = y         # Value of Y on ith iteration
    current_z = z         # Value of Z on ith iteration

    di = 0

    # This is neccesary since result for i=0 doesn't exists for hyperbolic 
    # co-ordinate system.
    if (coordinate == hyperbolic):
        i = 1
    else:
        i = 0

    flag = 0

    if (iterations > 0):
        while (i < iterations):
            di = -1*math.copysign(1, current_y);#*current_x);
            next_x = current_x - coordinate * di * current_y * (2**(-1*i))
            next_y = current_y + di * current_x * 2**(-1*i)
            next_z = current_z - di * ROM_lookup(i, coordinate)

            current_x = next_x
            current_y = next_y
            current_z = next_z

            x_val_list.append(current_x)
            y_val_list.append(current_y)
            z_val_list.append(current_z)

            iterations_list.append(i)

            if (coordinate == hyperbolic):
                if ((i != 4) & (i != 13) & (i!=40)):
                    i = i+1
                elif (flag == 0):
                    flag = 1
                elif (flag == 1):
                    flag = 0
                    i = i+1
            else:
                i = i+1
    return { 'x':x_val_list, 'y':y_val_list, 'z':z_val_list,
'iteration':iterations_list }

if you can drop that into a repository, get it working, write a
few simple experiments and see how it operates in each of the 6
modes, we're on a roll :)

-- 
You are receiving this mail because:
You are on the CC list for the bug.