[libre-riscv-dev] div/mod algorithm written in python

Fri Jul 5 13:06:18 BST 2019

On Fri, Jul 5, 2019 at 12:44 PM Jacob Lifshay <programmerjake at gmail.com> wrote:
>
> On Fri, Jul 5, 2019 at 4:26 AM Luke Kenneth Casson Leighton
> <lkcl at lkcl.net> wrote:

> >  the trade-off being: "yet more classes".  if that's a balance you can
> > live with, so can i.  bear in mind that out_do_z absolutely has to be
> > used by what you [plan] call DivPipeCore, because it disables
> > processing.
> Since every DivPipeCore stage is only combinatorial, "disabling"
> processing does nothing.

 "disabling" in the sense of "not putting anything into the output" is what i
mean.  like this:

        with m.If(~self.i.out_do_z):
            # do conversion here, of both self.i.a and self.i.b,
            # into DivPipeCoreInputData dividend and divisor.
            pass  # placeholder: conversion logic goes here

oz gets copied (always), so the m.Else() branch below isn't strictly necessary:

        with m.If(~self.i.out_do_z):
            # do conversion here, of both self.i.a and self.i.b,
            # into DivPipeCoreInputData dividend and divisor.
            pass  # placeholder: conversion logic goes here
        with m.Else():
            m.d.comb += self.oz.eq(self.i.oz) # from specialcases
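As a plain-Python behavioural sketch of that gating (not nmigen, and not the project's actual classes: `StageInput`, `StageOutput` and `convert_stage` are hypothetical names), the stage converts a and b only when out_do_z is clear, and always copies out_do_z and oz through. An unassigned comb signal falls back to its reset value (0 by default), which is modelled here as the "nothing put into the output" case.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class StageInput:
    # hypothetical stand-ins for self.i.a, self.i.b, self.i.out_do_z, self.i.oz
    a: int
    b: int
    out_do_z: bool  # set by the special-cases stage
    oz: int         # special-case result computed earlier


@dataclass
class StageOutput:
    dividend: int
    divisor: int
    out_do_z: bool
    oz: int


def convert_stage(i: StageInput,
                  convert: Callable[[int, int], Tuple[int, int]]) -> StageOutput:
    """Behavioural model of the m.If(~self.i.out_do_z) gating."""
    # comb-driven signals that are not assigned fall back to their reset
    # value (0 by default): "nothing put into the output"
    dividend, divisor = 0, 0
    if not i.out_do_z:
        dividend, divisor = convert(i.a, i.b)
    # out_do_z and oz are always copied through, which is why the
    # m.Else() branch is optional
    return StageOutput(dividend, divisor, i.out_do_z, i.oz)


# usage: an identity conversion, just to show the gating
normal = convert_stage(StageInput(a=7, b=2, out_do_z=False, oz=0),
                       lambda a, b: (a, b))
special = convert_stage(StageInput(a=7, b=0, out_do_z=True, oz=123),
                        lambda a, b: (a, b))
```

The real conversion (signed-to-unsigned, FP mantissa extraction, etc.) would replace the identity lambda.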

> in DivPipe (non-core), since it actually
> tracks which pipeline stages actually have valid data (and supports
> cancelling, tracking muxid, etc.), that's where the disabling will
> occur.

 ... and that will need to be done as:

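        with m.If(~self.i.out_do_z):
            # only process (and produce output for) non-special-case data
            pass  # placeholder: stage processing goes here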

> >  remember that every submodule added slows down simulations.  i had to
> > drastically alter the design of the scoreboard code because of this.
> I can live with that. if it gets too slow, we can simulate in verilog
> -- where the inter-module boundaries shouldn't add any slow down
> (maybe even using Verilator, though I haven't gotten that to work
> yet).

 yehyeh, and cocotb would allow continuing to use the python unit
tests.  cocotb is really neat.

 in the background, i've also been mulling over how to
marshal/unmarshal data at appropriate boundaries (an AXI4 bus being
one), such that simulations can be split up to be multi-process.

 these boundaries would also be the appropriate place to have
inter-FPGA communication.
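A minimal sketch of what marshalling at such a boundary could look like, assuming a heavily simplified write "beat" with only address, data and byte-strobe fields (a real AXI4 write channel has many more signals: burst, len, size, id, last, and so on). `WriteBeat`, `marshal` and `unmarshal` are hypothetical names; only Python's standard `struct` module is used.

```python
import struct
from dataclasses import dataclass

# hypothetical wire format: little-endian, addr u64, data u64, strb u8
_WRITE_BEAT = struct.Struct("<QQB")


@dataclass(frozen=True)
class WriteBeat:
    addr: int
    data: int
    strb: int  # byte-enable mask, one bit per byte of data


def marshal(beat: WriteBeat) -> bytes:
    """Pack one beat into bytes so it can cross a process (or FPGA) boundary."""
    return _WRITE_BEAT.pack(beat.addr, beat.data, beat.strb)


def unmarshal(raw: bytes) -> WriteBeat:
    """Inverse of marshal(): rebuild the beat on the receiving side."""
    addr, data, strb = _WRITE_BEAT.unpack(raw)
    return WriteBeat(addr, data, strb)


beat = WriteBeat(addr=0x8000_0000, data=0xDEADBEEF, strb=0x0F)
wire = marshal(beat)
```

The resulting bytes could then be sent between simulation processes over any byte channel, e.g. `multiprocessing.Pipe()` connections via `send_bytes` / `recv_bytes`.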

> > this is because it's a *public* (general) API, where self.ctx.op will
> > change the SIMD behaviour, be used for MUL, ADD, FCVT and much, much
> > more.
> I'm planning on SIMD-ifying DivPipe and adding support for 64-bit
> operations later, right now, I'm just making generic code that can
> handle only one operand width.

 yehyeh - my point is, *when* that happens, it goes into ctx.op and
some of those bits get used (or ctx.simd_op, or we use a Record for
ctx.op, or... something-to-that-effect).
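To illustrate "some of those bits get used", here is a sketch of one possible ctx.op bit layout, assuming (hypothetically) 4 bits of operation code in bits [3:0] and a 2-bit SIMD element-width selector in bits [5:4]. The enum values and widths are made up for illustration; they are not the project's encoding.

```python
from enum import IntEnum
from typing import Tuple


class Op(IntEnum):  # hypothetical operation codes
    DIV = 0
    DIVU = 1
    REM = 2
    REMU = 3
    FDIV = 4


class SimdWidth(IntEnum):  # hypothetical SIMD element-width selector
    W64 = 0
    W32 = 1
    W16 = 2
    W8 = 3


OP_BITS = 4    # bits [3:0] hold the operation
SIMD_BITS = 2  # bits [5:4] hold the SIMD element width


def pack_op(op: Op, simd: SimdWidth) -> int:
    """Combine the two fields into a single ctx.op integer."""
    return int(op) | (int(simd) << OP_BITS)


def unpack_op(ctx_op: int) -> Tuple[Op, SimdWidth]:
    """Split a ctx.op integer back into its fields."""
    op = Op(ctx_op & ((1 << OP_BITS) - 1))
    simd = SimdWidth((ctx_op >> OP_BITS) & ((1 << SIMD_BITS) - 1))
    return op, simd


packed = pack_op(Op.REMU, SimdWidth.W16)
```

In hardware, a Record with a layout along the lines of `[("op", 4), ("simd", 2)]` would give these fields names instead of hand-written shifts and masks.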

> > now, we _can_ look at propagating a class, all the way from the setup
> > of FPipeContext, right the way down to the individual pipeline stages
> > (make it a member of self.ctx)...
> >
> > ... but not right now.
> >
> >
> > > The DivPipe (non-core) classes will need to handle translating signed
> > > integer division into unsigned division + negations +
> > > overflow/div-by-zero handling.
>
> DivPipeCoreOperation is an internal detail of DivPipe. externally,
> ctx.op is used, which can represent all the different operations.

 if you can live with DivPipeCoreOperation being used solely as a
local signal (the verilog equivalent of a "register"), that would be
preferable right now.

 if not, then passing it in as a class to FPipeContext is the next
logical best option.

> DivPipeCoreOperation can't represent the difference between
> div/mod/divu/modu/fdiv, because it doesn't matter to the core
> algorithm (it's unsigned division in all the above cases). DivPipe
> translates to/from DivPipeCore. Notably, DivPipe independently
> propagates ctx.op to keep track of the exact operation.

ok, understood.  so it's used as part of the conversion (the entry /
input) and doesn't need to be propagated further.
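The split described here (DivPipe handles signs, divide-by-zero and overflow; the core only ever sees unsigned division) can be sketched in plain Python. Assumptions: RISC-V M-extension results for DIV/DIVU/REM/REMU (divide by zero gives an all-ones quotient and returns the dividend as remainder; most-negative / -1 returns the dividend as quotient and 0 as remainder), and a radix-2 restoring divider standing in for the core. The restoring algorithm is one possible choice, not necessarily what DivPipeCore uses, and the function names are hypothetical.

```python
from typing import Tuple


def unsigned_divmod(dividend: int, divisor: int, width: int) -> Tuple[int, int]:
    """Radix-2 restoring division, one quotient bit per iteration.

    Each iteration is the kind of work one purely combinatorial stage
    could do. Assumes divisor != 0 (the wrapper handles that case).
    """
    quotient = 0
    remainder = 0
    for bit in reversed(range(width)):
        # shift the next dividend bit into the partial remainder
        remainder = (remainder << 1) | ((dividend >> bit) & 1)
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << bit
    return quotient, remainder


def to_signed(value: int, width: int) -> int:
    """Interpret a width-bit pattern as two's complement."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def riscv_div_rem(a: int, b: int, width: int, signed: bool) -> Tuple[int, int]:
    """Return (quotient, remainder) as unsigned width-bit patterns."""
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    if b == 0:
        # divide by zero: quotient is all ones, remainder is the dividend
        return mask, a
    if not signed:
        return unsigned_divmod(a, b, width)
    sa, sb = to_signed(a, width), to_signed(b, width)
    if sa == -(1 << (width - 1)) and sb == -1:
        # signed overflow: quotient is the dividend, remainder is zero
        return a, 0
    q, r = unsigned_divmod(abs(sa), abs(sb), width)
    # quotient rounds towards zero; remainder takes the sign of the dividend
    if (sa < 0) != (sb < 0):
        q = -q
    if sa < 0:
        r = -r
    return q & mask, r & mask
```

Only the signedness flag (derived from ctx.op) is needed at the entry and exit of the wrapper; the unsigned loop in the middle never looks at it.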

so at some point, ctx.op could actually just... stop being passed
down.  the way to do that would be to set (change) the pspec so that
the op width is zero (or None would do it).  it'd save a few gates.
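One way to picture "set the width to zero (or None)": if each stage's context field list is built from the pspec, a falsy op width simply leaves the op field out, so downstream stages carry no bits for it. `context_layout` and `total_bits` are hypothetical helpers, not the actual FPipeContext or pspec API.

```python
from typing import List, Optional, Tuple


def context_layout(muxid_wid: int, op_wid: Optional[int]) -> List[Tuple[str, int]]:
    """Build a (name, width) field list for a pipeline context.

    An op width of 0 or None drops the op field entirely.
    """
    layout = [("muxid", muxid_wid)]
    if op_wid:  # both 0 and None are falsy
        layout.append(("op", op_wid))
    return layout


def total_bits(layout: List[Tuple[str, int]]) -> int:
    """Number of bits each stage has to carry for this context."""
    return sum(width for _, width in layout)


early = context_layout(2, 4)     # stages before/at the conversion
late = context_layout(2, None)   # stages after ctx.op is no longer needed
```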

*however*... if on the other hand the intention is to have this same
pipeline code do FSQRT and FISQRT, then ctx.op *would* be needed, to
be passed right the way down the entire pipeline.

btw i really don't recommend doing separate classes right now.
there's a bug, possibly in nmigen, involving ArrayProxy, that i
haven't had time to track down, and i had to create some rather nasty
workarounds for it.  i'd prefer not to have to deal with it (or explain
it) right now.  it *might* only be affecting the entry and exit points
where the ReservationStation class is using ArrayProxy.... it's...
complicated (hence the note "complicated" as a comment).

l.