[libre-riscv-dev] next tasks

Wed Feb 19 20:23:54 GMT 2020

On Wed, Feb 19, 2020 at 7:49 PM Michael Nolan <mtnolan2640 at gmail.com> wrote:

> Is this in soc.git/src/experiment
> (https://git.libre-riscv.org/?p=soc.git;a=tree;f=src/experiment;h=915526d5d3e4a717db2c015322849d14a167dcf7;hb=HEAD)?

yes!  python3 score6600.py should actually work.  you can see i have
stuffed a series of pseudo/micro-instructions in by way of a tuple.

you'll note that the tuple is *fixed length* (and thus very long -
far longer than the original opcode).

> >> Since we're switching between POWER and RISCV at runtime and making an
> >> OOO processor, I'm suspecting it's going to be more like the former
> >
> > bear in mind it is as if we have a 33 bit instruction.
> >
> > the top bit simply selects as a mux the different ISAs.
>
> Right my point with that was that there would be a bit of translation
> going on because the POWER or RISCV instruction set doesn't *exactly*
> line up with our backend.

yes exactly.  basically the back-end will have to have an amalgamation
of the features of both ISAs, plus a bit of "extra micro-opcodes /
translators".
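
as a rough sketch of the "33 bit instruction" idea quoted above: the
top bit is just a mux select between two front-end decoders.  every
name here (ISA_BIT, tag, decode, the two decode_* stubs) is made up
for illustration and is not taken from soc.git, and which ISA gets
"1" is an arbitrary assumption.

```python
# hypothetical sketch: none of these names come from soc.git
ISA_BIT = 32                 # bit 32 of the 33-bit word selects the ISA
INSN_MASK = (1 << 32) - 1    # low 32 bits hold the raw instruction


def decode_power(insn):
    # placeholder: a real decoder would emit back-end micro-ops
    return ("power", insn)


def decode_riscv(insn):
    # placeholder, as above
    return ("riscv", insn)


def tag(insn, is_riscv):
    """form the 33-bit word: ISA select bit above a 32-bit instruction.

    assumption: 1 means RISC-V, 0 means POWER.
    """
    return (int(is_riscv) << ISA_BIT) | (insn & INSN_MASK)


def decode(word33):
    """the top bit acts as the mux select between the two decoders"""
    insn = word33 & INSN_MASK
    if (word33 >> ISA_BIT) & 1:
        return decode_riscv(insn)
    return decode_power(insn)
```

both decoders would then target the same (amalgamated) back-end
opcode set, which is where the extra translation comes in.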

one good example is FP ops: in POWER, the data for FP32 is stored
*in* the bit-positions of FP64 (with relevant LSBs and MSBs of both
mantissa and exponent being zero), whereas RV stores the FP32 *in*
the low half of a 64 bit register with the top 32 bits being all 1s
(NaN-boxing).

therefore we would need a special "flag" to tell the hardware engine
to put in an extra stage to perform the conversion, depending on
whether the instruction is POWER or RV.
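
to make the two register layouts concrete, here is a small software
model.  it assumes POWER holds a single-precision value in
double-precision format, and that RV NaN-boxes the FP32 (low 32 bits,
upper 32 bits all 1s).  the function names are hypothetical, and
improperly-boxed RV values are not handled.

```python
import struct


def fp32_bits(x):
    """raw 32-bit pattern of x rounded to single precision"""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def power_fpr_from_fp32(x):
    """POWER-style: the FP32 value held in FP64 format"""
    single = struct.unpack("<f", struct.pack("<f", x))[0]  # round to FP32
    return struct.unpack("<Q", struct.pack("<d", single))[0]


def rv_fpr_from_fp32(x):
    """RV-style: FP32 in the low half, upper 32 bits all 1s"""
    return (0xFFFFFFFF << 32) | fp32_bits(x)


def rv_to_power(reg):
    """the "extra stage": unbox the FP32, then widen it to FP64 format.

    assumption: reg is properly NaN-boxed (not checked here).
    """
    lo = reg & 0xFFFFFFFF
    single = struct.unpack("<f", struct.pack("<I", lo))[0]
    return struct.unpack("<Q", struct.pack("<d", single))[0]
```

for example, rv_to_power(rv_fpr_from_fp32(1.5)) gives the same 64-bit
pattern as power_fpr_from_fp32(1.5).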

> > however in the "intermediary" expanded internal opcodes we have a bit set
> > which goes "if this is set then after result computation, shove me through
> > the signed pipeline and clear the bit saying i been there".
> >
> > make any sense?
>
> So this would be more like a bundle if you will of multiple operations
> grouped together? So for instance the POWER instruction `lwzu r1, D(r2)`
> might be decomposed into:
>
> <add r2, r2, D | load.w r1, [r2/partial result]>

yeah, that's a good example.
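
a minimal sketch of that decomposition, as a toy software model.
MicroOp, expand_lwzu and run are hypothetical names, not the format
used in score6600.py, and memory is modelled as a plain dict of
address -> 32-bit word to sidestep endianness.

```python
from collections import namedtuple

# hypothetical micro-op format: operation, dest reg, source reg, immediate
MicroOp = namedtuple("MicroOp", "op dest src imm")


def expand_lwzu(rt, ra, d):
    """split `lwzu rt, d(ra)` into an address update plus a word load"""
    return (
        MicroOp("add", ra, ra, d),      # ra <- ra + d (the "update" part)
        MicroOp("load.w", rt, ra, 0),   # rt <- zero-extended word at [ra]
    )


def run(uops, regs, mem):
    """execute micro-ops in order against dict-based regs and memory"""
    for u in uops:
        if u.op == "add":
            regs[u.dest] = (regs[u.src] + u.imm) & 0xFFFFFFFFFFFFFFFF
        elif u.op == "load.w":
            addr = regs[u.src] + u.imm
            regs[u.dest] = mem[addr] & 0xFFFFFFFF
        else:
            raise ValueError("unknown micro-op: %s" % u.op)
    return regs
```

here the load depends on the result of the add, which is exactly the
kind of dependency the OoO machinery has to track whether the two
ops travel as one bundle or as two separate instructions.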

> It seems to me that it might be more efficient area/power wise to split
> it into two separate instructions, reducing the width of the instruction
> path and removing the feedback mechanism, but increasing pressure on the
> register file and OOO machinery. Apologies if this doesn't make sense
> for the 6600 style of machine, it's difficult to wrap my head around still.

read those book chapters first, and also look up videos and diagrams
online about the Tomasulo Algorithm.  that's better-explained because
the academic literature completely and spectacularly fails to
comprehend the 6600.  it will at least introduce you to OoO
processing.

it's actually really simple and, crucially, it "hands over" some of
the nastiest aspects of computer science (if you try an in-order
design) to one really clean "block".

l.