[libre-riscv-dev] building a simple barrel processor

Fri Mar 8 08:17:31 GMT 2019

On Thu, Mar 7, 2019, 23:34 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On Fri, Mar 8, 2019 at 6:11 AM Jacob Lifshay <programmerjake at gmail.com>
> wrote:
>
> > Just to clarify, I'm asking if you think it's a good idea to work on this
> > since it will take some time.
>
>  at its heart a barrel processor is a single-core single-issue
> timeslicing design, suited to real-time I/O processing.  if we were to
> add multiple barrel 4-time-sliced SMP cores, it would result in
> multiple proliferations of the massive dual/triple-ported 8k SRAMs.
>
actually, for barrel processors, because each instruction takes many cycles
to execute, you only need a single-ported sram split into banks. since each
hart has its own separate bank(s), each hart can read its instruction's
arguments one at a time, then write the result at the end. if there are
enough harts per core, you can also run the sram at a lower clock rate &
maybe a lower voltage, by dedicating more pipeline stages to reading and
writing the register file.
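
as a rough illustration of the banking argument, here is a minimal sketch
in python. it assumes a hypothetical 4-stage pipeline (read_rs1, read_rs2,
execute, write_rd), strict round-robin issue, and one register-file bank
per hart. the stage names and function names are made up for this example
and are not taken from any existing design.

```python
# Hypothetical pipeline: each hart has one instruction in flight and moves
# through one stage per cycle. Only three of the four stages touch the
# hart's own register-file bank.
STAGES = ("read_rs1", "read_rs2", "execute", "write_rd")
BANK_STAGES = {"read_rs1", "read_rs2", "write_rd"}


def bank_accesses(num_harts, cycles):
    """Return table[cycle][hart]: accesses to each hart's bank in each cycle."""
    if num_harts < len(STAGES):
        raise ValueError("this sketch assumes at least one hart per stage")
    table = []
    for cycle in range(cycles):
        row = []
        for hart in range(num_harts):
            # Hart h issues at cycles h, h + num_harts, ...; slot is how far
            # it is into its current rotation.
            slot = (cycle - hart) % num_harts
            busy = slot < len(STAGES) and STAGES[slot] in BANK_STAGES
            row.append(1 if busy else 0)
        table.append(row)
    return table


def summarize(num_harts, cycles):
    """Return (worst accesses per bank per cycle, duty cycle of hart 0's bank)."""
    table = bank_accesses(num_harts, cycles)
    worst = max(max(row) for row in table)
    duty = sum(row[0] for row in table) / cycles
    return worst, duty


print(summarize(8, 80))
```

under these assumptions no bank ever sees more than one access per cycle,
which is why a single port is enough. with 8 harts each bank is also idle
for most cycles, which is the slack that could be spent on a slower sram
clock.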

>
> i just don't believe that a barrel processor will yield a useful
> design for a production quality 6 GFLOPs SoC within the required power
> budget, which is the target at which, if met, i can go back to the
> client and say "we met the target, how about that $100k".
>
if we still have a 4x32-bit alu in each core, we can still achieve the
bandwidth goals relatively easily. if the power required by the extra sram
is less than the power required by the additional circuitry of the OoO
implementation, then I think we should give the barrel processor design
some more in-depth consideration. I highly doubt that 0.06 mm^2 for the
sram (8 harts/core for 4 cores with 256 64-bit registers per hart, based on
about 0.1 um^2 per bit) will cause area issues.
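
the area estimate can be checked with a few lines of python. this only
counts raw bit cells at the assumed 0.1 um^2 per bit, and ignores decoders,
sense amps and other overhead. the function name is hypothetical.

```python
def regfile_area_mm2(cores, harts_per_core, regs_per_hart, bits_per_reg,
                     um2_per_bit):
    """Return (total bits, raw bit-cell area in mm^2) for the register files."""
    bits = cores * harts_per_core * regs_per_hart * bits_per_reg
    area_um2 = bits * um2_per_bit
    return bits, area_um2 / 1e6  # 1 mm^2 = 1e6 um^2


bits, area = regfile_area_mm2(cores=4, harts_per_core=8, regs_per_hart=256,
                              bits_per_reg=64, um2_per_bit=0.1)
print(bits, round(area, 3))  # 524288 0.052
```

the raw figure comes out at roughly 0.05 mm^2, in the same ballpark as the
0.06 mm^2 quoted above.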

the barrel processor design trivially resolves spectre issues since it is
an in-order processor. we probably won't even need branch prediction, as
there should be enough stages that we will have the outcome of a branch
before we need to fetch that hart's next instruction from the i-cache.
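
the branch argument can be sketched the same way. this assumes strict
round-robin issue (a hart fetches once every num_harts cycles) and that a
branch outcome is known at the end of a given 1-based pipeline stage, with
fetch being stage 1. both the function and its parameters are hypothetical.

```python
def fetch_stall_cycles(num_harts, resolve_stage):
    """Cycles a hart would have to wait (or predict across) after a branch.

    The hart fetches at cycle 0 and again at cycle num_harts. The branch
    outcome is available at the start of cycle resolve_stage.
    """
    next_fetch_cycle = num_harts
    outcome_ready_cycle = resolve_stage
    return max(0, outcome_ready_cycle - next_fetch_cycle)


def needs_branch_prediction(num_harts, resolve_stage):
    """True if the next fetch would arrive before the branch outcome."""
    return fetch_stall_cycles(num_harts, resolve_stage) > 0


print(needs_branch_prediction(8, 4))  # False
print(needs_branch_prediction(2, 4))  # True
```

so under these assumptions, as long as there are at least as many harts in
rotation as stages needed to resolve a branch, the outcome is always known
in time.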

if we want better single-threaded performance, we could set it up so that
if all but one hart is executing wfi instructions (in idle state), we could
rearrange the pipeline to a standard single-issue in-order risc design or
something similar. I figure that switching in and out of the idle state
should take long enough that it wouldn't introduce any security problems
due to timing.

>
> also, there's someone whom i'm encouraging to introduce themselves who
> would like to help with the OoO implementation, they're getting
> settled in having just moved house.  they've got prior experience with
> OoO.
>
neat, looking forward to working with them.

>
> l.