[libre-riscv-dev] Libre RISC-V Requirements Specification document

Thu Jan 10 12:39:30 GMT 2019

On Thu, Jan 10, 2019, 03:52 Luke Kenneth Casson Leighton <lkcl at lkcl.net>
wrote:

> On Thu, Jan 10, 2019 at 11:42 AM Jacob Lifshay <programmerjake at gmail.com>
> wrote:
>
> > You will probably need another read port to read the masked-out elements
> > from rd since the predicate may change often.
>
>  hm, ok.  well, we'll find out pretty quickly.
>
> > >
> > > As a result we should be able to get away with 4 32-bit banks of 2R1W,
> > > even for repeated FMAC, the proviso being that the src accumulator must
> > > be the dest of the previous FMAC. This case is the one I worked out how
> > > to detect.
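The 2R1W argument can be checked with a small model. This is a hypothetical sketch, not the detection logic Luke describes: it assumes an FMAC computes `dest = src1 * src2 + acc`, and that the accumulator is forwarded from the previous FMAC's result (rather than read from the bank) exactly when `acc` equals that FMAC's `dest`. The names `FMAC`, `regfile_reads` and `needs_extra_read` are made up for illustration.

```python
from typing import List, NamedTuple, Optional

READ_PORTS = 2  # one 2R1W bank, as described in the message


class FMAC(NamedTuple):
    """Hypothetical encoding of one FMAC: dest = src1 * src2 + acc."""
    dest: int
    src1: int
    src2: int
    acc: int


def regfile_reads(ops: List[FMAC]) -> List[int]:
    """Register-file reads each FMAC needs, assuming the accumulator is
    forwarded from the previous FMAC's result (not read from the bank)
    whenever acc == that FMAC's dest."""
    reads = []
    prev_dest: Optional[int] = None
    for op in ops:
        forwarded = prev_dest is not None and op.acc == prev_dest
        # src1 and src2 always come from the bank; acc only if not forwarded
        reads.append(2 if forwarded else 3)
        prev_dest = op.dest
    return reads


def needs_extra_read(ops: List[FMAC]) -> List[bool]:
    """True wherever an FMAC would exceed the bank's read ports."""
    return [n > READ_PORTS for n in regfile_reads(ops)]


# A dot-product style chain accumulating into register 5:
chain = [FMAC(5, 1, 2, 5), FMAC(5, 3, 4, 5), FMAC(5, 6, 7, 5)]
print(regfile_reads(chain))  # [3, 2, 2]
```

Only the first FMAC of a chain needs a third read (or an extra cycle); every following one fits in 2R once the accumulator is forwarded.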
> > >
> > > Btw Daniel, Jacob, the scoreboard OoO system absolutely does not care
> > > what the pipeline length is, or even if there isn't one. FSQRT and FDIV
> > > can therefore be done as blocking units, without detrimental
> > > consequences.
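Why latency does not matter to a scoreboard can be seen in a minimal model. This is a simplified, hypothetical sketch in the style of the CDC 6600 scoreboard: it only tracks which functional units are busy and which registers have a write pending, so nothing in it refers to pipeline depth. Two simplifications are assumed: every unit is blocking (busy from issue to completion, as FSQRT/FDIV would be), and it stalls at issue on RAW hazards, whereas the 6600 issued and waited in its read-operands stage. `Scoreboard` and its methods are made-up names.

```python
from typing import Dict, Iterable, Set


class Scoreboard:
    """Hypothetical, simplified scoreboard: issue depends only on unit
    busy flags and pending register writes, never on unit latency."""

    def __init__(self, units: Iterable[str]) -> None:
        self.busy: Dict[str, bool] = {u: False for u in units}
        self.pending_write: Set[int] = set()

    def can_issue(self, unit: str, dest: int, srcs: Iterable[int]) -> bool:
        return (not self.busy[unit]                        # structural hazard
                and dest not in self.pending_write          # WAW hazard
                and not any(s in self.pending_write for s in srcs))  # RAW

    def issue(self, unit: str, dest: int, srcs: Iterable[int]) -> None:
        srcs = list(srcs)
        assert self.can_issue(unit, dest, srcs)
        self.busy[unit] = True
        self.pending_write.add(dest)

    def complete(self, unit: str, dest: int) -> None:
        # Called whenever the unit finishes, after 1 cycle or after 30.
        self.busy[unit] = False
        self.pending_write.discard(dest)


sb = Scoreboard(["fdiv", "fadd"])
sb.issue("fdiv", 3, [1, 2])           # long-latency, blocking divide
print(sb.can_issue("fadd", 4, [5]))   # True: independent work proceeds
print(sb.can_issue("fadd", 4, [3]))   # False: waits for the divide's result
```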
> > >
> > We have to ensure that all the blocking units can be cleared without
> > their previous state affecting the timing; otherwise you can use them to
> > leak data from mis-speculation.
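The property Jacob asks for can be illustrated with a blocking iterative divider. This is a hypothetical sketch (a plain restoring divider producing one quotient bit per cycle, not the proposed unit): `cancel()` clears all internal state in one step, and every divide takes exactly `width` cycles regardless of operands, so a cancelled mis-speculated divide leaves nothing that changes the latency of the next one. `IterativeDivider` and `run` are made-up names.

```python
from typing import Optional, Tuple


class IterativeDivider:
    """Hypothetical blocking restoring divider, one quotient bit per cycle.
    cancel() clears all state at once, and latency is always `width`
    cycles, so earlier (possibly mis-speculated) work cannot show up
    in the timing of later divides."""

    def __init__(self, width: int = 32) -> None:
        self.width = width
        self.cancel()

    def cancel(self) -> None:
        self.busy = False
        self.dividend = self.divisor = 0
        self.rem = self.quo = 0
        self.step = 0

    def start(self, dividend: int, divisor: int) -> None:
        assert not self.busy and divisor != 0
        self.busy = True
        self.dividend, self.divisor = dividend, divisor
        self.rem = self.quo = 0
        self.step = 0

    def tick(self) -> Optional[Tuple[int, int]]:
        """Advance one cycle; returns (quotient, remainder) when done."""
        if not self.busy:
            return None
        bit = self.width - 1 - self.step
        self.rem = (self.rem << 1) | ((self.dividend >> bit) & 1)
        self.quo <<= 1
        if self.rem >= self.divisor:   # compare, then subtract
            self.rem -= self.divisor
            self.quo |= 1
        self.step += 1
        if self.step == self.width:
            self.busy = False
            return self.quo, self.rem
        return None


def run(div: IterativeDivider, a: int, b: int) -> Tuple[Tuple[int, int], int]:
    """Start a divide and count cycles until its result appears."""
    div.start(a, b)
    cycles = 0
    while True:
        cycles += 1
        result = div.tick()
        if result is not None:
            return result, cycles
```

An early-terminating divider would leak operand-dependent timing even with a perfect cancel, which is one reason to keep the iteration count fixed.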
>
>  ok so now i get it.  ok, so that's pretty straightforward.
>
> > For the divider, for both radix-4 and Newton's algorithm we can share
> > most of the logic between divide, sqrt, and inv-sqrt, so I think we
> > should build a unified unit.
> > Division is going to need to be done at least once per pixel, so I think
> > we will need a pipelined divider or at least several non-pipelined
> > dividers. We can share the cost of a pipelined divider between 2 cores
> > by having them issue divides on alternate cycles.
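Issuing on alternate cycles amounts to a fixed round-robin slot per core. A hypothetical sketch (the names and the request model are assumptions): on cycle `t`, only core `t % num_cores` may push a divide into the shared pipelined divider.

```python
from typing import List, Tuple


def issue_slot_owner(cycle: int, num_cores: int = 2) -> int:
    """Hypothetical fixed schedule: exactly one core owns each cycle."""
    return cycle % num_cores


def schedule(requests: List[int], num_cores: int = 2,
             cycles: int = 8) -> List[Tuple[int, int]]:
    """requests[core] = number of divides that core wants to issue.
    Returns the (cycle, core) issue events into the shared divider."""
    remaining = list(requests)
    events = []
    for cycle in range(cycles):
        core = issue_slot_owner(cycle, num_cores)
        if remaining[core] > 0:
            remaining[core] -= 1
            events.append((cycle, core))
    return events


print(schedule([3, 3], cycles=6))  # both cores busy: one divide per cycle
print(schedule([2, 0], cycles=6))  # idle core's slots go unused
```

A fixed schedule wastes slots when one core is idle, but it also means one core's divide traffic never changes the other's issue timing, which fits the timing-leak concern raised above.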
> >
> > One of the sqrt algorithms I am thinking of is:
> >
> > https://en.wikipedia.org/wiki/Methods_of_computing_square_roots#Binary_numeral_system_(base_2)
>
>  hey, that's the method i was talking about that just basically does a
> compare and a subtract!  it's based on (a + b)^2 = a^2 + (2a + b)b, and
> you just move one bit at a time from b into a.
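The binary digit-by-digit method from the linked page can be written out directly. This is a sketch following that page's integer version, assuming an unsigned radicand of even `width` (32 here). Each iteration decides one result bit with a single compare and a subtract, using the identity (a + b)^2 = a^2 + (2a + b)b with `a` the root found so far and `b` the trial bit.

```python
def isqrt_binary(n: int, width: int = 32) -> int:
    """Binary digit-by-digit integer square root: one result bit per
    iteration, each iteration one compare plus one subtract."""
    assert 0 <= n < (1 << width) and width % 2 == 0
    x = n                  # running remainder
    c = 0                  # partial root, kept pre-shifted by the loop
    d = 1 << (width - 2)   # trial bit, stepped down by powers of four
    while d > n:
        d >>= 2
    while d != 0:
        if x >= c + d:     # the compare
            x -= c + d     # the subtract
            c = (c >> 1) + d
        else:
            c >>= 1
        d >>= 2
    return c


print(isqrt_binary(10))          # 3
print(isqrt_binary(2**32 - 1))   # 65535
```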
>
>  obviously you can do 2 bits at a time, however you need 3 comparators
> (one for 1x, one for 2x, one for 3x).  and you can do 4 bits at a
> time, however there you need 15 comparators (2^4 - 1).
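The comparator count generalises: producing k result bits per step means choosing a digit q in 0..2^k - 1, so there are 2^k - 1 candidate digits to test (in hardware, in parallel). A hypothetical sketch of that radix-2^k form: with r the root so far and base = r * 2^k, accepting digit q adds (2 * base + q) * q to the square, which is the same (a + b)^2 identity with b = q.

```python
def comparisons_per_step(k: int) -> int:
    """Candidate digits to test when producing k root bits per step."""
    return (1 << k) - 1


def isqrt_radix(n: int, k: int = 2, width: int = 32) -> int:
    """Digit-by-digit integer sqrt producing k result bits per step
    (radix 2^k). The inner loop stands in for 2^k - 1 parallel compares."""
    assert 0 <= n < (1 << width) and width % (2 * k) == 0
    r = 0      # root of the radicand bits consumed so far
    rem = 0    # those bits minus r*r
    for shift in range(width - 2 * k, -1, -2 * k):
        # bring down the next 2k radicand bits
        rem = (rem << (2 * k)) | ((n >> shift) & ((1 << (2 * k)) - 1))
        base = r << k
        digit = 0
        for q in range(1, 1 << k):        # one comparator per q
            if (2 * base + q) * q <= rem:
                digit = q                 # largest q that fits wins
        rem -= (2 * base + digit) * digit
        r = base + digit
    return r


print([comparisons_per_step(k) for k in (1, 2, 3, 4)])  # [1, 3, 7, 15]
```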
>
> > We would run 2 iterations of the lower loop per pipeline stage since that
> > matches what you need for radix-4 division. The pipeline would be approx.
> > 16 stages long.
>
>  only 8 stages if you have 3 parallel comparators and do 2 bits at a time.
>
no, 16, since there are 2 bits/stage and you need 32 bits.
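The stage count as plain arithmetic, taking the 32 result bits from the reply above. Two binary iterations per stage and one radix-4 step with 3 comparators both yield 2 bits per stage; reaching 8 stages would require 4 bits per stage.

```latex
N_{\text{stages}} = \frac{\text{result bits}}{\text{bits per stage}}
                  = \frac{32}{2} = 16,
\qquad
\frac{32}{4} = 8 .
```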

>
>  l.
>
> _______________________________________________
> libre-riscv-dev mailing list
> libre-riscv-dev at lists.libre-riscv.org
> http://lists.libre-riscv.org/mailman/listinfo/libre-riscv-dev
>