[libre-riscv-dev] Yehowshua - Interested in open GPU dev

Mon Jan 6 15:24:54 GMT 2020

sorry this went into spam

On Monday, January 6, 2020, Immanuel, Yehowshua U <yimmanuel3 at gatech.edu>
wrote:
>
>
> Do we have a list of milestones or task that need to be completed?

yes, on http://bugs.libre-riscv.org

subdividing those is kinda done on demand.

for the newer projects, we need to track them by creating top-level
milestones with estimated budgets.

> > one major task is to create a full suite of other ALU operations, right
> > down to even comparators which, when the "gates" between partitions open
> > up, will do 8x8bit GT, 4x16bit GT, 2x32bit GT and finally 1x64 bit GT.
>
> This seems to be a task that I could start on.

great. it's pretty self-contained and straightforward.

if you can track down the partitioned multiplier to see what it would look
like, bear in mind that is the absolute extreme end, here (full pipelining
etc)

i do not perceive there to be anything more complex than that, everything
else can be entirely combinatorial.

in fact it is a self contained task, simply provide __add__ etc just as
there is in Signal.

* interface is same as Signal
* width parameter is same as Signal
* construction must take *another* Signal which will be the mask partition
parameter
* that mask's width will be a power of 2 minus one (eg 3 bits for 4
partitions)
* functions __add__ and __ge__ etc must be added which use that mask
parameter.

i would suggest starting with eg __xor__ because it is trivial
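
As a starting point, here is a minimal sketch of the shape described in the list above. It is a plain-Python model over integers, not nmigen code: the class name PartitionedModel and its attributes are hypothetical, and in a real implementation the value and the mask would be nmigen Signals, with __xor__ returning a hardware expression rather than a number.

```python
class PartitionedModel:
    """Plain-integer stand-in for a partitioned signal (hypothetical name)."""

    def __init__(self, value, width, mask):
        self.width = width   # total width in bits, as for Signal
        self.mask = mask     # partition mask, carried along with the value
        self.value = value & ((1 << width) - 1)

    def __xor__(self, other):
        # xor is purely bitwise: no output bit depends on a neighbouring
        # bit, so the partition mask cannot change the result. It is only
        # passed on to the returned object.
        if isinstance(other, PartitionedModel):
            other_value = other.value
        else:
            other_value = int(other)
        return PartitionedModel(self.value ^ other_value, self.width, self.mask)


a = PartitionedModel(0x1234, 16, 0b000)
b = PartitionedModel(0x00FF, 16, 0b000)
print(hex((a ^ b).value))
```

The same pattern (look at the mask, return a new object of the same kind) would then be repeated for the operators where the mask does matter.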

when partition mask is say 111 (all 1s) and PartitionedSignal length is 16
the result will be exactly like a Signal.

when partition mask is 000 then because PS length is 16, it is subdivided
into 4 **separate** signals.

of course for __xor__ this makes absolutely no difference whatsoever

however for say __add__ you actually have 4 separate 4 bit adds.
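
To make the mask semantics concrete for an operator where partitioning does matter, here is a rough plain-Python model of a partitioned greater-than. It assumes unsigned values, equal-sized partitions, and the polarity described above (a mask bit of 1 joins two neighbouring partitions, 0 keeps them separate). The function name and its parameters are made up for illustration and are not part of nmigen or the existing codebase.

```python
def partitioned_gt(a, b, mask, width=16, part=4):
    """Return one unsigned a > b result per lane, lowest lane first.

    Bit p of mask sits between partition p and partition p + 1:
    1 means the two are joined into one lane, 0 means they are separate.
    """
    nparts = width // part
    results = []
    start = 0  # index of the first partition in the current lane
    for p in range(nparts):
        is_last = p == nparts - 1
        joined_to_next = (not is_last) and ((mask >> p) & 1) == 1
        if not joined_to_next:
            # the lane covers partitions start..p inclusive
            nbits = (p + 1 - start) * part
            lane_mask = (1 << nbits) - 1
            lane_a = (a >> (start * part)) & lane_mask
            lane_b = (b >> (start * part)) & lane_mask
            results.append(lane_a > lane_b)
            start = p + 1
    return results


print(partitioned_gt(0x1243, 0x2134, 0b111))  # one 16-bit compare
print(partitioned_gt(0x1243, 0x2134, 0b101))  # two 8-bit compares
print(partitioned_gt(0x1243, 0x2134, 0b000))  # four 4-bit compares
```

In hardware the lanes would of course all be computed in parallel and selected by the mask, rather than looped over like this.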

i remember now.  look up PartitionedAdder, it already exists.

the way it works is, the partition mask is multiplied (ANDed) with some
extra bits in between the two inputs.

if mask is 000 then the adds are

aaaa0aaaa0aaaa0aaaa plus
bbbb0bbbb0bbbb0bbbb

if the mask is 111 then the adds are

aaaa0aaaa0aaaa0aaaa plus
bbbb1bbbb1bbbb1bbbb

the result REMOVES these extra bits.

basically it is a neat mathematical trick, to use "carry" from a normal
long adder as the "partition" points.
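
The carry trick can be checked with a small plain-Python model. This is only a sketch of the idea as described above, not the PartitionedAdder source: the helper names are hypothetical, and it assumes a 16-bit value split into four 4-bit partitions with one extra bit inserted at each of the three partition points.

```python
def expand(value, fill, width=16, part=4):
    """Insert one extra bit between partitions; bit p of fill goes after partition p."""
    nparts = width // part
    out = 0
    for p in range(nparts):
        chunk = (value >> (p * part)) & ((1 << part) - 1)
        out |= chunk << (p * (part + 1))
        if p < nparts - 1:
            out |= ((fill >> p) & 1) << (p * (part + 1) + part)
    return out


def compress(value, width=16, part=4):
    """Drop the extra bits again, keeping only the partition bits."""
    out = 0
    for p in range(width // part):
        chunk = (value >> (p * (part + 1))) & ((1 << part) - 1)
        out |= chunk << (p * part)
    return out


def partitioned_add(a, b, mask, width=16, part=4):
    # a gets 0 in every extra position, b gets the mask bit.
    # 0 + 0 + carry: the carry lands in the extra bit and stops there.
    # 0 + 1 + carry: the carry ripples on into the next partition.
    total = expand(a, 0, width, part) + expand(b, mask, width, part)
    return compress(total, width, part)


print(hex(partitioned_add(0x00FF, 0x0001, 0b111)))  # one 16-bit add
print(hex(partitioned_add(0x00FF, 0x0001, 0b000)))  # four 4-bit adds
```

With the mask all 1s the carry out of the lowest partition propagates as in an ordinary 16-bit add; with the mask all 0s it is absorbed by the extra bit and then thrown away.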

so you can just use PartitionedAdder straight, without modification, i just
wanted you to be able to understand how it works.

jacob can help explain, he originally wrote it.

btw *sigh* i *have* tried to explain to whitequark that use of "type" is a
bad idea, regardless of "how much speed it gives", now *we* pay the price
for that because you cannot subclass from Signal, here.

> Are there some other tasks?

take a look at the bugtracker.

one other thing we need is an implementation in nmigen of a dadda multiplier
https://en.m.wikipedia.org/wiki/Dadda_multiplier

however we need to also be able to partition it.
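
For anyone unfamiliar with the algorithm, here is a rough functional model of the Dadda column reduction in plain Python. It is unsigned, unpartitioned, and not nmigen: the function names are hypothetical, and it only shows the reduction schedule (target heights 2, 3, 4, 6, 9, ... with a half adder when a column is one over the target and a full adder otherwise). It says nothing about timing, or about how the partitioning would be added.

```python
def dadda_heights(max_height):
    """Dadda target heights below max_height, largest first."""
    heights = [2]
    while heights[-1] * 3 // 2 < max_height:
        heights.append(heights[-1] * 3 // 2)
    return heights[::-1]


def dadda_multiply(a, b, width):
    """Unsigned width x width multiply using Dadda-style column reduction."""
    # cols[i] holds the single-bit terms of weight 2**i,
    # with one spare column at the top for carries
    cols = [[] for _ in range(2 * width + 1)]
    for i in range(width):
        for j in range(width):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    for target in dadda_heights(width):
        for i in range(2 * width):
            while len(cols[i]) > target:
                if len(cols[i]) == target + 1:
                    # half adder: two bits in, sum stays, carry moves up
                    x, y = cols[i].pop(0), cols[i].pop(0)
                    s, c = x ^ y, x & y
                else:
                    # full adder: three bits in, sum stays, carry moves up
                    x, y, z = (cols[i].pop(0) for _ in range(3))
                    s, c = x ^ y ^ z, (x & y) | (x & z) | (y & z)
                cols[i].append(s)
                cols[i + 1].append(c)

    # every reduced column is now at most two bits high; summing the
    # weighted bits stands in for the final carry-propagate adder
    return sum(bit << i for i, col in enumerate(cols) for bit in col)


print(dadda_heights(8))
print(dadda_multiply(200, 123, 8))
```

Each adder preserves the weighted sum of the bits it consumes, which is why the result always equals the ordinary product.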

> I might be able to squeeze something in this month.

great


-- 
---
crowd-funded eco-conscious hardware: https://www.crowdsupply.com/eoma68