[libre-riscv-dev] [Bug 178] first coriolis2 tutorial, workflow and "test project" page

Wed Feb 26 11:54:23 GMT 2020

http://bugs.libre-riscv.org/show_bug.cgi?id=178

--- Comment #128 from Luke Kenneth Casson Leighton <lkcl at lkcl.net> ---
(In reply to Jean-Paul.Chaput from comment #126)
> (In reply to Luke Kenneth Casson Leighton from comment #125)
> > (In reply to Jean-Paul.Chaput from comment #124)
> 
> OK I see your vision of hierarchical P&R.

hurrah :) as long as the blocks are flattened and can be treated as a "new cell
in a cell library even though they are 20000 gates" it should work.

>   * If you break your design into blocks that are too tiny you will
>     prevent the placer from performing some placement optimization.

i do not intend to go mad: a few levels down, near the leaf nodes, let the
autorouter do its thing.

however i know for example that the pipeline stages for the IEEE754 FPU
functions such as RSQRT are very clearly and obviously "input on one side" and
"output on the other", in a chain eight stages long, and therefore should be
laid out as a forward directed graph only.

> So, in
>     my opinion you should have one or two level of hierarchy in
>     the layout placement.

that sounds sensible.

>       The top level floorplan (full chip) and maybe a supplemental
>     one in big top-level sub-blocks.
>       Staf recommends doing it completely flat (his arguments make
>     perfect sense, but I need to see it for myself).

500,000 gates.  i have seen how much time the placer takes based on the size of
the block. it is at least O(N^2) time, and i believe it may take several weeks
or months to complete.

i am a little concerned about even trying a floorplan of that size, we may have
to place entirely manually and then do autoroute.

>   * Another argument is that I'm not sure that Coriolis will handle
>     well (or in reasonable time) blocks of more than 100K gates.

yes, i agree. based on the times i have seen, even the difference between
3000x3000 and 5000x5000 is a concerning slowdown.

if the "granularity" of placement (the grid) can be made more coarse then maybe
it can be sped up when the blocks are very large.
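
to put a rough number on that slowdown, a back-of-envelope sketch. it assumes
(these are assumptions, not measurements) that gate count scales with block
area and that placer time follows a power law with exponent 2, the lower bound
suspected above. `scaled_time` is an illustrative helper, not part of Coriolis.

```python
def scaled_time(t_ref, n_ref, n_new, exponent=2.0):
    """Extrapolate run time from a reference run, assuming t ~ N**exponent."""
    return t_ref * (n_new / n_ref) ** exponent


# Relative cost of a 5000x5000 block versus a 3000x3000 one, using the
# area as a stand-in for the gate count N (an assumption).
ratio = scaled_time(1.0, 3000 * 3000, 5000 * 5000)
print(f"expected slowdown: about {ratio:.1f}x")
```

under those assumptions the area grows by 25/9 and the time by its square,
roughly 7.7x. with a higher exponent it would be worse.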

>     So your approach is important at least as "backup plan".
> 
>   * Still another point to consider is how much "block layout" reuse
>     do you have ?

because we have SIMD engines, actually quite a lot.

as a GPU we need multiple FPUs of the same type (multiple FPMUL units, multiple
LD/ST units etc)

> Sub-block layout may be interesting if you have
>     one block used multiple times that can have the same form factor
>     and pin positions.

yes, and so the idea of adding the FPU blocks *as* a Cell Library, even though
they are Monsters at maybe 40,000 gates each, starts to make sense.

> 
> Anyway there is always a lot of tradeoff in that stage of a design.
> Ideally you should make a check for all three of them, at least on
> significant part of the design.
> 
> Now, for the technical part of recursively building layout with
> Coriolis:
> 
> * The placer can only handle cells. So only the "leaf" netlists can
>   be done by him.

drat.  i was hoping to be able to treat the blocks as "cells" even though they
are 50,000 gates.

> 
> * Do not put blocks in libraries, that would confuse some Alliance
>   tools that would see them as "terminal black boxes".

interesting.  ok.

> * To assemble placed layout you must write Python scripts. They are
>   not complicated once you know what to do, but still, it implies
>   that *you* know beforehand how the blocks must be placed.
>   This is *automated* *manual* floorplan.
>     You can even develop a Python program to further ease the
>   automation.

yes, i saw ringoscillator.py, i liked the approach.

i think, if we do not go too deep with the manual hierarchy, it will be ok.

particularly because the FPU pipelines are very, very easy:

* input on NORTH (or WEST)
* output on SOUTH (or EAST)
* deliberately make them the same height (or width)
* size of each pipeline stage tells you exactly how to lay them out.
* leave a gap
* router connects output of one to input of next.

done.
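
the list above can be sketched in plain python. this is only the arithmetic of
where each stage would start, not a Coriolis script: `chain_offsets` and
`chain_length` are hypothetical helpers and the stage sizes are made-up
numbers.

```python
def chain_offsets(stage_sizes, gap):
    """Return the start coordinate of each stage along the flow axis.

    stage_sizes: extent of each stage along the flow direction (width for
    WEST-to-EAST, height for NORTH-to-SOUTH), in any consistent unit.
    gap: routing space left between consecutive stages.
    """
    offsets = []
    position = 0
    for size in stage_sizes:
        offsets.append(position)
        position += size + gap
    return offsets


def chain_length(stage_sizes, gap):
    """Total extent of the chain, gaps included."""
    if not stage_sizes:
        return 0
    return sum(stage_sizes) + gap * (len(stage_sizes) - 1)


# Hypothetical sizes for three stages; real values would come from the
# placed blocks themselves.
sizes = [100, 150, 120]
offsets = chain_offsets(sizes, gap=20)
total = chain_length(sizes, gap=20)
print(offsets, total)
```

the other dimension is the one deliberately kept identical for every stage, so
only one coordinate per stage needs computing.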

> 
> There are still questions left open:

i have some too, below

> * Should we place the whole chip (whatever the method) then route
>   in one go. This may avoid the creation of channel routing.

what is "channel routing"?

the FPU pipelines, we know for a fact, they are (very large but) completely
isolated, connected only to each other, and, at the end, you get one massive
block with inputs on one side and outputs on the other, no additional inputs or
outputs from different stages.

i really should send you one of the FPU .il files so you can walk it with
yosys, or do a video.

my concern with doing single-pass global routing is that even that may take a
huge amount of time.

> * Or should we route each sub-block as we go up. Which may allow to
>   create guard ring, but will need the creation of routing channels,
>   and place the external terminals of the blocks.

ah i think i deduce what channel routing is, you mean if there are groups of
lines such as a data bus, you want them to stay together and only go a certain
way?

or, you mean, when creating a guard ring of VIAs you have to leave some gap so
the routing can get through to the edge of the block?

> All of the above are difficult questions, even more so because the answer
> may emerge only after starting to implement.

it is fine, jean-paul, i have done PCB layout for 8 years, now, including
creating libraries of library parts.

and i am a python programmer who has done c/c++ modules. actually, a python
program that *generated* c++ modules based on IDL files.

additional questions:

1. at the leaf nodes, is it possible to tell the auto stage, "i want a fixed
height but you must keep the width as small as possible"?

2. can we specify that inputs are definitely to go on NORTH and outputs
definitely on SOUTH then the auto stage does layout which puts cells in
between, 100% guaranteed to succeed?

1 is because of the FPU pipelines, i would like each stage to be the same
height, so that when connected to each other they can be manually placed in a
chain, all of the same height.

yes, we could do an iterative approach: "does this width work? FAIL. does this
width work? FAIL." it would be nicer not to do that!

or, to have a way to find out in advance, before routing? it should be possible
to estimate the size of the block even before Placement, right?
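
a minimal sketch of both ideas, assuming the only inputs are the areas of the
cells in the block, the fixed height and a target utilization. the 0.5
utilization, the pitch and the `try_place` callback are all hypothetical:
`try_place` stands in for whatever actually runs P&R at a given width, it is
not a Coriolis call.

```python
import math


def estimate_width(cell_areas, height, utilization=0.5, pitch=1.0):
    """Estimate the block width for a fixed height before any placement.

    The cells are assumed to fill `utilization` of the block area; the
    result is rounded up to a multiple of `pitch`.
    """
    total_area = sum(cell_areas)
    raw_width = total_area / (height * utilization)
    return math.ceil(raw_width / pitch) * pitch


def first_width_that_fits(start_width, step, max_width, try_place):
    """Widen the block step by step until try_place(width) succeeds."""
    width = start_width
    while width <= max_width:
        if try_place(width):
            return width
        width += step
    return None  # nothing up to max_width worked


# Made-up numbers: 73 cells of area 10 in a block of height 10.
areas = [10.0] * 73
start = estimate_width(areas, height=10.0, utilization=0.5, pitch=5.0)

# Stand-in for a real P&R run: pretend anything from 160 upwards succeeds.
found = first_width_that_fits(start, 5.0, 300.0, lambda w: w >= 160.0)
print(start, found)
```

starting the loop from the estimate rather than from zero at least keeps the
number of "FAIL" runs small.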

2. is again the same thing, output from previous block we *know* goes directly
and straight to next block.  if the data has to route all the way round, this
is silly.
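
one way to picture "directly and straight": give bit i of the output edge and
bit i of the next block's input edge the same coordinate. a sketch only,
assuming both edges have the same length and the same pin order;
`edge_pin_positions` is a hypothetical helper, not a Coriolis function.

```python
def edge_pin_positions(pin_names, edge_length):
    """Spread pins evenly along one edge, keeping clear of both corners."""
    if not pin_names:
        return {}
    step = edge_length / (len(pin_names) + 1)
    return {name: (index + 1) * step for index, name in enumerate(pin_names)}


# Hypothetical 3-bit bus between two stages whose facing edges are 40 long.
bits = ["d0", "d1", "d2"]
south_of_stage1 = edge_pin_positions(bits, 40.0)
north_of_stage2 = edge_pin_positions(bits, 40.0)
print(south_of_stage1)
```

because both calls use the same order and edge length, every wire between the
two stages is a straight segment across the gap.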

the reason i ask is because in the experiments yesterday, the P&R refused to
complete, when i told it "put input on NORTH and output on SOUTH".
