[Libre-soc-dev] Audio and Video Codec Algorithmic analysis for instruction creation was from Re: clamping/saturation semantics

Lauri Kasanen cand at gmx.com
Sat Dec 12 08:38:58 GMT 2020


On Sat, 12 Dec 2020 00:12:41 -0800
Cole Poirier <colepoirier at gmail.com> wrote:

> Sorry, do you mean there is too much ambiguity in software profiling? As in,
> it’s so variable that deriving instructions from spectate algorithm
> profiling has a high probability of creating suboptimal instructions?
> Otherwise, I’m sorry, I don’t think I understand; can you explain further?
>
> What is the method we, or you, are using to identify instructions that are
> potentially good candidates for hardware acceleration that will then need
> to be implemented, simulated and compared against other implementations to
> determine the area/power/performance improvement per Jeff Bush’s Nyuzi
> Raster method?

I mean: sw profiling is such a basic technique that it's not useful to
google papers on it. The same goes for how much power an instr takes.

The point that "accelerating the C version of an algo, instr for instr,
leads to a worse outcome than a proper transformation" is completely
unrelated. Again, it's not useful to google papers for that; it just
requires understanding of the algo and of the target.

If you need an example for your own understanding, compare a naive C
strcmp and the C strcmp you find in an optimized C library. One is a
simple byte loop, the other a completely different paradigm. No matter
how much faster you make the simple loop, it's still bound by the
number of operations it does and how they depend on each other.
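For illustration, a minimal C sketch of the two styles. `naive_strcmp` is the simple byte loop; `word_strcmp` compares 8 bytes per iteration using the well-known "has a zero byte" bit trick, one of the transformations optimized libraries use (real implementations such as glibc's also use SIMD). Both function names are hypothetical. The word version assumes both strings sit in zero-padded buffers that can be read in whole 8-byte chunks past the terminator; a production version must instead align its loads so they never cross into an unmapped page.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Naive version: one byte per iteration, each compare depends on the last. */
static int naive_strcmp(const char *a, const char *b)
{
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;
    while (*p != 0 && *p == *q) {
        p++;
        q++;
    }
    return (int)*p - (int)*q;
}

/* Nonzero iff some byte of x is 0x00 (classic "haszero" bit trick). */
static uint64_t has_zero_byte(uint64_t x)
{
    return (x - 0x0101010101010101ULL) & ~x & 0x8080808080808080ULL;
}

/* Word-at-a-time version.
 * ASSUMPTION (hypothetical, for this sketch only): both strings live in
 * buffers that stay readable in whole 8-byte chunks past the terminator,
 * e.g. zero-padded to a multiple of 8 bytes. */
static int word_strcmp(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t i = 0;
    for (;;) {
        uint64_t wa, wb;
        /* memcpy avoids alignment and aliasing problems with the loads. */
        memcpy(&wa, a + i, sizeof wa);
        memcpy(&wb, b + i, sizeof wb);
        if (wa != wb || has_zero_byte(wa)) {
            /* A difference or the terminator is inside this chunk:
             * finish bytewise, which stops within these 8 bytes. */
            return naive_strcmp(a + i, b + i);
        }
        /* Chunks are equal and contain no terminator: skip all 8 bytes. */
        i += sizeof wa;
    }
}
```

The byte loop stays bound to one dependent compare per character, while the word version retires up to eight characters per compare: the change of algorithm, not a faster byte compare, is what moves the bound.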

Where googling papers would be useful is for the specific hw design of a
specific hw operation. The time for that is once we have determined the
hw operations, e.g. "what is the best way to implement sqrt in hw".
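As a concrete instance of the kind of question such papers answer, here is a C model of one classic hardware-friendly method: the binary digit-by-digit ("shift-and-subtract") integer square root, which produces one result bit per iteration using only shifts, adds and compares. It is only one candidate. Newton-Raphson or table-seeded iteration are common alternatives, and the function name `isqrt32` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* floor(sqrt(x)) for a 32-bit unsigned input, one result bit per iteration. */
static uint32_t isqrt32(uint32_t x)
{
    const uint32_t orig = x;
    uint32_t res = 0;
    uint32_t bit = 1u << 30;   /* largest power of four that fits in 32 bits */

    /* Start from the largest power of four not exceeding x. */
    while (bit > x)
        bit >>= 2;

    while (bit != 0) {
        if (x >= res + bit) {
            /* Trial subtraction succeeds: this result bit is 1. */
            x -= res + bit;
            res = (res >> 1) + bit;
        } else {
            /* Trial subtraction fails: this result bit is 0. */
            res >>= 1;
        }
        bit >>= 2;
    }

    /* Postcondition: res is the floor of the square root of orig. */
    assert((uint64_t)res * res <= orig);
    assert(((uint64_t)res + 1) * ((uint64_t)res + 1) > orig);
    return res;
}
```

Each loop iteration could correspond to one stage of a pipelined unit or one clock of an iterative one, which is the sort of area versus latency trade-off the hw literature compares across methods.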

- Lauri
