[libre-riscv-dev] [isa-dev] Re: FP transcendentals (trigonometry, root/exp/log) proposal

Fri Aug 9 09:46:16 BST 2019

On Friday, August 9, 2019 at 4:12:14 PM UTC+8, Bruce Hoult wrote:
> On Fri, Aug 9, 2019 at 12:40 AM lkcl <luke.leighton at gmail.com> wrote:
> >
> > On Friday, August 9, 2019 at 8:25:54 AM UTC+1, andrew wrote:
> >
> >>>
> >>> Andrew: I appreciate that you're busy
> >>
> >> Good point - I capitulate.
> >
> > Andrew!  that doesn't help!  your input here is just as valuable as everyone else's.  if you believe you have a better idea *it's important to evaluate it*!
> 
> It's not Andrew's job to come up with a better idea. It's your job.
> 
> The key word is "quantitative". You, as the proposer of new
> instructions must provide justification for the percentage improvement
> of having the new instruction vs not having it. Not hand-waving.
> Numbers. Measured, or at least calculated in a justifiable way.

Okaay, thank you for explaining.

> If there is a measurable and significant improvement on some large
> body of code, such as SPEC for example, then that would be grounds for
> considering inclusion in a RISC-V Foundation standard extension.

It took Jeff Bush on Nyuzi about... 2 years to get to the point of being able to do that level of assessment.

We're a small team, only 2 full time engineers, total sponsorship: EUR 50,000.

The RISC-V Foundation receives... what... several million in investment and membership fees?

So as a Libre Project, and as GPUs are a market that we know there is demand for... you get the general idea.

In the meantime I will reach out to my contact and explain the situation to him. He will then be in a better position to explain to new Alliance Members what's needed.

> If it improves only some narrow specialized task then that might
> justify a custom extension.

No, depending on how much silicon area is dedicated to FP, it'll be somewhere around one or two orders of magnitude performance improvement in both OpenCL and 3D.

For 3D embedded, the collaboration itself (that the opcodes exist at all and have software support) is the key benefit. Performance metrics in the 3D embedded space are actually completely and utterly misleading.

> But you haven't even shown that, other
> than "hardware good -- software bad". Is it even measurable, even on
> *your* workload?

Remember, there are several different platforms. They all have different requirements. Driving the entire proposal from a performance or "quantitative" perspective is detrimental and misleading, and it misses the point.

> We sure don't know the answer.

It'll be about... estimated... six to eight months before we have RTL that can run anything.

In the meantime, *for those platforms that desire performance*, simple assessments of what libm currently does in s/w, and replacing that with *single cycle* hardware opcodes gives a clear idea of the performance gains.
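
As a rough illustration of that kind of back-of-envelope assessment, here is a minimal sketch using Amdahl's law. The cycle counts and workload fractions are placeholders I made up for the example, not measurements of libm or of any hardware; the function name is likewise just for illustration.

```python
def amdahl_speedup(fraction, local_speedup):
    """Overall speedup when `fraction` of the runtime is sped up by `local_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if local_speedup <= 0.0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / local_speedup)


# ASSUMED numbers, for illustration only (not measured):
sw_cycles = 40.0  # hypothetical cost of one software transcendental call
hw_cycles = 1.0   # the "single cycle" hardware opcode case

local = sw_cycles / hw_cycles
for fraction in (0.1, 0.5, 0.9):
    # fraction = share of total runtime spent in the software routine
    print(f"fraction={fraction:.1f}  overall speedup={amdahl_speedup(fraction, local):.2f}x")
```

The overall gain depends heavily on how much of the workload is spent in those routines, which is why the answer differs so much from platform to platform.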

Also: 3D requires *guaranteed* real time response times.  Iterative blocking algorithms are absolutely unacceptable as they break the guaranteed pixel frame rate requirements.

For example, we have to do FPDIV as a pipeline; iterative Newton-Raphson (N.R.) is a no, and there is one FPDIV per pixel in Normalisation (and one RSQRT).
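
To make the concern concrete, here is a small software sketch of Newton-Raphson reciprocal square root and its use in normalising a 3-vector. It is only an illustration: the initial guess, the iteration count and the function names are my assumptions for the example, not our hardware design. The point to notice is that each refinement step depends on the previous one, so a blocking iterative unit is tied up for the whole chain, once per pixel.

```python
import math


def rsqrt_newton(x, iterations=6):
    """Approximate 1/sqrt(x) by Newton-Raphson refinement (illustrative only)."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    # Crude initial guess from the exponent: x = m * 2**e with 0.5 <= m < 1.
    # (Assumption for this sketch; hardware would typically use a lookup table.)
    _, e = math.frexp(x)
    y = math.ldexp(1.0, -(e // 2))
    for _ in range(iterations):
        # Each step needs the previous y: a strictly sequential dependency chain.
        y = y * (1.5 - 0.5 * x * y * y)
    return y


def normalise(v):
    """Scale a 3-vector to unit length using one reciprocal square root."""
    x, y, z = v
    inv_len = rsqrt_newton(x * x + y * y + z * z)
    return (x * inv_len, y * inv_len, z * inv_len)


print(normalise((3.0, 0.0, 4.0)))
```

A pipelined unit lays the same steps out as successive stages, so a new operand can enter every cycle even though each individual result still takes several cycles to emerge.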

Jeff Bush's paper, nyuzi_pass2016, is also a good reference. It shows what happens in 3D if you *don't* have the right primitives.

Nyuzi, like Larrabee, is fantastic as a Vector Compute Engine. As a 3D GPU, it is a paltry 25% of modern GPU performance for the same silicon area / power budget.

The Larrabee team were not permitted to reveal that little fact in their original paper. whoops :)

> Potential market size is irrelevant. The most it does is provide
> justification for doing the quantitative performance evaluation in the
> first place.

Funding. We're doing this project from charitable donations, from NLNet. Find the funding, we can do the evaluation.

Otherwise, someone else has to do it. We may be able to find someone, through our contacts, but there's definitely no budget available for our small team to do six to eight months of research, here.

Sorry.

Ok, so thank you for clarifying, Bruce: I'll ask around, see how this can be solved.

L.