[libre-riscv-dev] Request for input and technical expertise for Systèmes Libres Amazon Alexa IOT Pitch 10-JUN-2020

Mon Jun 8 16:27:46 BST 2020

(btw thank you staf for the insights)

On Mon, Jun 8, 2020 at 2:56 PM Hendrik Boom <hendrik at topoi.pooq.com> wrote:
>
> On Mon, Jun 08, 2020 at 12:13:48PM +0100, Luke Kenneth Casson Leighton wrote:
> >
> > all of these things are at the architectural level.  we are not doing
> > anything fancy at the gate level.  it is a matter of making
> > *architectural* decisions that reduce power consumption, operating
> > within the exact same gate-level power consumption constraints as
> > every other ASIC out there.
>
> You point out that the CPU and GPU share cache, being the same processor.

yes.  or, more precisely: because the CPU instructions and GPU
instructions are in the same ISA, the GPU *workload* will push CPU
workload(s) out of the (same) L1 Cache.

>
> But we are designing a four-core chip?

yes.  therefore there will be 1x L1 Data and 1x L1 Instruction Cache
for each of those four cores.

>  To what extent do the four cores share cache?

L1?  not at all - ever.

> And on avoiding data copying between CPU and GPU:
>
> I believe the OpenGL API involves copying data from CPU buffers to GPU
> buffers, with the understanding that the CPU copies can be discarded
> while the GPU goes on with its copy.

... because the assumption is that the GPU is a completely and utterly
separate processor, the "command" to perform that copy is expected to
involve the excruciatingly-painful process previously mentioned, from
which i excluded the userspace-kernelspace context-switching so as not
to have people run away screaming in terror.

in our case: it would simply be... a memcpy.
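
a minimal sketch of what that means, in python for brevity.  the
`upload` function and both buffers are hypothetical stand-ins (this is
not a real OpenGL or Libre-SOC API): the assumption is only that CPU
and GPU workloads share one address space, so the "transfer" is an
ordinary copy, after which the caller is free to overwrite its own
buffer.

```python
# sketch only: upload() and both buffers are hypothetical stand-ins,
# not a real OpenGL or Libre-SOC API.

def upload(gpu_buf, cpu_buf):
    """copy data into the buffer the GPU workload reads: a plain memcpy."""
    gpu_buf[:len(cpu_buf)] = cpu_buf

cpu_buf = bytearray(b"vertex-data-v1")
gpu_buf = bytearray(len(cpu_buf))
upload(gpu_buf, cpu_buf)

# the caller may now scribble over its own copy, as the API permits;
# the GPU-side buffer is unaffected because it holds its own copy.
cpu_buf[:] = b"vertex-data-v2"
```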

> Having the same storage for both sets of buffers could obviously obviate
> these copies, except that software that uses this API will likely rely
> on being able to overwrite the CPU-side buffers with impunity.  So the
> copy will still have to be done.

sounds reasonable to me.  actually now that i think about it, if the
buffer is placed into a shmem segment with copy-on-write semantics,
the up-front memcpy will not be needed: because of the CoW semantics,
a copy would only be made on-demand, at the point where the CPU side
actually overwrites the buffer.
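
a rough illustration of the copy-on-write idea, using python's `mmap`
module on an ordinary temporary file rather than a real shmem segment
(that substitution, and the "GPU view" / "CPU view" naming, are
assumptions made purely for the sketch).  the CPU-side mapping is
private, so its writes land in a private copy made on demand, and the
other view keeps seeing the data as originally submitted.

```python
import mmap
import tempfile

SIZE = mmap.PAGESIZE  # one page is enough for the demonstration

with tempfile.TemporaryFile() as backing:
    backing.write(b"vertex-data-v1".ljust(SIZE, b"\0"))
    backing.flush()

    # "GPU side": read-only view of the submitted buffer
    gpu_view = mmap.mmap(backing.fileno(), SIZE, access=mmap.ACCESS_READ)
    # "CPU side": copy-on-write view; writes never reach the backing file
    cpu_view = mmap.mmap(backing.fileno(), SIZE, access=mmap.ACCESS_COPY)

    # the application overwrites "its" buffer; only now is a page copied
    cpu_view[:14] = b"vertex-data-v2"

    gpu_seen = gpu_view[:14]   # still the original contents
    cpu_seen = cpu_view[:14]   # the application's new contents

    gpu_view.close()
    cpu_view.close()
```

no copy is made for pages that the CPU side never writes to, which is
the whole point of the optimisation.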

this however would be an optimisation.

the *only reason* that we can even remotely consider such an
optimisation is precisely because of the hybrid architecture.

> Do I misunderstand OpenGL?  Is Vulkan different?

don't know.

> Will users want to bypass these libraries and use the graphics instructions directly?

only if they want to become an assembly-level expert, with all the
inherent implications and performance-complexity tradeoffs that always
come with doing assembly programming.

> Or is there some other subtlety I'm missing?

no idea :)

l.