[libre-riscv-dev] [isa-dev] 3D Open Graphics Alliance
luke.leighton at gmail.com
Tue Aug 13 05:45:03 BST 2019
On Monday, August 12, 2019 at 6:12:35 PM UTC+1, MitchAlsup wrote:
> On Monday, August 12, 2019 at 1:31:44 AM UTC-5, Jacob Lifshay wrote:
>> On Sun, Aug 11, 2019 at 8:17 PM lkcl <luke.l... at gmail.com> wrote:
>> > > So I agree with you that we should look at SPIR-V and the Vulkan ISA
>> > > Now that ISA is very complex and many of the instructions may
>> possibly be
>> > > reconstructed from simpler ones.
>> > yes. as the slides from SIGGRAPH2019 show, the number of opcodes
>> needed is enormous. Texturisation, mipmaps, Z-Buffers, texture buffers,
>> normalisation, dot product, LERP/SLERP, swizzle, alpha blending,
>> interpolation, vectors and matrices, and that's just the *basics*!
>> note: lerp is another term for linear interpolation
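For readers unfamiliar with the term, a minimal sketch of lerp in Python follows. The function names `lerp` and `lerp_vec` are illustrative only and are not taken from any ISA proposal discussed in this thread.

```python
def lerp(a, b, t):
    """Linear interpolation: returns a when t == 0 and b when t == 1."""
    return a + (b - a) * t


def lerp_vec(u, v, t):
    """Component-wise lerp over two equal-length vectors (given as sequences)."""
    return tuple(lerp(x, y, t) for x, y in zip(u, v))
```

Written this way, a scalar lerp costs one subtract, one multiply and one add, which is why it is a natural candidate for either a fused opcode or a short software sequence.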
>> > further optimisations will need to be added over time, as well.
>> > the pressure on the OP32 space is therefore enormous, given that
>> "embedded" low-power requirements cannot be met by moving to the 48 and 64
>> bit opcode space.
>> > > We need to thus perhaps look at a "minimized" subset of the Vulkan
>> > > instructions that truly define the atomic operations from which the
>> > > ISA can be constructed. So the instruction decode hardware can
>> > > this "higher-level ISA" - perhaps in microcode - from the "atomic
>> > > at runtime while hardware support is only provided for the "atomic
>> > yes. a microcode engine is something that may be useful in other
>> implementations as well (for other purposes) so should i feel be a separate
>> I personally think that having LLVM inline the corresponding
>> implementations of the more complex operations is the better way to
>> go. For most Vulkan shaders, they are compiled at run-time, so LLVM
>> can do feature detection to determine which operations are implemented
>> in the hardware and which ones need a software implementation.
>> This reduces the pressure on the opcode space by a lot.
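A rough sketch of the idea in the paragraph above, written in Python rather than as an actual LLVM pass. Everything here is an assumption made for illustration: the operation names, the `SOFTWARE_EXPANSIONS` table and the `lower` function are hypothetical and do not correspond to any LLVM API or to any real opcode list.

```python
# Hypothetical table: how a complex operation could be expanded into
# simpler ones when the hardware does not implement it directly.
SOFTWARE_EXPANSIONS = {
    "lerp": ["fsub", "fmul", "fadd"],                   # a + (b - a) * t
    "dot3": ["fmul", "fmul", "fadd", "fmul", "fadd"],   # x*x' + y*y' + z*z'
}


def lower(op, hw_features):
    """Return the list of hardware operations that implement `op`.

    `hw_features` is the set of operations detected on the target at
    shader compile time. An operation present in that set is emitted
    as-is; otherwise its software expansion is inlined recursively.
    """
    if op in hw_features:
        return [op]
    if op not in SOFTWARE_EXPANSIONS:
        raise ValueError(f"no hardware or software implementation for {op!r}")
    lowered = []
    for sub_op in SOFTWARE_EXPANSIONS[op]:
        lowered.extend(lower(sub_op, hw_features))
    return lowered
```

With this shape, a target that advertises a native `lerp` gets a single operation, while a minimal target gets the three-operation sequence, and neither case needs a dedicated opcode to be mandatory.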
>> > > From the SIGGRAPH BOF it was clear there are competing interests.
>> Some people
>> > > wanted explicit texture mapping instructions while others wanted HPC
>> > > threaded vector extensions.
>> > interesting.
>> Having texture mapping instructions in HW is a really good idea for
>> traditional 3D, since Vulkan allows the texture mode to be dynamically
>> selected, trying to implement it in software would require having the
>> texture operations use dynamic dispatch with maybe static checks for
>> recently used modes to reduce the latency. See VkSampler's docs.
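To illustrate why a run-time-selected texture mode is awkward in software, here is a minimal Python sketch of a 1D nearest-neighbour sampler that dispatches on an addressing mode chosen at run time. The mode names loosely mirror Vulkan's repeat and clamp-to-edge sampler address modes, but the functions and the `ADDRESS_MODES` table are hypothetical and are not the Vulkan API.

```python
import math


def wrap_repeat(i, n):
    """Tile the texture: indices wrap around modulo the texture size."""
    return i % n


def wrap_clamp_to_edge(i, n):
    """Clamp out-of-range indices to the first or last texel."""
    return max(0, min(n - 1, i))


# Hypothetical dispatch table keyed by the sampler's addressing mode.
ADDRESS_MODES = {
    "repeat": wrap_repeat,
    "clamp_to_edge": wrap_clamp_to_edge,
}


def sample_nearest_1d(texels, u, address_mode):
    """Fetch the texel nearest to normalised coordinate u.

    The addressing function is looked up on every call, because the mode
    is a property of the sampler object and is only known at run time.
    """
    n = len(texels)
    index = math.floor(u * n)
    wrap = ADDRESS_MODES[address_mode]
    return texels[wrap(index, n)]
```

Every texel fetch pays for the mode lookup and an indirect call. A real sampler multiplies this by filter mode, mipmap mode and texel format, which is the cost that dedicated texture hardware avoids.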
>> > > Although each of these can be accommodated we need to adjudicate the
>> > > in the process pipeline where they belong - atomic ISA, higher-level
>> ISA or
>> > > higher-level graphics library.
>> > OpenCL compliance is pretty straightforward to achieve. it could be
>> done by any standard supercomputer Vector Compute Engine.
>> > a good Vector ISA does **NOT** automatically make a successful GPU (cf:
>> MIAOW, Nyuzi, Larrabee).
>> > 3D Graphics is ridiculously complex and comprehensive, and therefore
>> requires careful step-by-step planning to meet the extremely demanding and
>> heavily optimised de-facto industry-standard expectations met by modern
>> GPUs, today (fixed-functions are out: shader engines are in).
>> Fixed function operations in the form of custom opcodes or separate
>> accelerators are still used for several operations that are rather
>> slow to do with standard Vector or SIMD instructions:
> When I did a 3D GPU, there was a HW unit each to perform:
> a) vertex to thread assignment
> b) rasterize primitive
> c) interpolate rasterized point
> d) Texture load
> and some higher layer HW that was used to time the various activities
> through the 100,000 clock pipeline.
i did a double-take at that one. i know you didn't sneeze whilst holding
down the "0" key because there's a comma in the middle :)
some context: we discussed this on-list back in june:
> If you include Tessellation and Geometry, both of which can generate a
> volcano of new primitives, there
> are significant performance gains (more than 2×) to be had by doing the
> above in HW function units.
hmmm.... definitely worth it.