[libre-riscv-dev] [isa-dev] 3D Open Graphics Alliance

Tue Aug 13 05:45:03 BST 2019

On Monday, August 12, 2019 at 6:12:35 PM UTC+1, MitchAlsup wrote:
>
> On Monday, August 12, 2019 at 1:31:44 AM UTC-5, Jacob Lifshay wrote:
>>
>> On Sun, Aug 11, 2019 at 8:17 PM lkcl <luke.l... at gmail.com> wrote: 
>> > > So I agree with you that we should look at SPIR-V and the Vulkan
>> > > ISA seriously. Now that ISA is very complex and many of the
>> > > instructions may possibly be reconstructed from simpler ones.
>> > 
>> > yes.  as the slides from SIGGRAPH2019 show, the number of opcodes
>> > needed is enormous.  Texturisation, mipmaps, Z-Buffers, texture
>> > buffers, normalisation, dotproduct, LERP/SLERP, swizzle, alpha
>> > blending, interpolation, vectors and matrices, and that's just the
>> > *basics*!
>>
>> note: lerp is another term for linear interpolation 
>>
>> > 
>> > further optimisations will need to be added over time, as well. 
>> > 
>> > the pressure on the OP32 space is therefore enormous, given that
>> > "embedded" low-power requirements cannot be met by moving to the 48
>> > and 64 bit opcode space.
>> > 
>> > > We need to thus perhaps look at a "minimized" subset of the Vulkan
>> > > ISA instructions that truly define the atomic operations from which
>> > > the full ISA can be constructed. So the instruction decode hardware
>> > > can implement this "higher-level ISA" - perhaps in microcode - from
>> > > the "atomic ISA" at runtime while hardware support is only provided
>> > > for the "atomic ISA".
>> > 
>> > yes.  a microcode engine is something that may be useful in other
>> > implementations as well (for other purposes) so should, i feel, be a
>> > separate proposal.
>>
>> I personally think that having LLVM inline the corresponding 
>> implementations of the more complex operations is the better way to 
>> go. For most Vulkan shaders, they are compiled at run-time, so LLVM 
>> can do feature detection to determine which operations are implemented 
>> in the hardware and which ones need a software implementation. 
>>
>> This reduces the pressure on the opcode space by a lot. 
>>
>> > > From the SIGGRAPH BOF it was clear there are competing interests.
>> > > Some people wanted explicit texture mapping instructions while
>> > > others wanted HPC type threaded vector extensions.
>> > 
>> > interesting. 
>>
>> Having texture mapping instructions in HW is a really good idea for
>> traditional 3D. Since Vulkan allows the texture mode to be dynamically
>> selected, trying to implement it in software would require having the
>> texture operations use dynamic dispatch, with maybe static checks for
>> recently used modes to reduce the latency. See VkSampler's docs.
>>
>> > > Although each of these can be accommodated we need to adjudicate
>> > > the location in the process pipeline where they belong - atomic
>> > > ISA, higher-level ISA or higher-level graphics library.
>> > 
>> > OpenCL compliance is pretty straightforward to achieve.  it could be
>> > done by any standard supercomputer Vector Compute Engine.
>> > 
>> > a good Vector ISA does **NOT** automatically make a successful GPU
>> > (cf: MIAOW, Nyuzi, Larrabee).
>> > 
>> > 3D Graphics is ridiculously complex and comprehensive, and therefore
>> > requires careful step-by-step planning to meet the extremely
>> > demanding and heavily optimised de-facto industry-standard
>> > expectations met by modern GPUs, today (fixed-functions are out:
>> > shader engines are in).
>>
>> Fixed function operations in the form of custom opcodes or separate 
>> accelerators are still used for several operations that are rather 
>> slow to do with standard Vector or SIMD instructions:
>
>   
> When I did a 3D GPU, there was a HW unit each to perform:
> a) vertex to thread assignment
> b) rasterize primitive
> c) interpolate rasterized point
> d) Texture load
> and some higher layer HW that was used to time the various activities 
> through the 100,000 clock pipeline.
>

i did a double-take at that one.  i know you didn't sneeze whilst holding 
down the "0" key because there's a comma in the middle :)

some context: we discussed this on-list back in june:
http://bugs.libre-riscv.org/show_bug.cgi?id=91#c1

> If you include Tessellation and Geometry, both of which can generate a
> volcano of new primitives, there are significant performance gains
> (more than 2×) to be had by doing the above in HW function units.
>

hmmm.... definitely worth it.
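
as a rough illustration of Jacob's point earlier in the thread (let the
compiler inline a software implementation of a complex operation when
the hardware has no opcode for it), here is a minimal sketch in Python.
everything in it is hypothetical: the tuple-based instruction format,
the op names and the `lower` / `run` helpers are made up for this
example and are not LLVM, SPIR-V or Vulkan APIs.  it only shows the
shape of the idea, using LERP expanded as a + t*(b - a).

```python
# Toy "lowering" pass. Instruction format, op names and helpers are all
# hypothetical; this is not how LLVM represents or selects instructions.
# An instruction is a tuple: (op, destination, (source operands...)).

def expand_lerp(dest, args):
    """Software expansion of lerp(a, b, t) = a + t * (b - a)."""
    a, b, t = args
    diff, prod = f"{dest}.d", f"{dest}.p"  # temporaries for this expansion
    return [
        ("sub", diff, (b, a)),
        ("mul", prod, (t, diff)),
        ("add", dest, (a, prod)),
    ]

# Complex ops that have a software fallback built from simpler ops.
EXPANSIONS = {"lerp": expand_lerp}

def lower(program, hw_ops):
    """Keep ops the hardware implements; expand the rest in software."""
    out = []
    for op, dest, args in program:
        if op in hw_ops:
            out.append((op, dest, args))
        elif op in EXPANSIONS:
            # The expansion may itself use ops that need lowering.
            out.extend(lower(EXPANSIONS[op](dest, args), hw_ops))
        else:
            raise ValueError(f"no opcode or software expansion for {op!r}")
    return out

def run(program, env):
    """Tiny interpreter, used only to check both lowerings agree."""
    funcs = {
        "add": lambda x, y: x + y,
        "sub": lambda x, y: x - y,
        "mul": lambda x, y: x * y,
        "lerp": lambda a, b, t: a + t * (b - a),
    }
    env = dict(env)
    for op, dest, args in program:
        env[dest] = funcs[op](*(env[name] for name in args))
    return env

if __name__ == "__main__":
    prog = [("lerp", "r", ("a", "b", "t"))]
    inputs = {"a": 2.0, "b": 6.0, "t": 0.25}
    for hw in ({"add", "sub", "mul"}, {"add", "sub", "mul", "lerp"}):
        lowered = lower(prog, hw)
        print(len(lowered), "instruction(s), r =", run(lowered, inputs)["r"])
```

with only add/sub/mul available the single lerp becomes three simple
instructions; with a hardware lerp it stays as one opcode.  the trade-off
discussed above is exactly that: opcode space saved versus instructions
executed.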