[libre-riscv-dev] [isa-dev] Re: FP transcendentals (trigonometry, root/exp/log) proposal

Tue Aug 13 02:05:23 BST 2019


On Monday, August 12, 2019 at 7:11:21 PM UTC-5, lkcl wrote:
>
> On Tuesday, August 13, 2019 at 1:52:16 AM UTC+8, MitchAlsup wrote: 
> > On Sunday, August 11, 2019 at 10:20:28 PM UTC-5, lkcl wrote: 
> > https://libre-riscv.org/ztrans_proposal/#khronos_equiv 
> > 
> > 
> > I would like to point out that the general implementations of ATAN2 do a
> > bunch of special case checks and then simply call ATAN.
>
> Appreciated.  I recorded these insights on the page (to move offpage, to 
> discussion, at a later point). 
>   
> > The bottom line is that I think you are choosing to make too many of
> > these into OpCodes, making the hardware function/calculation unit
> > (and sequencer) more complicated than necessary.
>
> We do have to be careful to ensure that multiple disparate Platform 
> implementors are happy, and that tends to suggest that the extension 
> remains close to a RISCV ISA paradigm. 
>
> > ---
>
> > I might suggest that if there were a way for a calculation to be
> > performed and the result of that calculation chained to a subsequent
> > calculation such that the precision of the result-becomes-operand is
> > wider than what will fit in a register, then you can dramatically
> > reduce the count of instructions in this category while retaining
> > acceptable accuracy:
> > 
> > 
> >      z = x / y 
> > can be calculated as:
> >      z = x × (1/y) 
> > 
> > 
> > Where 1/y has about 26-to-32 bits of fraction. No, it's not IEEE
> > 754-2008 accurate, but GPUs want speed and 1/y is fully pipelined
> > (F32) while x/y cannot be (at reasonable area).
>
> Sigh yehhh this is... ok let me put it this way. If we were doing a from 
> scratch dedicated GPU ISA (along the lines of proprietary GPUs, with 
> associated software  RPC / IPC Marshalling system between the completely 
> disparate ISAs) I would in absolutely no way start from a RISC-V base. 
>
> That's not because the RISCV Foundation is a pain to deal with, it's 
> *technical* reasons, namely that it is a retrofit into an ISA that was 
> designed for a completely different market than 3D. 
>
> > Given that one has the ability to carry (and process) more fraction
> > bits, one can then do high precision multiplies of π or other
> > transcendental radixes.
> > 
> > 
> > And GPUs have been doing this almost since the dawn of 3D. 
>
> Appreciated.  Background, first.  Can skip if short of time 
>
> --- 
>
> Basically what you are recommending is a microcode ISA. 


The alternative is to designate a few OpCodes in a sequence as a single
result producer, with the intermediate result kept larger than register
width and fed back to the next instruction in the sequence (preserving
accuracy).
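
A minimal sketch of why the width of the fed-back intermediate matters,
using the x × (1/y) example quoted above. Everything here is an
assumption for illustration: the helper names are hypothetical, the
arithmetic is emulated in Python's binary64 rather than modelled on any
real pipeline, and the two reciprocal widths (23 fraction bits, as fits
in an F32 register, and 32 fraction bits, the top of the 26-to-32 range
mentioned above) are chosen only to show the trend.

```python
import math
import random
import struct


def to_f32(v):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", v))[0]


def round_fraction(v, fraction_bits):
    """Round v to a significand with the given number of fraction bits."""
    m, e = math.frexp(v)          # v = m * 2**e with 0.5 <= |m| < 1
    scale = fraction_bits + 1     # hidden bit plus fraction bits
    return math.ldexp(round(m * 2.0 ** scale), e - scale)


def divide_via_reciprocal(x, y, fraction_bits):
    """z = x * (1/y), with 1/y carried at the requested width."""
    r = round_fraction(1.0 / y, fraction_bits)  # the chained intermediate
    return to_f32(x * r)                        # final result fits in F32


def count_mismatches(fraction_bits, samples=20000, seed=1):
    """Count results that differ from the binary32-rounded quotient x / y."""
    rng = random.Random(seed)
    bad = 0
    for _ in range(samples):
        x = to_f32(rng.uniform(1.0, 2.0))
        y = to_f32(rng.uniform(1.0, 2.0))
        reference = to_f32(x / y)
        if divide_via_reciprocal(x, y, fraction_bits) != reference:
            bad += 1
    return bad
```

Comparing count_mismatches(23) with count_mismatches(32) shows the
effect: when the reciprocal is squeezed into register width before the
multiply, the product disagrees with the rounded quotient far more often
than when the wider intermediate is carried through.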
 

> This is something that is on the table as an option (an idea floated by 
> Atif from Pixilica), and one that we are sort-of looking to put into the 
> hardware of the Libre RISCV ALUs, by having a long "opcode" that activates 
> *parts* of the pipeline (pre and post FP normalisation and special cases) 
> so that it can be shared between INT and FP.
>
> Also, 64 bit will be performed by "recycling" intermediary results back 
> through the pipeline, again under the control of that microcode-like long 
> "opcode". It's a FSM with automatic operand forwarding in other words. 
>
> What you describe - the special cases that turn ATAN2 into ATAN - could be 
> performed conveniently within the "recycling" paradigm by carrying out the 
> special cases as one "cycle", the DIV as another (or the mul and the 1/x as 
> two) and finally the FSM hands the intermediate over to ATAN. 
>
> The nice thing about this microarchitecture is that the intermediate data 
> can be of any width, as well as contain any number of intermediate 
> operands. 
>
> My feeling is - and this is not ruling out the possibility - that 
> microcode ops, exposed to the actual ISA level - would not only need a lot 
> of thought, they'd need special attention to be paid to the register file 
> (no longer 32 bits, it would be 36 or some other arbitrary width sufficient 
> to store the intermediary results, efficiently), and more, as well. 
>
> That is complicated, and there is also the concern of deviating
> significantly from RISCV's ISA. It might even *increase* the number of
> opcodes, due to fragmentation of specialist micro-operations (such as
> the ATAN2 special cases).
>
> If those special cases were done as RISCV operations, that's a *lot* of
> instructions to trade off against simply having ATAN2.
>
> Overall then I think what I am talking myself into is support for the 
> pseudo-microcode-like FSM engine within our design, with associated 
> "feedback" back to the beginning of the pipeline(s).  It is not a full 
> blown microcode design, yet has a similar effect, just without needing to 
> expose microcode details to the actual ISA. 
>
> Other implementors may choose to do things differently, particularly those 
> that stick to the UNIX Platform Accuracy profile. 
>
> So that is background. 
>
> --- 
>
> I think we therefore have a case for bringing back ATAN and including
> ATAN2.
>
> The reason is that whilst a microcode-like GPU-centric platform would do 
> ATAN2 in terms of ATAN, a UNIX-centric platform would do it the other way 
> round. 
>
> (that is the hypothesis, to be evaluated for correctness. feedback 
> requested). 
>
> This is because we cannot prioritise one platform's speed/accuracy over
> another's. It is neither reasonable nor desirable to penalise one
> implementor over another.
>
> Thus all implementors, to keep interoperability, must have both
> opcodes, and may choose, at the architectural and routing level, which
> one to implement in terms of the other.
>
> Allowing implementors to choose to add either opcode and let traps sort it 
> out leaves an uncertainty in the software developer's mind: they cannot 
> trust the hardware, available from many vendors, to be performant right 
> across the board. 
>
> Standards are a pig. 
>
> L. 
>
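
For reference, the "special case checks, then call ATAN" structure
discussed in the quoted text can be sketched as below. This is only an
illustration under stated assumptions: atan2_via_atan is a hypothetical
name, the case ordering is one plausible choice rather than any
particular library's or pipeline's, and the quotient y/x is computed
with an ordinary binary64 divide rather than a wide intermediate.

```python
import math


def atan2_via_atan(y, x):
    """atan2 built from special-case checks followed by a single atan."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    y_positive = math.copysign(1.0, y) > 0     # also true for +0.0
    if math.isinf(x) and math.isinf(y):
        # inf/inf is not a usable quotient, so pick the diagonal directly.
        angle = math.pi / 4 if x > 0 else 3 * math.pi / 4
        return math.copysign(angle, y)
    if x == 0.0:
        if y == 0.0:
            # The sign of x selects 0 or pi; the sign of y gives the sign.
            base = 0.0 if math.copysign(1.0, x) > 0 else math.pi
            return math.copysign(base, y)
        return math.copysign(math.pi / 2, y)   # straight up or straight down
    base = math.atan(y / x)                    # the one "real" operation
    if x > 0:
        return base                            # right half-plane, no fix-up
    # Left half-plane: shift by pi towards the side that y lies on.
    return base + math.pi if y_positive else base - math.pi
```

Everything except the single math.atan call is comparison and sign
handling, which is the part that would either become extra opcodes or
be absorbed into one "cycle" of the FSM described above.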