[libre-riscv-dev] SV / RVV, marking a register as VL.

Rogier Brussee rogier.brussee at gmail.com
Sat Aug 31 13:30:53 BST 2019



On Friday, 30 August 2019 20:25:05 UTC+2, lkcl wrote:
>
> On Saturday, August 31, 2019 at 12:55:25 AM UTC+8, Rogier Brussee wrote: 
>   
> > shaving one instruction off of a 12-instruction loop is not to be
> > sneezed at, rogier!  and in SV, it's something like a reduction of 3 in 13,
> > which is a whopping 20% reduction!  one of those is on the
> > loop-critical-path (an 11% reduction) and the others are on the clean-up
> > path.
> > 
> > 
> > For this example. 
> > 
>
> Yes. Which is "the" canonical example of sequential data-dependent 
> fail-on-first parallel processing. 
>
>
Good point.
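To make the "canonical example" concrete, here is a minimal Python simulation of a strlen-style loop using data-dependent fail-on-first. It is a sketch under assumptions, not SV assembly: the helper names (ffirst_truncate_vl, strlen_ffirst) and the MAXVL of 4 are hypothetical, and it assumes fail-first truncates VL to the index of the first element whose test fails (here, the first zero byte). The loop body needs "length += vl" and "ptr += vl", which is exactly the kind of VL-related addition discussed in this thread.

```python
def ffirst_truncate_vl(chunk: bytes) -> int:
    """Hypothetical data-dependent fail-first: VL becomes the number of
    leading elements that pass the test (here: the byte is nonzero)."""
    for i, b in enumerate(chunk):
        if b == 0:
            return i
    return len(chunk)


def strlen_ffirst(data: bytes, maxvl: int = 4) -> int:
    """Strip-mined strlen: each iteration processes up to MAXVL bytes."""
    length = 0
    ptr = 0
    while True:
        chunk = data[ptr:ptr + maxvl]   # stands in for a vector load of MAXVL bytes
        vl = ffirst_truncate_vl(chunk)  # VL truncated at the first zero byte
        length += vl                    # VL-related arithmetic on the critical path
        ptr += vl
        if vl < maxvl:                  # terminator (or end of buffer) reached
            return length


print(strlen_ffirst(b"hello\x00world"))  # 5
```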
 
[snip]


> > 
> > 
> > CSRRA would be allowed on all CSRs, just as CSRRS and CSRRC are
> > allowed on all CSR registers; they would just not necessarily be useful. I
> > assume here that adds are more useful than anything else except what is
> > available now. Also, if you go the road of trying to squeeze in the CSR/
> > privileged opcode you have no room left for anything else :-(.
>
> Yep. Sigh. 
>
>
> > CSRs were never intended for this kind of close-knit arithmetic tie-in.
> > you set them up, you maybe clear a bit or two, do lots of operations, and
> > then maybe set or clear a bit or two again.
> > 
> > 
> > 
> > 
> > Right. But what is the fundamental difference between atomically
> > setting/clearing a bit and atomically adding and subtracting?
>
> That's slightly missing the point: the point is that the scalar registers 
> are what you're supposed to do arithmetic on, and CSRs are what are 
> supposed to change the behaviour of the engine, run a bunch of arithmetic 
> ops, then switch it off again. 
>
> The CSRs are supposed to be "pushed" at the ALUs in a one-way fashion. 
> Things like setting the FP CSR for example. Setting a mode for arithmetic 
> saturation and so on. 

> The only reason you are "supposed" to read CSRs for is in unusual
> circumstances, such as context switches.
>
>
You seem to be using CSRs a lot and pushing the boundaries anyway, and
once you are accommodating a full vector processing unit it is safe to
assume we are not talking about the lowest end of processors.
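For reference, the existing Zicsr read-modify-write instructions both return the old CSR value in rd and then update the CSR: CSRRS ORs in the bits of rs1 and CSRRC clears them. The sketch below models those two alongside the CSRRA proposed in this thread, which would add rs1 instead. CSRRA is hypothetical (it is not in any ratified spec), the dict-based CSR file is purely illustrative, and the rule that CSRRS/CSRRC with rs1=x0 do not write the CSR is not modelled.

```python
XLEN = 64
MASK = (1 << XLEN) - 1


def csrrs(csrs: dict, name: str, rs1_val: int) -> int:
    """CSRRS: rd <- csr; csr <- csr | rs1 (atomic read-and-set-bits)."""
    old = csrs[name]
    csrs[name] = old | rs1_val
    return old


def csrrc(csrs: dict, name: str, rs1_val: int) -> int:
    """CSRRC: rd <- csr; csr <- csr & ~rs1 (atomic read-and-clear-bits)."""
    old = csrs[name]
    csrs[name] = old & ~rs1_val & MASK
    return old


def csrra(csrs: dict, name: str, rs1_val: int) -> int:
    """Hypothetical CSRRA: rd <- csr; csr <- csr + rs1, wrapping at XLEN bits."""
    old = csrs[name]
    csrs[name] = (old + rs1_val) & MASK
    return old


csrs = {"example": 5}
print(csrra(csrs, "example", 3), csrs["example"])  # 5 8
```

Subtraction falls out of the same operation by passing the two's complement of the amount, so structurally the proposed add is no more exotic than the existing set/clear pair: one read of the old value, one combining ALU operation, one write.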

 

> [snip]

 

> > 
> > I don't know exactly how you have arranged things, but if you address
> > registers in blocks, having a block of 16 registers that can be used
> > alternatively as scalar registers with the standard instructions is useful
> > and made more difficult if one of those registers is used as vl,
>
> Not at all. We have a Dependency Matrix logic block, which tracks all read 
> and write register hazards.  No pipeline *ever* needs to stall. 
> evvvvverrrrr.  Once data goes in, the pipeline *knows* that there will be a 
> place for the result to go. 
>
> The VL dependency tracking, which had to be in there anyway, is now no 
> longer a "special case", it's now just another scalar register. 
>
> Now, that happens to be a "hidden extra operand" to all opcodes, but 
> that's exactly how VL has to be thought of anyway: a hidden operand that is 
> implicitly added to every [vector] instruction. 
>
>
I didn't mean the internal dependency, but....
 

> > whereas a scalar use of t1 can be replaced with a temporary in the
> > x16-x31 range (e.g. t3) without any problem.
>
> See above. SV is under a lot more register allocation pressure. 
>
>
> > 
> > 
> > 
> > It would seem the important instructions to use with vl are C.ADDI, C.LI,
> > C.MV, B.MAX and perhaps occasionally C.LDSP, C.SDSP, and C.ADD and SUB.
> > Each of the RVC instructions uses a 5-bit register number.
>
> It's not the VL-related-arithmetic ops that worry me, it's that because SV 
> uses *all* scalar opcodes and contextually marks them as "vectorised", the 
> *vector* operations are under pressure if one of the regs is hard coded to 
> VL. 
>
> It leaves compiler writers with far less flexibility. Whatever reg is 
> picked, it creates a hole around which the use of the surrounding scalar 
> regs *cannot be used in a vector*. 
>
>
this point. The first six register numbers are tied up by calling/usage
convention, so a seventh is less in the way down there.
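The "hole" argument can be quantified with a small sketch. It assumes, as a simplification, that an SV vector occupies a contiguous run of n scalar registers starting at the marked register, and it simply counts the legal start registers for a given vector length once some register numbers are reserved. The function name and the choice of x6 (t1) and x16 as candidate VL registers are illustrative only.

```python
def usable_starts(n: int, reserved: set, nregs: int = 32) -> list:
    """Start registers s such that x[s]..x[s+n-1] are all free for a vector of length n."""
    return [
        s for s in range(nregs - n + 1)
        if all((s + i) not in reserved for i in range(n))
    ]


# x0..x5 (zero, ra, sp, gp, tp, t0), treated as reserved per the remark above
abi_low = set(range(6))
for label, reserved in [
    ("x0-x5 only", abi_low),
    ("plus x6 (t1) as VL", abi_low | {6}),
    ("plus x16 as VL", abi_low | {16}),
]:
    print(label, len(usable_starts(4, reserved)))  # 23, 22, 19 respectively
```

Under these assumptions a reserved register adjacent to the already-reserved low block costs one start position for 4-element vectors, while one in the middle of the file costs four. Whether that matters in practice depends on SV's actual register-marking rules, which this sketch does not model.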


 

> It is just a route I do not want to go down. 
>
>
OK, your call.

> > 
> > 
> > with all these things in mind - the VL CSR using the CSR regfile for
> > ways in which it was never originally designed being the most crucial - is
> > the idea of having VL be a pointer-to-a-scalar-reg starting to make more
> > sense?
> > 
> > 
> > 
> > 
> > No, because even if redirection is free, once the vector length is in a
> > register and not in a CSR, I don't really see what it buys you to be able
> > to set _different_ registers as the vl register,
>
> I believe I explained that, above,


MMmmm no, but you understand this better than I do.  
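One way to picture the "pointer-to-a-scalar-reg" idea is a toy model in which a hypothetical vl_reg field names the scalar register holding VL, and every vector op treats that register as an extra, hidden read operand. The class and method names are invented for illustration; this is not the SV specification. The returned read/write sets show what a dependency tracker would have to see: the VL register appears alongside the ordinary source registers.

```python
class ToyCore:
    def __init__(self, vl_reg: int = 6, nregs: int = 32):
        self.x = [0] * nregs
        self.vl_reg = vl_reg  # hypothetical "VL pointer": index of the scalar reg holding VL

    def vadd(self, rd: int, rs1: int, rs2: int):
        """Element-wise add over VL contiguous registers; returns (reads, writes)."""
        vl = self.x[self.vl_reg]
        reads = {self.vl_reg}          # VL is an implicit source operand
        writes = set()
        for i in range(vl):
            reads |= {rs1 + i, rs2 + i}
            writes.add(rd + i)
            if rd + i != 0:            # x0 stays hard-wired to zero
                self.x[rd + i] = self.x[rs1 + i] + self.x[rs2 + i]
        return reads, writes


core = ToyCore(vl_reg=6)
core.x[6] = 2                          # VL = 2, held in x6 (t1)
core.x[10:12] = [1, 2]
core.x[12:14] = [10, 20]
reads, writes = core.vadd(16, 10, 12)
print(core.x[16:18], sorted(reads), sorted(writes))
# [11, 22] [6, 10, 11, 12, 13] [16, 17]
```

Changing core.vl_reg retargets VL without touching any instruction encoding. With a fixed convention (for example, VL always in t1) the field disappears, and so does the extra word of state to save on a context switch; that is the trade-off being debated here.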
 

> [snip]
> > Also, it is a CSR's worth of state that has to be saved on context
> > switches.
>
> It has to be saved anyway. 
>
>
If there is no CSR that stores which register is currently used as vl
(e.g. t1), you don't have to save and restore that. Of course you still
have to save the register contents of t1. As an analogy: on a context
switch you have to save the stack pointer in x2, but you don't have to save
the fact that x2 is used as the stack pointer, because that convention is an
invariant.
 

> > Being able to make the vl register dependency explicit by making it
> > explicitly part of the long "specify everything version" of your
> > instructions seems "the right thing" though.
>
> On balance... yeah. 
>
>
> > 
> > Anyway I was just giving you food for thought, which it seems to have
> > done :-).
>
> Yes, for which I am very grateful. 
>
>
You're welcome.
 

> I'd like to see RVV be similarly improved through public transparent 
> discussions, for the benefit of all implementors and of the RISC-V Vector 
> community. 


Which reminds me that I have a half-finished review of the B extension.
That draft seems to be improving but could, I think, also use more 
eyeballs. 
 
Rogier

 
>
> L.
>
>
>

