[libre-riscv-dev] SV / RVV, marking a register as VL.

Sat Aug 31 15:57:20 BST 2019

On Saturday, August 31, 2019 at 1:30:53 PM UTC+1, Rogier Brussee wrote:

> The CSRs are supposed to be "pushed" at the ALUs in a one-way fashion. 
>> Things like setting the FP CSR for example. Setting a mode for arithmetic 
>> saturation and so on. 
>
> The only reason you are "supposed" to read CSRs for is in unusual 
>> circumstances, such as context switches. 
>>
>>
> You seem to be using CSR's a lot
>

well.. the original version of SV was.  the spike-sv simulator quickly 
disabused me of that idea.  jacob came up with SVPrefix (which sort-of 
SIMD-ifies scalar opcodes) and i decided to do "VBLOCK", which effectively 
strips absolutely all of the 32-bit CSR instructions and drops the header of 
the 192-bit format (only 16 bits) in its place.

> and pushing the boundaries anyway, 
>

yup :)

> and once you are accommodating a full vector processing unit it is safe to 
> assume we are not talking about the lowest end of processors.
>

ah.  right.  it's really important to remember that SV is _not_ just for 
high-performance vector processing: it's designed to be useful right the 
way down to the RV32E level.  at its core it's just a series of for-loops:

PC += 1
for VBLKPC in range(VBLOCKLEN):
    for vl in range(VL):
        for subvl in range(SUBVL):
            # same instruction gets repeated here,
            # just with different regs each time.
            # actual parallelism is entirely optional
            pass

you don't *have* to have massive regfiles, or huge memory bandwidth, or 
greatly-increased register ports.
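
a minimal python sketch of the loop nest above, for illustration only.  the
helper names (expand, run_vblock) are hypothetical, and the register
numbering rule (element offset = vl * SUBVL + subvl, applied to all three
operands, i.e. all of them assumed to be tagged as vectors) is a simplifying
assumption, not the full SV register-tagging scheme:

```python
def expand(op, rd, rs1, rs2, VL, SUBVL=1):
    """Expand one vectorised op into the scalar ops it stands for.

    Assumption: every operand is tagged as a vector, and element
    (vl, subvl) uses register number base + vl * SUBVL + subvl.
    """
    issued = []
    for vl in range(VL):
        for subvl in range(SUBVL):
            offs = vl * SUBVL + subvl
            # same instruction, just with different regs each time
            issued.append((op, rd + offs, rs1 + offs, rs2 + offs))
    return issued


def run_vblock(instrs, VL, SUBVL=1):
    """Walk a block of instructions; VBLOCKLEN is len(instrs) here."""
    issued = []
    for vblkpc in range(len(instrs)):
        op, rd, rs1, rs2 = instrs[vblkpc]
        # a purely sequential implementation is valid:
        # nothing below requires any parallel hardware
        issued.extend(expand(op, rd, rs1, rs2, VL, SUBVL))
    return issued


if __name__ == "__main__":
    for scalar_op in run_vblock([("add", 8, 16, 24)], VL=2, SUBVL=2):
        print(scalar_op)
```

with VL=2 and SUBVL=2 the single "add" is issued four times, on consecutive
register numbers.  a tiny core can run those one at a time; a bigger one may
issue several per cycle.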

there's benefits to SV beyond vectorisation, including being able to 
save/restore the entire register file with one or two instructions [not one 
per register]. 

provision of LD/ST-MULTI was a complete accident, and can even be 
predicated, which saves on function call stack save/restore as well as 
context-switch.
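
a rough python model of why LD/ST-MULTI falls out for free: a store (or
load) whose data register is a vector of length VL just repeats as VL scalar
stores (or loads).  store_multi / load_multi are hypothetical names, and the
memory layout (unit stride, with masked-out elements still keeping their
slot) is an assumption made for the sketch:

```python
def store_multi(regs, mem, base_reg, addr, VL, predicate=None, xlen_bytes=4):
    """One vectorised store: up to VL scalar stores of consecutive regs.

    predicate is an optional bitmask; bit i enables element i.
    Returns the number of scalar stores actually performed.
    """
    stores = 0
    for i in range(VL):
        if predicate is not None and not (predicate >> i) & 1:
            continue  # masked out: this register is not saved
        mem[addr + i * xlen_bytes] = regs[base_reg + i]
        stores += 1
    return stores


def load_multi(regs, mem, base_reg, addr, VL, predicate=None, xlen_bytes=4):
    """Mirror image of store_multi: restore consecutive regs from memory."""
    for i in range(VL):
        if predicate is not None and not (predicate >> i) & 1:
            continue
        regs[base_reg + i] = mem[addr + i * xlen_bytes]


if __name__ == "__main__":
    regs = list(range(100, 132))   # stand-in values for x0..x31
    mem = {}
    # context switch: save x1..x31 with a single vectorised store
    print(store_multi(regs, mem, base_reg=1, addr=0x1000, VL=31))
    # function call: save only the regs selected by the predicate mask
    print(store_multi(regs, {}, base_reg=8, addr=0x2000, VL=4, predicate=0b0101))
```

the predicate mask is what makes this usable for function prologue/epilogue:
only the registers a function actually clobbers need to be saved.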

so it's definitely not just about vectorisation.

>  
>
>> [snip]
>
>  
>
>> > 
>> > I don't know exactly how you have arranged things, but if you address 
>> registers in blocks, having a block of 16 registers that can be used 
>> alternatively as scalar registers with the standard instructions is useful 
>> and made more difficult if one of those registers is used as vl, 
>>
>> Not at all. We have a Dependency Matrix logic block, which tracks all 
>> read and write register hazards.  No pipeline *ever* needs to stall. 
>> evvvvverrrrr.  Once data goes in, the pipeline *knows* that there will be a 
>> place for the result to go. 
>>
>> The VL dependency tracking, which had to be in there anyway, is now no 
>> longer a "special case", it's now just another scalar register. 
>>
>> Now, that happens to be a "hidden extra operand" to all opcodes, but 
>> that's exactly how VL has to be thought of anyway: a hidden operand that is 
>> implicitly added to every [vector] instruction. 
>>
>>
> I didn't mean the internal dependency, but....
>

slightly lost.  no, the registers are not addressed in "blocks" (which is a 
nice idea, btw).  they're all individually re-routeable / taggable.  it 
takes a whopping 16 bits to specify the full context (per register!) so i 
came up with a shortened 8-bit format.

i've been trying to think how to get that down further, for some time.

> It leaves compiler writers with far less flexibility. Whatever reg is 
>> picked, it creates a hole around which the use of the surrounding scalar 
>> regs *cannot be used in a vector*. 
>>
>>
> this point. The first six registers numbers are tied up by calling /usage 
> convention so a seventh is less in the way down there. 
>

hard-coding registers just makes me twitchy.  the more there are, the 
harder it becomes to predict how they interact and what harm they might do 
for compiler and assembler writers.  we already had to have some for the 
reduced (compacted) formats, and they're making me nervous.

> > Also, it is a CSR worth of state that has to be saved on context 
>> switches. 
>>
>> It has to be saved anyway. 
>>
>>
> If there is no CSR that stores which register is currently used as vl 
>  (e.g. t1) you don't have to save and restore that.
>

ok yes, now i understand.  took me a while.  *thinks*... context-switching 
isn't as high a priority as reducing code size (to get I-cache usage down 
in ultra-low-power 3D GPU scenarios) ... 

> I'd like to see RVV be similarly improved through public transparent 
>> discussions, for the benefit of all implementors and of the RISC-V Vector 
>> community. 
>
>
> Which reminds me that I have a half finished review of the B extension. 
> That draft seems to be improving but could, I think, also use more 
> eyeballs. 
>

could you make it here on isa-dev?  the operations that have been specially 
added to RVV for mask manipulation: we will need the exact same ones in 
BitManip.  i've been delaying a review as there's been so much else to do.

thanks rogier.

l.