[libre-riscv-dev] load/store execution queue idea

Fri May 1 03:53:59 BST 2020

I filled out some notes on my load/store execution queue idea here:
https://libre-soc.org/3d_gpu/architecture/alternative-design-idea/

The design should be suitable for the final 28nm SoC and should be
able to execute 4 loads or 4 stores or 4 AMOs or 4 fences per cycle;
the 4 is a parameter we can change to some other number if we desire.
The design completely replaces the memory dependency matrix. One
downside is that it doesn't support forwarding from stores to later
loads without going through the L1 cache.

There's also a section on generalizing the carry look-ahead networks
to be usable for any associative binary operation (the prefix-sum
section).
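
To illustrate the generalization, here is a minimal software sketch in
Python, not the hardware design from the linked page. It assumes a
Kogge-Stone style layout, which is only one of several possible prefix
networks, and the names prefix_scan, carry_op and add_bits are made up
for this example. The same scan routine works for any associative
operation; carry look-ahead is the special case where the operation
combines (generate, propagate) pairs.

```python
from operator import add


def prefix_scan(items, op):
    """Inclusive prefix scan over `items` using an associative `op`.

    Each pass of the loop models one layer of a Kogge-Stone style
    network, so n inputs need ceil(log2(n)) layers. `op` does not
    have to be commutative: the left operand always covers the
    earlier (lower-index) span.
    """
    out = list(items)
    dist = 1
    while dist < len(out):
        prev = out
        # Elements at index >= dist combine with the element `dist` to the left.
        out = [
            prev[i] if i < dist else op(prev[i - dist], prev[i])
            for i in range(len(prev))
        ]
        dist *= 2
    return out


def carry_op(lo, hi):
    """Combine (generate, propagate) pairs; `lo` is the less significant span."""
    g_lo, p_lo = lo
    g_hi, p_hi = hi
    return (g_hi or (p_hi and g_lo), p_hi and p_lo)


def add_bits(a, b, width):
    """Add two `width`-bit unsigned integers using the generic scan for carries."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    # Per-bit generate (both set) and propagate (exactly one set).
    gp = [(bool(x & y), bool(x ^ y)) for x, y in zip(a_bits, b_bits)]
    scanned = prefix_scan(gp, carry_op)
    # carries[i] is the carry into bit i; the carry into bit 0 is zero.
    carries = [False] + [g for g, _ in scanned]
    total = 0
    for i in range(width):
        total |= (a_bits[i] ^ b_bits[i] ^ int(carries[i])) << i
    # The final carry becomes bit `width` of the result.
    return total | (int(carries[width]) << width)


if __name__ == "__main__":
    print(prefix_scan([1, 2, 3, 4, 5], add))   # running sum
    print(prefix_scan([3, 1, 4, 1, 5], max))   # running maximum
    print(add_bits(11, 6, 4))                  # addition via the carry scan
```

Swapping `op` is the only change needed to go from a running sum to a
running maximum to carry generation, which is the point of
generalizing the network.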

Jacob