Comment #17 from Jacob Lifshay <programmerjake at gmail.com>:
If we're having multiple L1 caches, we should implement them to appear as if
there is a single L1 cache: no extra instructions should be needed to flush one
cache before accessing the other. This greatly simplifies the programming
model, and Kazan currently depends on the memory having a normal programming
model (no cache flushes required for an operation to be visible on the same
thread).

GPU workloads still have some accesses (the majority for a tiled
architecture, which is what Kazan is) that benefit from an L1 cache. Basically,
a tiled renderer splits the screen into small squares/rectangles (bins) and
accumulates a list of rendering operations for each bin (binning). Then, for
each bin, it renders all the operations inside it in sequence, so the pixels
are repeatedly stored and loaded until that bin is done rendering.
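
To make the access pattern concrete, here is a rough Python sketch of the
binning idea described above. It is not Kazan's actual code: the names
bin_operations and render, the 16-pixel BIN_SIZE, and the rectangle-only
"operations" are all assumptions made for illustration.

```python
from collections import defaultdict

BIN_SIZE = 16  # hypothetical bin edge length in pixels (an assumption)


def bin_operations(ops, bin_size=BIN_SIZE):
    """Binning pass: map each bin (bx, by) to the ordered ops that touch it.

    Each op is (name, x0, y0, x1, y1), an inclusive pixel bounding box.
    """
    bins = defaultdict(list)
    for op in ops:
        _name, x0, y0, x1, y1 = op
        for by in range(y0 // bin_size, y1 // bin_size + 1):
            for bx in range(x0 // bin_size, x1 // bin_size + 1):
                bins[(bx, by)].append(op)
    return bins


def render(ops, bin_size=BIN_SIZE):
    """Render one bin at a time; return the framebuffer and per-pixel writes."""
    framebuffer = {}
    writes = defaultdict(int)
    for (bx, by), bin_ops in sorted(bin_operations(ops, bin_size).items()):
        bx0, by0 = bx * bin_size, by * bin_size
        # All ops for this bin run back to back, so the same small set of
        # pixels is stored and loaded repeatedly before moving on.
        for name, x0, y0, x1, y1 in bin_ops:
            for y in range(max(y0, by0), min(y1, by0 + bin_size - 1) + 1):
                for x in range(max(x0, bx0), min(x1, bx0 + bin_size - 1) + 1):
                    framebuffer[(x, y)] = name
                    writes[(x, y)] += 1
    return framebuffer, writes
```

While one bin is being rendered, the inner loops only touch the pixels of that
bin (at most BIN_SIZE x BIN_SIZE of them), which is the kind of small, heavily
reused working set an L1 cache is good at holding.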

Texture access also benefits from a cache since nearby pixels are accessed
together and the accesses often overlap.
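
As a toy illustration of why overlapping texture accesses cache well, the
following Python sketch counts hits in a small LRU cache of texel blocks. The
2x2 footprint per sample (as in bilinear filtering), the 4x4 block size, and
the 8-entry capacity are assumptions picked for the example, not parameters of
any real hardware.

```python
from collections import OrderedDict


def texel_cache_hit_rate(sample_points, block=4, capacity=8):
    """Return the hit rate of an LRU cache holding block x block texel tiles.

    Each sample at (sx, sy) reads the 2x2 texel footprint starting there.
    """
    cache = OrderedDict()
    hits = total = 0
    for sx, sy in sample_points:
        for dy in (0, 1):
            for dx in (0, 1):
                key = ((sx + dx) // block, (sy + dy) // block)
                total += 1
                if key in cache:
                    hits += 1
                    cache.move_to_end(key)  # mark as most recently used
                else:
                    cache[key] = True
                    if len(cache) > capacity:
                        cache.popitem(last=False)  # evict least recently used
    return hits / total


# Adjacent samples share most of their texels; widely spaced ones do not.
nearby = [(x, y) for y in range(8) for x in range(8)]
scattered = [(x * 16, y * 16) for y in range(8) for x in range(8)]
```

Under these assumptions the adjacent sample pattern should hit far more often
than the scattered one, since neighbouring 2x2 footprints overlap and mostly
fall into blocks that are already cached.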

