GPUs use a specific "Z-order" memory layout to improve access locality and caching. This is simple "tiling". When you copy this kind of image between CPU and GPU memory, each pixel must be moved to the right place, and that is slow.
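To make the "Z" layout concrete, here is a minimal sketch in C of a Z-order (Morton) index, where the bits of x and y are interleaved so that pixels close together in 2D land close together in memory. This is only an illustration: real GPU layouts are vendor-specific and often mix tiles with other swizzles, and the names `spread_bits16` and `morton2d` are made up for this example.

```c
#include <stdint.h>

/* Spread the low 16 bits of v so that bit i moves to bit 2*i
 * (the odd bit positions are left at zero). */
static uint32_t spread_bits16(uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Z-order (Morton) index of pixel (x, y): x bits go to the even
 * positions, y bits to the odd positions. Assumes x, y < 65536. */
static uint32_t morton2d(uint32_t x, uint32_t y)
{
    return spread_bits16(x) | (spread_bits16(y) << 1);
}
```

With this, the four pixels of any aligned 2x2 block get four consecutive indices, and the four aligned 2x2 blocks of a 4x4 block are consecutive too, which is where the "Z" shape comes from.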
Tiling is also good for CPU code that works on images. But doing tiling "by hand" on the CPU is a pain in the ass: the asm code gets too ugly.
So Cedric asked for an MMU flag that says to use the same tiled memory layout for the CPU and the GPU: then you don't need to move pixels around during the copy.
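To show what the copy has to do today without such a flag, here is a rough sketch in C of a linear-to-tiled conversion. It assumes plain square 4x4 tiles stored one after another and an image width that is a multiple of the tile size; the tile size, `tiled_offset` and `linear_to_tiled` are hypothetical, not any real GPU's format.

```c
#include <stddef.h>
#include <stdint.h>

enum { TILE = 4 }; /* hypothetical tile edge, in pixels */

/* Offset (in pixels) of (x, y) in a tiled image.
 * Assumes width is a multiple of TILE. */
static size_t tiled_offset(size_t x, size_t y, size_t width)
{
    size_t tiles_per_row = width / TILE;
    size_t tile_index = (y / TILE) * tiles_per_row + (x / TILE);
    size_t in_tile = (y % TILE) * TILE + (x % TILE);
    return tile_index * (TILE * TILE) + in_tile;
}

/* Copy a row-major (linear) image into the tiled layout.
 * Every pixel gets a new address, so this is not a plain memcpy. */
static void linear_to_tiled(const uint32_t *src, uint32_t *dst,
                            size_t width, size_t height)
{
    for (size_t y = 0; y < height; ++y)
        for (size_t x = 0; x < width; ++x)
            dst[tiled_offset(x, y, width)] = src[y * width + x];
}
```

The divisions and modulos in `tiled_offset` are the per-pixel address arithmetic that CPU code would have to carry everywhere if it did the tiling by hand; if both sides used the same layout, the whole loop would collapse to a straight copy.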
It could be very interesting if memory is shared between the GPU and the CPU: no copy would be needed at all.
One more question from me: is it possible to always use a 2D tiled memory layout compatible with the GPU, to avoid the flag? It's only a different way to read the memory.