
ZigZag (was Re: [f-cpu] Status quo)



On 2015-04-01 16:19, Cedric BAIL wrote:
On Wed, Apr 1, 2015 at 11:25 AM,  <whygee@xxxxxxxxx> wrote:
On 2015-04-01 11:17, Nicolas Boulay wrote:
If you have img[x][y], you want something like
img[x/64][y/64][x%64][y%64]. It's not a good example because you need
the size, but that's the idea.
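
To make the remapping concrete, here is a minimal C sketch of the offset implied by img[x/64][y/64][x%64][y%64]. The helper name tiled_offset and the parameter tiles_y (the number of 64-wide tiles along y, i.e. "the size" mentioned above) are assumptions for illustration, not something from the thread.

```c
#include <stddef.h>

#define TILE 64  /* tile edge, the 64 used in the example above */

/* Hypothetical helper: linear offset of element (x, y) when the image is
 * stored as img[x/TILE][y/TILE][x%TILE][y%TILE].
 * tiles_y is the number of tiles along the y dimension (assumed known). */
static size_t tiled_offset(size_t x, size_t y, size_t tiles_y)
{
    size_t tile_x = x / TILE;   /* which tile, along x */
    size_t tile_y = y / TILE;   /* which tile, along y */
    size_t in_x   = x % TILE;   /* position inside the tile, along x */
    size_t in_y   = y % TILE;   /* position inside the tile, along y */

    /* Row-major walk over the four indices, outermost first. */
    return ((tile_x * tiles_y + tile_y) * TILE + in_x) * TILE + in_y;
}
```

When TILE and tiles_y are powers of two, the divisions and modulos reduce to selecting bit fields of x and y, so the whole mapping amounts to reordering address bits.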

Here, you're just mixing address lines. Nothing crazy.
Cédric's example seems to work at the byte level
with a more complicated pattern, that's where SIMD
is helpful to shuffle the bytes.

Yes, my example works at 2 levels: inside the block itself, which
follows the conversion given by Nicolas, and between blocks, where it does
a Z walk over the buffer to keep them in the cache longer. All of
that is done to improve cache locality.
I think I understand this part. It gets weird soon after.
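
For reference, a "Z walk" between blocks is commonly implemented as a Z-order (Morton) index, which interleaves the bits of the two block coordinates. The C sketch below assumes that interpretation; the exact pattern of the example under discussion is not given in this message, and the names spread_bits16 and z_order_index are hypothetical.

```c
#include <stdint.h>

/* Spread the low 16 bits of v so that bit i moves to bit 2*i,
 * leaving the odd bit positions clear. */
static uint32_t spread_bits16(uint32_t v)
{
    v &= 0x0000FFFFu;
    v = (v | (v << 8)) & 0x00FF00FFu;
    v = (v | (v << 4)) & 0x0F0F0F0Fu;
    v = (v | (v << 2)) & 0x33333333u;
    v = (v | (v << 1)) & 0x55555555u;
    return v;
}

/* Z-order index of block (bx, by): bits of bx land on even positions,
 * bits of by on odd positions. Assumes both coordinates fit in 16 bits. */
static uint32_t z_order_index(uint32_t bx, uint32_t by)
{
    return spread_bits16(bx) | (spread_bits16(by) << 1);
}
```

Blocks that are neighbours in 2D tend to get nearby indices under this scheme, which is the cache-locality property described above.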

The problem with the specific instruction is that it is unlikely to be
triggered by a compiler
what if you ask for it ?

and will require manual writing of the assembly code,
not necessarily, but if your compiler doesn't support the CPU's
features, why use it ?

but will also require toolkits to adopt this change in
their rendering pipeline to benefit from it. This is something very
tricky to do and most toolkits won't do it.
so in the end, you're telling me that users will never use
the feature, so it's useless to implement it.

Conversely, changing the
way user space sees the memory is much more likely to be done. It will
require a change in the allocator used for images (which is already
something clearly separated) and a change in the kernel to enable that
new mapping. Both of those changes are much simpler in nature and less
tricky to do, so more likely to be done.

Again, you're being too vague, too fast.

The inter-block pattern is managed by the allocator, ok.
Then how do you define that a pointer must have its LSB mangled ?

Given that there are several ways to mix the LSB, it has no place
inside the CPU or directly on its address bus. And since it's
a problem that is specific to GPU, why isn't it possible
to manage it on the GPU side's bus ?

I am not dismissing the LUT instruction here, it is useful in itself
for other tasks, but just a reminder that if this requires massive
changes in the existing software, it won't be used.

Well, the F-CPU won't be used if it doesn't work, which is
a higher priority :-D We won't even have a GPU for a while
so there is no rush to shuffle address bits for specific purposes.

yg
*************************************************************
To unsubscribe, send an e-mail to majordomo@xxxxxxxx with
unsubscribe f-cpu       in the body. http://f-cpu.seul.org/