Re: CRC (was Re: [f-cpu] F-CPU architecture...)
hi,
Michael Riepe wrote:
Hi F-gang,
Yann Guidon wrote:
2) CRC is not considered as an instruction for F-CPU
(there are too many versions, so that would be very troublesome for
almost no benefit)
Well, there is a general formula.
But there are many variations, beyond the basic issue of the poly:
direct/reversed? bit-reversed poly? init value? final XOR?
But that would require that the crc is computed bitwise, which would
take too long.
well, that is in the case where the poly is provided by the user.
if the poly and all the parameters are hardwired, then it is much easier.
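To make the list of variations concrete, here is a minimal sketch of a fully parameterised bitwise CRC. The parameter names (`width`, `poly`, `init`, `refin`, `refout`, `xorout`) are not from this thread; they follow the naming commonly used in CRC catalogues and are an assumption. The inner loop handles one bit per iteration, which is the "computed bitwise" cost mentioned above.

```python
def reflect(value, width):
    """Reverse the order of the lowest `width` bits of `value`."""
    out = 0
    for _ in range(width):
        out = (out << 1) | (value & 1)
        value >>= 1
    return out


def crc_bitwise(data, width, poly, init, refin, refout, xorout):
    """Generic MSB-first bitwise CRC; assumes width >= 8."""
    mask = (1 << width) - 1
    top = 1 << (width - 1)
    reg = init
    for byte in data:
        if refin:
            byte = reflect(byte, 8)      # "reversed" input bit order
        reg ^= byte << (width - 8)       # feed the byte into the top of the register
        for _ in range(8):               # one shift per message bit
            if reg & top:
                reg = ((reg << 1) ^ poly) & mask
            else:
                reg = (reg << 1) & mask
    if refout:
        reg = reflect(reg, width)        # "reversed" result
    return reg ^ xorout                  # final XOR


# The Ethernet-style CRC-32 is one point in this parameter space:
def crc32_ethernet(data):
    return crc_bitwise(data, 32, 0x04C11DB7, 0xFFFFFFFF, True, True, 0xFFFFFFFF)
```

Hardwiring all six parameters is what turns the general formula into a fixed XOR network.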
A classical algo for CRC would be quite slow on FC0
(maybe 10 cycles/byte) because of the "slowness" of the cache memory system,
which is designed for throughput rather than access time (which is quite
incompressible anyway, given the core's complexity).
It could handle at most two or three bits per F-CPU pipeline stage -
and we need to support at least 32 bits. A carefully designed software
implementation may actually be faster. Does anyone care enough to try
it out?
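For reference, the usual software approach is table-driven and handles one byte per step. The sketch below assumes the reflected Ethernet-style CRC-32 (constant 0xEDB88320); it is not necessarily the "carefully designed" version meant above, and the function names are made up for illustration. Note that every byte costs one table load, which is exactly the kind of memory access said to be slow on FC0.

```python
POLY_REFLECTED = 0xEDB88320  # bit-reversed form of 0x04C11DB7


def make_table():
    """Precompute the CRC of every possible byte value (256 entries)."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ POLY_REFLECTED if c & 1 else c >> 1
        table.append(c)
    return table


TABLE = make_table()


def crc32_table(data, crc=0):
    """Byte-at-a-time CRC-32; `crc` allows continuing a previous result."""
    crc ^= 0xFFFFFFFF
    for byte in data:
        # one table lookup + shift + XOR per input byte
        crc = TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF
```

The result for a buffer can be chained across chunks by passing the previous return value as `crc`.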
as we will see further down, a HW version is interesting, but not in the core itself
because of the inherent slowness. And it's difficult to pipeline without
making it impractical.
However, putting a CRC32 in the "DMA/blitter engine" would be way cool
(that would require a 32-bit field in the block descriptor, with additional
flags like "set/verify", "irq on error", "valid/invalid CRC" etc.)
You mean, one could DMA some data to /dev/null
if data is to be read only, why "write" them and consume useless cycles ?
adding a simple "verify only" flag is enough and 2x faster (no write cycle).
and let the engine calculate or verify the CRC? Then we don't need a
CRC instruction. :-)
certainly.
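A rough software model of such a block descriptor might look like the sketch below. Everything here is hypothetical: no descriptor layout has been defined, and the field and flag names (`BlockDescriptor`, `CrcFlags`, `VERIFY_ONLY`, ...) are invented for illustration. It only shows how "set", "verify" and "verify only" (no write cycle) could interact.

```python
import zlib
from dataclasses import dataclass
from enum import IntFlag


class CrcFlags(IntFlag):          # hypothetical flag names
    SET = 1                       # engine stores the computed CRC in the descriptor
    VERIFY = 2                    # engine compares against the descriptor's CRC
    VERIFY_ONLY = 4               # like VERIFY, but the data is read and never written
    IRQ_ON_ERROR = 8              # raise an interrupt on CRC mismatch


@dataclass
class BlockDescriptor:            # hypothetical layout
    src: int
    dst: int
    length: int
    crc32: int = 0                # the 32-bit CRC field
    flags: CrcFlags = CrcFlags(0)
    crc_valid: bool = False       # "valid/invalid CRC" status bit
    irq_raised: bool = False


def run_descriptor(desc, memory):
    """Model one DMA block transfer over a bytearray `memory`."""
    block = bytes(memory[desc.src:desc.src + desc.length])
    crc = zlib.crc32(block)       # computed while the data streams through
    if desc.flags & CrcFlags.SET:
        desc.crc32 = crc
        desc.crc_valid = True
    elif desc.flags & (CrcFlags.VERIFY | CrcFlags.VERIFY_ONLY):
        desc.crc_valid = crc == desc.crc32
        if not desc.crc_valid and desc.flags & CrcFlags.IRQ_ON_ERROR:
            desc.irq_raised = True
    if not desc.flags & CrcFlags.VERIFY_ONLY:
        memory[desc.dst:desc.dst + desc.length] = block   # the write half
    return desc
```

With `VERIFY_ONLY` the model performs the read and the check but skips the write, which is the 2x saving argued for above.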
the most usual CRC32 implementation is the Ethernet one, it is used
almost everywhere.
Now i wonder how it could handle more than 1 or 2 bytes per cycle
(the memory paths are certainly faster than that).
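One standard answer, not discussed in this thread, is that the CRC update is linear over GF(2): the register after N input bits is a fixed XOR combination of the old register bits and the N data bits. The sketch below derives those XOR "columns" for N = 64 by running the serial update on single-bit inputs, then checks the result against `zlib.crc32`. The names are made up, and a software loop says nothing about whether the resulting XOR trees would meet FC0 timing.

```python
import zlib

POLY = 0xEDB88320      # reflected Ethernet CRC-32 polynomial
WORD_BITS = 64


def serial_step(state, word, nbits=WORD_BITS):
    """Reference: shift `nbits` of `word` (LSB first) through the register."""
    for i in range(nbits):
        feedback = (state ^ (word >> i)) & 1
        state >>= 1
        if feedback:
            state ^= POLY
    return state


# Contribution of each single input bit to the next state (linearity).
STATE_COLS = [serial_step(1 << i, 0) for i in range(32)]
DATA_COLS = [serial_step(0, 1 << j) for j in range(WORD_BITS)]


def parallel_step(state, word):
    """Next state as one big XOR; in hardware, 32 independent XOR trees."""
    nxt = 0
    for i in range(32):
        if (state >> i) & 1:
            nxt ^= STATE_COLS[i]
    for j in range(WORD_BITS):
        if (word >> j) & 1:
            nxt ^= DATA_COLS[j]
    return nxt


def crc32_words(data):
    """CRC-32 over a buffer whose length is a multiple of 8 bytes."""
    if len(data) % 8:
        raise ValueError("length must be a multiple of 8 bytes")
    state = 0xFFFFFFFF
    for off in range(0, len(data), 8):
        word = int.from_bytes(data[off:off + 8], "little")
        state = parallel_step(state, word)
    return state ^ 0xFFFFFFFF


payload = bytes(range(64))
assert crc32_words(payload) == zlib.crc32(payload)
```

The word-to-word dependency remains (each step needs the previous state), so the feedback path still has to close within one cycle.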
I/O bus transfers should always include a checksum for verification
anyway. But on the other hand, that need not be a programmable CRC,
which makes its implementation a lot easier.
exactly.
so if we can program DMA blocks which are CRC32-protected with automatic retry,
even more complex NUMA architectures (with a complex Xbar network) become possible
(imagine something like a Myrinet-enabled CPU core, or even a T3E-like architecture).
but we have yet to find a way to "protect" 64-bit words faster than 8 bits at a time.
Normally, SEC/DED is used for each word, which can be computed in parallel and easily
pipelined (there is no word-to-word dependency). But ECC is quite complex :-(
i really wonder how it is possible to compute or check the syndrome in
one cycle.
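On the syndrome question: with a Hamming-style code, each syndrome bit is just the parity of a fixed subset of the codeword bits, so all of them can be computed side by side as XOR trees only a few gate levels deep. The sketch below assumes a textbook extended Hamming (72,64) layout (check bits at power-of-two positions, overall parity in bit 0); real memory controllers often use other SEC/DED codes such as Hsiao codes, and the function names are made up.

```python
# Codeword positions 1..71; the 7 power-of-two positions hold check bits,
# the remaining 64 hold data. Position 0 holds the overall parity bit.
DATA_POS = [p for p in range(1, 72) if p & (p - 1)]


def syndrome(word):
    """XOR of the positions of all set bits; bit b of the result is the
    parity of every codeword bit whose position has bit b set."""
    syn = 0
    for pos in range(1, 72):
        if (word >> pos) & 1:
            syn ^= pos
    return syn


def encode(data):
    """Build a 72-bit SEC/DED codeword from a 64-bit data word."""
    word = 0
    for k, pos in enumerate(DATA_POS):
        if (data >> k) & 1:
            word |= 1 << pos
    syn = syndrome(word)
    for b in range(7):                 # set check bits so the syndrome is 0
        if (syn >> b) & 1:
            word |= 1 << (1 << b)
    if bin(word).count("1") & 1:       # overall parity makes the weight even
        word |= 1
    return word


def extract(word):
    """Pull the 64 data bits back out of a codeword."""
    data = 0
    for k, pos in enumerate(DATA_POS):
        if (word >> pos) & 1:
            data |= 1 << k
    return data


def decode(word):
    """Return (status, data): 'ok', 'corrected', or 'double' (data is None)."""
    syn = syndrome(word)
    odd = bin(word).count("1") & 1
    if syn == 0 and not odd:
        return "ok", extract(word)
    if odd:                            # single error at position `syn`
        return "corrected", extract(word ^ (1 << syn))
    return "double", None              # even weight, nonzero syndrome
```

Since no word depends on the previous one, this check can be pipelined freely, unlike the CRC.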
YG