
Re: CRC (was Re: [f-cpu] F-CPU architecture...)


Michael Riepe wrote:

Hi F-gang,

Yann Guidon wrote:

2) CRC is not being considered as an instruction for F-CPU
(there are too many versions, so it would be very troublesome for almost no benefit)

Well, there is a general formula.

But there are many variations beyond the basic choice of polynomial: direct or reversed bit order? Bit-reversed polynomial? Initial value? Final XOR?

But that would require that the crc is computed bitwise, which would take too long.

Well, only in the case where the polynomial is provided by the user. If the polynomial and all the parameters are hardwired, then it is much easier.
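To make the list of variations concrete, here is a minimal sketch of the bitwise "general formula" with the polynomial, initial value, bit order and final XOR as run-time parameters. The struct and function names are hypothetical (nothing here is an F-CPU definition), and a single flag is assumed to control both input and output reflection, which covers the common cases but not every CRC variant.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter set; names are illustrative only. */
struct crc32_params {
    uint32_t poly;     /* polynomial in MSB-first form, x^32 term implied */
    uint32_t init;     /* initial register value */
    uint32_t xorout;   /* value XORed into the final result */
    int      reflect;  /* nonzero: feed bits LSB first and reflect the result */
};

/* Reverse the bit order of a 32-bit value. */
static uint32_t reflect32(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* One shift/XOR step per message bit: 8 dependent steps per byte. */
static uint32_t crc32_bitwise(const struct crc32_params *p,
                              const uint8_t *data, size_t len)
{
    uint32_t crc = p->init;
    for (size_t i = 0; i < len; i++) {
        uint32_t byte = data[i];
        if (p->reflect)
            byte = reflect32(byte) >> 24;   /* reverse the 8 input bits */
        crc ^= byte << 24;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 0x80000000u) ? (crc << 1) ^ p->poly : (crc << 1);
    }
    if (p->reflect)
        crc = reflect32(crc);
    return crc ^ p->xorout;
}
```

With poly 0x04C11DB7, init and xorout 0xFFFFFFFF and reflect set, this gives the Ethernet-style CRC32; clearing reflect gives the MSB-first variant with the same polynomial. The inner loop is the serial dependency that makes a user-programmable version slow.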

A classical CRC algorithm would be quite slow on FC0
(maybe 10 cycles/byte) because of the "slowness" of the cache memory system,
which is designed for throughput rather than access time (which is pretty much
incompressible anyway, given the core's complexity).

It could handle at most two or three bits per F-CPU pipeline stage, and we need to support at least 32 bits. A carefully designed software implementation may actually be faster. Does anyone care enough to try it out?
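For anyone who wants to try it out, the usual software starting point is the byte-at-a-time table-driven form sketched below. It assumes the Ethernet-style CRC32 with hardwired parameters (reflected polynomial 0xEDB88320, initial value and final XOR of all ones); the function names are made up for this example. Each byte costs one table load, which is exactly the access that would be sensitive to cache latency.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t crc32_table[256];

/* Build the 256-entry table once: entry n is the CRC register after
 * shifting byte n through the reflected polynomial. */
static void crc32_init_table(void)
{
    for (uint32_t n = 0; n < 256; n++) {
        uint32_t c = n;
        for (int k = 0; k < 8; k++)
            c = (c & 1u) ? (c >> 1) ^ 0xEDB88320u : (c >> 1);
        crc32_table[n] = c;
    }
}

/* Update a running CRC with len bytes; start with crc = 0.
 * One table lookup, one shift and two XORs per byte. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *data, size_t len)
{
    crc = ~crc;
    for (size_t i = 0; i < len; i++)
        crc = crc32_table[(crc ^ data[i]) & 0xFFu] ^ (crc >> 8);
    return ~crc;
}
```

Calling crc32_init_table() once and then crc32_update(0, buf, len) yields the CRC of the buffer; the result can be passed back in as crc to continue over a further block.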

As we will see further down, a HW version is interesting, but not in the core itself
because of the inherent slowness. And it's difficult to pipeline without making it impractical.

However, putting a CRC32 in the "DMA/blitter engine" would be way cool
(that would require a 32-bit field in the block descriptor with additional flags
like "set/verify"; "irq on error", "valid/invalid CRC" etc.)

You mean, one could DMA some data to /dev/null

If the data is only to be read, why "write" it and consume useless cycles? Adding a simple "verify only" flag is enough and 2x faster (no write cycle).

and let the engine calculate or verify the CRC? Then we don't need a CRC instruction. :-)
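As a rough software model of the idea, the sketch below shows what a block descriptor with a 32-bit CRC field and a few flags might look like, and how an engine could act on it. Everything here is hypothetical: the field names, the flag encoding, and the choice to express "verify only" as a null destination are assumptions made for illustration, not a proposed F-CPU layout.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits; the thread only names the ideas. */
enum {
    DMA_CRC_SET      = 1u << 0,  /* engine stores the computed CRC */
    DMA_CRC_VERIFY   = 1u << 1,  /* engine compares against the stored CRC */
    DMA_IRQ_ON_ERROR = 1u << 2,  /* raise an interrupt on mismatch */
    DMA_CRC_VALID    = 1u << 3   /* status: CRC field is known to match */
};

struct dma_block_desc {
    const uint8_t *src;
    uint8_t       *dst;    /* NULL models "verify only": no write cycle */
    size_t         len;
    uint32_t       crc32;  /* the 32-bit field in the descriptor */
    uint32_t       flags;
};

/* Hardwired Ethernet-style CRC32, bitwise for brevity. */
static uint32_t dma_crc32(const uint8_t *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= p[i];
        for (int k = 0; k < 8; k++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : (crc >> 1);
    }
    return ~crc;
}

/* Model of one transfer; returns nonzero if an IRQ would be raised. */
static int dma_run(struct dma_block_desc *d)
{
    uint32_t crc = dma_crc32(d->src, d->len);
    if (d->dst)
        for (size_t i = 0; i < d->len; i++)
            d->dst[i] = d->src[i];
    if (d->flags & DMA_CRC_SET) {
        d->crc32 = crc;
        d->flags |= DMA_CRC_VALID;
    } else if (d->flags & DMA_CRC_VERIFY) {
        if (crc == d->crc32) {
            d->flags |= DMA_CRC_VALID;
        } else {
            d->flags &= ~(uint32_t)DMA_CRC_VALID;
            return (d->flags & DMA_IRQ_ON_ERROR) != 0;
        }
    }
    return 0;
}
```

A sender would run the block with DMA_CRC_SET, and a receiver with DMA_CRC_VERIFY; an automatic retry would simply re-queue the descriptor when the valid bit comes back clear.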


The most common CRC32 implementation is the Ethernet one; it is used almost everywhere.
Now I wonder how it could handle more than 1 or 2 bytes per cycle (the memory paths
are certainly faster than that).

I/O bus transfers should always include a checksum for verification anyway. But on the other hand, that need not be a programmable CRC, which makes its implementation a lot easier.

So if we can program DMA blocks which are CRC32-protected with automatic retry,
even more complex NUMA architectures (with a complex Xbar network) become possible
(imagine something like a Myrinet-enabled CPU core, or even a T3E-like architecture).

But we have yet to find a way to "protect" 64-bit words faster than 8 bits at a time.
Normally, SEC/DED is used for each word, which can be computed in parallel and easily
pipelined (there is no word-to-word dependency). But ECC is quite complex :-(
I really wonder how it is possible to compute or check the syndrome in one cycle.
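One way to see why the syndrome can be fast: in a Hamming-style SEC/DED code every check bit is just the parity of a fixed subset of the data bits, so each one is an independent XOR tree and all of them can be evaluated side by side. The sketch below models that in software, assuming a textbook extended Hamming (72,64) layout (7 Hamming check bits plus one overall parity bit); the function names are invented here and this is not necessarily the code any real memory controller uses.

```c
#include <stdint.h>

static uint64_t ecc_mask[7];  /* ecc_mask[i]: data bits covered by check bit i */
static uint8_t  ecc_pos[64];  /* Hamming position (1..71) of each data bit */

/* Data bits occupy the non-power-of-two positions 3,5,6,7,9,...,71. */
static void ecc_init(void)
{
    int j = 0;
    for (int pos = 1; pos <= 71; pos++) {
        if ((pos & (pos - 1)) == 0)
            continue;                 /* powers of two hold check bits */
        ecc_pos[j] = (uint8_t)pos;
        for (int i = 0; i < 7; i++)
            if (pos & (1 << i))
                ecc_mask[i] |= 1ull << j;
        j++;
    }
}

/* Parity of 64 bits: the software image of a 6-level XOR tree. */
static unsigned parity64(uint64_t v)
{
    v ^= v >> 32; v ^= v >> 16; v ^= v >> 8;
    v ^= v >> 4;  v ^= v >> 2;  v ^= v >> 1;
    return (unsigned)(v & 1u);
}

/* Returns 8 check bits: bits 0..6 Hamming, bit 7 overall parity. */
static uint8_t ecc_encode(uint64_t data)
{
    unsigned c = 0;
    for (int i = 0; i < 7; i++)       /* independent of each other */
        c |= parity64(data & ecc_mask[i]) << i;
    c |= (parity64(data) ^ parity64(c)) << 7;
    return (uint8_t)c;
}

/* Returns 0 = clean, 1 = single error corrected, 2 = uncorrectable. */
static int ecc_decode(uint64_t *data, uint8_t check)
{
    unsigned syn = (ecc_encode(*data) ^ check) & 0x7Fu;
    unsigned overall = parity64(*data) ^ parity64(check);
    if (syn == 0 && overall == 0)
        return 0;
    if (overall == 0)
        return 2;                     /* even number of flips: double error */
    if (syn == 0 || (syn & (syn - 1)) == 0)
        return 1;                     /* the flipped bit was a check bit */
    for (int j = 0; j < 64; j++)
        if (ecc_pos[j] == syn) {
            *data ^= 1ull << j;       /* syndrome names the bad position */
            return 1;
        }
    return 2;
}
```

The loops in ecc_encode are sequential only because this is C; in hardware the eight parities are separate XOR trees a handful of gate levels deep, with no dependency on neighbouring words, which is what makes per-word ECC easy to pipeline compared with a CRC running over a whole block.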

