[Author Prev][Author Next][Thread Prev][Thread Next][Author Index][Thread Index]

[f-cpu] Verification, Testing, And Random Numbers



[I hope the mail doesn't bounce again]

Hi F-gang!

You seem to be talking about a lot of different things at the same time: a) verification of a design and/or its actual implementation (in VHDL), b) testing the generated circuit (in silicon), c) the famous F-CPU POST, and d) the use of random numbers.

For a), you have to verify that a unit does what it is supposed to do: adders shall add, multipliers shall multiply and so on. There are several methods to do that. An exhaustive test is most reliable, but takes too long: Each unary operation would take 2**64 individual tests, binary operations take 2**128, and ternary (like `mac') take 2**192 of them. But we definitely can't afford to wait years for the result, do we?

In my testbenches, I use "systematic" tests. They test only specific cases, and make use of "don't cares". E.g. if both operands of an adder have their MSBits set, you *know* that a carry-out will be generated, no matter what the other input bits are. Therefore, you can set them to 'X' at the input (and ignore the output bits). Similarly, the LSBit of the result will always be the XOR of the operands' LSBits, no matter what the other bits are. And so on...

Systematic tests are less reliable than exhaustive tests, but they're still quite good at detecting errors in a design. And they're a lot faster (but still take too long in some cases).

When using random numbers for verification, you have a general problem: If you feed the unit with random operands, you get a result that is likewise random. How can you verify that it is correct? You have to compare (that is, bitwise XOR or something like that) it with the output of another unit that is known to work correctly (so-called "golden code"). If the results differ, your code is broken.

But in general, there is no "golden" code. There is only another implementation of the same problem that may or may not be correct. You have no way to verify that, unless you use one of the methods outlined above. In that case, you can as well verify your circuit directly.

The second problem with random number tests is that its "error coverage" is too low. You only check a small percentage of the 2**64 (or more) possible input variants, but errors may hide in any of them, so you probably won't catch them all. This leads to Pentium FDIV bugs and the like.

Functional symbolic verification would be the real thing. It transforms both the operands and the result into a special representation (using BDDs oder one of their derivatives), performs the operation in either representation, and compares the results. You need only a single run per operation, and the result is as reliable as that of an exhaustive test, only much faster. But there are no tools for it (yet).

Let's come to b), testing. There's no time for doing an exhaustive test: even at 1000 GHz it would take 2**128 / 10**12 = ~3*10**26 seconds to test all variants of a single binary operation like `add'. A year has only ~3*10**7 seconds, so you would have to wait 10**19 years. Are any immortals listening here?

The usual way is to use one of the common "fault models" and build as many test vectors as you need to test for all detectable faults of a unit. But this requires that you know its actual implementation, which is not available before synthesis. Actually, this is another case of a "systematic" test which is fast but not fully reliable.

Random numbers may work for boundary scan testing as well, but still has the same problems as with verification: there is no "golden" code, and the error coverage is too low.

c) The famous F-CPU POST can't use an exhaustive test, nor can it use test vectors (there is not enough room on the chip - and, after all, errors may occur anywhere, even in the ROM that would hold the vectors). It doesn't make sense to use random numbers, because there is no way to verify the result of an operation - putting "golden" code on a chip is simply impossible.

The only way to perform a reasonable POST is to start from a well-known state, perform a well-known sequence of operations - that is, execute a "built-in" test program -, and compare the result with a well-known value. How that comparision is done is more or less a matter of taste; the LFSR approach outlined by Yann is one possible solution (reminds me of a CRC generator, by the way).

Finally, we come to d), random numbers. If you've listened carefully to what I said above, you will probably come to the conclusion that they're not useful at all for verification, testing, or POST.

Q.E.D.

Michael.

*************************************************************
To unsubscribe, send an e-mail to majordomo@seul.org with
unsubscribe f-cpu in the body. http://f-cpu.seul.org/