
Re: test (was: Re: Rep:Re: Re: [f-cpu] No latches, please !)



hi !

The discussion about the register set has forked into one about
testing the chip. If it is to continue, we will have to
agree on common definitions that will ease the discussion.

Testing is performed at chip power-on (on-site) when the chip is
not an FPGA (an FPGA has its own technology-dependent verification
and must be shipped without any defect). The same BIST also
initialises the chip to a known state
(empty TLB entries, empty cache, register set zeroed...).
Because a reset can be forced in "real-time" constrained situations,
it should last a bounded, specified time, usually no more than a fraction
of a second : from a few thousand cycles to a few million cycles,
which at 50MHz takes less than a second. A Pentium CPU
takes 5 million cycles IIRC, mainly because the microcode ROM
and all the complex features must be tested. I hope that F-CPU will
not reach this level.
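
To put numbers on the reset budget above, here is a minimal sketch that
converts a cycle count into wall-clock time. The function name and the
sample cycle counts are mine, chosen only to match the orders of magnitude
mentioned in the text.

```python
def reset_seconds(cycles: int, clock_hz: float) -> float:
    """Wall-clock duration of a reset/BIST sequence lasting `cycles` cycles."""
    return cycles / clock_hz


# "a few thousand" and "a few million" cycles at 50 MHz (illustrative values)
for cycles in (5_000, 5_000_000):
    print(f"{cycles} cycles at 50 MHz: {reset_seconds(cycles, 50e6)} s")
```

Even the 5 million cycle figure stays at a tenth of a second at 50MHz.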

Testing is also required in another situation : during fabrication,
just before the dies are packaged. As Kim mentions, it's a cost-sensitive
place because each second the test equipment is used must be paid for
(I don't remember the prices however). Given a wafer with 100 dies,
if the tester takes 1 second to move from one die to the next,
that is already 1 minute and 40 seconds to pay for. Correct me if I'm wrong.
The tester must verify that the pads are working at the specified
frequency, power is correctly applied, the clock tree works...
up to the point where it is certain that the chip is valid.
There are different means and fortunately some are parallelisable.
This becomes critical when the CPU is part of a larger chip with
peripherals and other devices that contain their own BIST.
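
The wafer arithmetic above can be checked with a small sketch. The
per-die test time parameter is an assumption added for illustration
(the text only counts the stepping time), and no price is modelled
since none is given.

```python
def wafer_test_seconds(dies: int, step_s: float, test_s_per_die: float = 0.0) -> float:
    """Tester time for one wafer: stepping time plus (optional) test time per die."""
    return dies * (step_s + test_s_per_die)


def format_min_sec(seconds: float) -> str:
    """Render a duration as 'M min S s'."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes} min {secs} s"


# 100 dies, 1 second of stepping per die, no test time counted yet
print(format_min_sec(wafer_test_seconds(100, 1.0)))
```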


Scan chains are a simple and easy means to test a circuit, but
 - the FF cells are larger
 - the testing time is somewhere near O(n^2) for n FF
 (the testing time for a circuit with N I/O is roughly proportional to N
  and a scan chain requires N cycles to be reloaded)
So they are only good for specific parts of the circuit where no other
means to bring in data and verify it exists. Whenever there
is a simple and fast way to test a part of the circuit, it must be used,
particularly because generating the "vectors" of the scan chain
is complicated and these vectors must be stored somewhere (more surface).
If the scan vectors can't be stored on-chip, the external tester
must provide them and the test can't run at full circuit speed
(there is a risk that high-frequency specific errors [such as
those due to crosstalk] can't be detected).
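
The O(n^2) estimate above can be illustrated with a rough cycle-count
model. This is only a sketch under stated assumptions : the number of
patterns is taken as proportional to the number of FF (the
`patterns_per_ff` knob is made up), each pattern costs n shift cycles
plus one capture cycle, and unloading one response overlaps with loading
the next pattern.

```python
def scan_test_cycles(n_ff: int, patterns_per_ff: float = 1.0) -> int:
    """Approximate cycles to run a scan test on a single chain of n_ff flip-flops."""
    patterns = max(1, round(patterns_per_ff * n_ff))
    # n_ff shift cycles + 1 capture cycle per pattern,
    # plus a final n_ff cycles to shift out the last response
    return patterns * (n_ff + 1) + n_ff


for n in (100, 1000):
    print(n, "FF ->", scan_test_cycles(n), "cycles")
```

With these assumptions, ten times more FF costs roughly a hundred times
more cycles, which is why the chains should be kept short.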

Using an LFSR is another possibility for faster testing. It still requires
modified FF for initialising the state and switching between test and functional
mode, and the fault coverage is high with no vector ROM. However it's not
easy to tune and it depends on the technology used. There are tools for this
but usually a good hand-designed LFSR can give very good results.
Testing can however take about as long as with a simple scan chain,
because the LFSR must explore enough states (the state space grows as 2^N).
The problem is to know when enough states have been explored, i.e. how many
cycles are necessary until 100% coverage is reached. Once again, correct me
if I'm wrong.
The other advantage is that because the test "vectors" are internally
generated, there is no need to provide them through an external test interface
and the frequency can be the real operating frequency, thus catching errors
that could not be detected at lower frequencies.
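
For readers who have not met LFSRs, here is a minimal software sketch of
a Fibonacci LFSR and of how its period grows with the register width.
The tap sets are well-known maximal-length choices (x^4+x^3+1 and
x^16+x^14+x^13+x^11+1), picked for illustration; nothing here is a
proposal for the actual F-CPU polynomial.

```python
def lfsr_step(state: int, n_bits: int, taps: tuple) -> int:
    """Advance a right-shifting Fibonacci LFSR by one cycle."""
    bit = 0
    for t in taps:
        # tap t of the polynomial corresponds to bit (n_bits - t) of the state
        bit ^= (state >> (n_bits - t)) & 1
    return (state >> 1) | (bit << (n_bits - 1))


def lfsr_period(n_bits: int, taps: tuple, seed: int = 1) -> int:
    """Count the cycles needed to return to the (non-zero) seed."""
    state, steps = lfsr_step(seed, n_bits, taps), 1
    while state != seed:
        state = lfsr_step(state, n_bits, taps)
        steps += 1
    return steps


TAPS = {4: (4, 3), 16: (16, 14, 13, 11)}

for n, taps in TAPS.items():
    print(f"{n}-bit LFSR period: {lfsr_period(n, taps)} (2^{n} - 1 = {2**n - 1})")
```

A maximal-length N-bit LFSR visits all 2^N - 1 non-zero states, so an
exhaustive run quickly becomes too long as N grows; the open question in
the text is how early one can stop.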


The previous means are used for logic stages. For example, a 16-bit LFSR
can be used to test the opcode part of the instruction decoder and a signature
is collected with another (much larger) LFSR at the other end. When the
signature does not match, an error is flagged to the tester, or to the
rest of the circuit, which then does not boot.
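
The pattern-generator plus signature scheme just described can be
sketched in software as follows. Everything specific is an assumption
made for the example : `decoder` is a made-up combinational function
standing in for the real instruction decoder, the 32-bit MISR borrows
the CRC-32 polynomial as feedback, and the injected fault is a single
output bit stuck at 0.

```python
MASK32 = 0xFFFFFFFF
POLY32 = 0x04C11DB7  # CRC-32 polynomial, used here as MISR feedback (assumption)


def lfsr16_step(s: int) -> int:
    """16-bit Fibonacci LFSR, taps 16/14/13/11 (maximal length)."""
    bit = (s ^ (s >> 2) ^ (s >> 3) ^ (s >> 5)) & 1
    return (s >> 1) | (bit << 15)


def misr32_step(sig: int, response: int) -> int:
    """Shift the signature register once and fold in one 32-bit response."""
    msb = sig >> 31
    sig = (sig << 1) & MASK32
    if msb:
        sig ^= POLY32
    return sig ^ (response & MASK32)


def decoder(opcode: int) -> int:
    """Hypothetical stand-in for the combinational logic under test."""
    return ((opcode * 0x9E37) ^ (opcode << 16) ^ (opcode >> 3)) & MASK32


def faulty_decoder(opcode: int) -> int:
    """Same logic with output bit 7 stuck at 0."""
    return decoder(opcode) & ~(1 << 7) & MASK32


def run_bist(logic, seed: int = 0xACE1, cycles: int = 65535) -> int:
    """Drive `logic` with LFSR patterns and return the compacted signature."""
    pattern, sig = seed, 0
    for _ in range(cycles):
        sig = misr32_step(sig, logic(pattern))
        pattern = lfsr16_step(pattern)
    return sig


golden = run_bist(decoder)
print("fault detected:", run_bist(faulty_decoder) != golden)
```

Only the final signature has to be compared with the expected value, so
no response vectors need to be stored. Aliasing (a faulty circuit giving
the good signature) is possible in principle, but its probability shrinks
as the signature register gets wider, which is one reason to make it
"much larger".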

For non-boolean stages, such as storage cells, another method must be used
because we can't implement a scan chain inside the cells of the register
set or the caches ! The circuit surface would bloat and it would thus run
slowly.

* For the execution pipeline, the BIST unit "hijacks" the control of the
scheduler and the control signals are autogenerated and sent to the appropriate
units. It works with full width vectors (64 bits at once) and we can "chain"
several units together.

* For the memory arrays, another method is useful, which I have already used.
The vectors are very easy to generate and their number scales well with the
number of cells (O(log2 N)), but we are limited by the bus width so a multicycle
unit is necessary.
For the register set, for example, we need
2*(63*Log2(63*64)+1) = 2*(63*12+1) = 1514 R+W cycles per write and read port
(with Log2(4032) rounded up to 12). This can be reduced by not sending
identical/duplicate vectors : since the access bus is not 4032 bits wide,
some of the transfers would carry duplicate data.
Note that this method catches ALL coupling between any two wires, on top of the
usual stuck-at-0 or stuck-at-1 faults. The same method can be applied to the caches.
Note also that the result of a read, instead of being sent back
to the BIST unit, can be chained with other units before the usual verification
in a MISR.
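
Here is a minimal sketch of the general idea behind such a logarithmic
test, assuming it works like the classic "counting sequence" used for
wiring tests : cell number i receives, in vector k, bit k of its own
index, and every vector is followed by its complement. This is my
illustration of the principle, not the exact sequence the BIST unit
would use, and the wired-AND short and stuck-at models are assumptions.

```python
def counting_vectors(n_cells: int) -> list:
    """ceil(log2(n_cells)) vectors, each followed by its complement."""
    n_bits = max(1, (n_cells - 1).bit_length())
    vectors = []
    for k in range(n_bits):
        v = [(i >> k) & 1 for i in range(n_cells)]
        vectors.append(v)
        vectors.append([b ^ 1 for b in v])
    return vectors


def read_back(vector, short=None, stuck=None):
    """Write a vector into the cells and read it back through a fault model."""
    cells = list(vector)
    if short:  # two cells coupled together, modelled as a wired-AND
        a, b = short
        cells[a] = cells[b] = cells[a] & cells[b]
    if stuck:  # one cell stuck at a fixed value
        idx, val = stuck
        cells[idx] = val
    return cells


def detects(vectors, **fault) -> bool:
    """True if at least one vector reads back differently from what was written."""
    return any(read_back(v, **fault) != v for v in vectors)


vectors = counting_vectors(64)
print(len(vectors), "vectors for 64 cells")
print("short 5<->37 detected:", detects(vectors, short=(5, 37)))
print("cell 12 stuck at 1 detected:", detects(vectors, stuck=(12, 1)))
```

Because any two distinct indices differ in at least one bit, any pair of
coupled cells is driven to opposite values by at least one vector, and
the complements guarantee that every cell sees both a 0 and a 1. For the
63*64 = 4032 bits of the register set this gives 12 vectors plus their
complements.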

The BIST unit generates the vectors and the control signals that allow these
tests to occur in parallel when necessary, but always at full operating speed.


Kim Enkovaara wrote:
> > Concerning the test, remember that there is a BIST unit that
> > takes control of the whole datapath and takes care to test
> > all the execution units and the memory arrays. If the "latch"
> > (those you like and/or those you don't) is in the datapath,
> > it -will- be tested. Those that can't be tested without complex
> > stuff will use a classical scanpath but the scan chain will be
> > kept as short as possible.
> > By the way, one of the optional units (popcount) will certainly
> > be used for signature compaction.
> 
> How many patterns have you thought to use? Datapath style BIST takes very
> long times, probably many days. Each second in ASIC tester costs real
> money. You need very quick results in ASIC tester to notify if the chip is
> OK (IDDQ tests, ATPG patterns, quick RAM bists etc.)

The above rant partially answers your questions.
Remember that I have designed test equipment for Mentor Graphics ;-)
And I currently have courses and practical work on testing at the university.
I know it's not directly related, but the methods I will integrate in the
BIST unit are a sophisticated version of the "logarithmic" test method
that is used for routing equipment : exhaustive, simple and fast.

In order to speed up the foundry tests, parallel verifications have to take place :
first the circuit is powered (+ clocked) and the BIST runs internal verifications while
the tester checks the IO pads. A specific "test" pad must be used to decouple
the pads in order to use a parallel test path.
When that is ok (half a second), the circuit must verify that it can access
the outside pins, so the tester starts to communicate with the circuit,
uploading executable code and data into the memory. The code starts, executes
additional tests (I guess it's necessary) and the tester can act as a "swap"
for the code chunks that don't fit in the cache.

One thing to keep in mind is that the tester, for cost reasons, might
not provide as many pins as there are on the chip, and the tester speed
is probably lower than what the tested circuit can run at. For example an
old, second-hand or cheap tester might provide only 64 I/O pins
and work at 10MHz while the chip could run at 100MHz (just an example,
not reality ;-D). So the power is applied to the chip and the PLL starts :
it must be configured to have a 10x ratio with the external clock,
then the BIST starts. Only the power planes are connected, as well
as a few control signals, so the interactivity is reduced and a lot
of things rely on the internal resources of the chip.

Usually, when the number of pads is larger than what the tester
can do, several passes are done on different sections of the circuit.
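
The two constraints above (slow tester clock, too few tester pins) boil
down to simple arithmetic, sketched here. The 64 pins and 10MHz/100MHz
figures come from the example in the text; the 256-pad chip is a
made-up number for illustration.

```python
from math import ceil


def pll_ratio(core_hz: float, tester_hz: float) -> float:
    """Multiplication factor the PLL needs to reach the core clock."""
    return core_hz / tester_hz


def passes_needed(chip_pads: int, tester_pins: int) -> int:
    """Number of passes when the tester can only reach some pads at a time."""
    return ceil(chip_pads / tester_pins)


print("PLL ratio:", pll_ratio(100e6, 10e6))
print("passes for a hypothetical 256-pad chip:", passes_needed(256, 64))
```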

> > How much do i have to emphasize on this ?
> > Yes, this will be custom-designed test but
> >  - we don't need ATPG
> >  - we couldn't afford it anyway
> >  - always trust your nose and make meaningful measurements.
> 
> I doubt that you can make the BIST to cover the whole chip without
> sacrificing performance and the test times will be huge.
i don't want to make any sacrifice :-) 

> Also "all" the
> commercial Logic BIST testers work by hooking into the scan chains and
> they generate patterns to those chains. Also to minimise the test time
> Logic BIST needs quite complex things (reseedable pattern generators,
> randomness distribution changes based on feedback, random generator
> optimization based on scanpath and design etc.)

argh.

> BIST is only a small part of testing. It is good for memories and
> fullspeed logic testing but to get >95% coverage for logic with BIST is
> very challenging.

I know, but fortunately we're still at an early stage of the design
so there is still time to correct wrong design techniques.
Of course, I count on your expertise to detect our errors :-)

> --Kim
WHYGEE