[libre-riscv-dev] Design for test methodologies
whygee at f-cpu.org
Sun Jan 12 20:32:53 GMT 2020
Hello Staf, Jean-Paul & list,
On 2020-01-11 13:44, Staf Verhaegen wrote:
> When doing DFT for ASIC current tools rely on having flip-flops with
> scan chain support. That feature is then used to load test patterns in
> the design and test the chip.
Yes, I have studied this method at ASIME/LIP6 in 2001.
See the next part of that message for the criticism and proposed
solution to this approach.
> What I do think you propose is to not
> only use flip-flops with scan chain support but extend the
> functionality of the standard cells for DFT. Unfortunately I don't see
> how you can do this without introducing unwanted area overhead for
> ASICs.
You can't see how it could be done efficiently because, indeed, it
obviously can't be, and, being the speed freak I am, it wouldn't even
make sense :-)
I do not propose to modify the ASIC gates; on the contrary, I start
from there (which explains why I want access to more gate libraries
than the one for the Actel designs). They will be implemented "as is"
in the final mask.
However, during test and simulation I substitute the gates with code
that has more features, enabling coverage checks and verification
(among other things).
This lets me check that each gate has been fully exercised by a given
set of test patterns, or (hopefully, one day) generate those test
patterns.
> Given the timeline for the NLNet project DFT was not included and
> testing is planned to be done in the old way by just running test
> programs and see if they have the right output.
Fine, but how can you be sure that your program covers all the gates
and your fault model ?
This is one of the problems that my library solves :
you run the program on the *simulated* circuit (with mapped and
substituted gates) and you get a histogram of how the gates are
exercised. This lets you tune the test program to cover a particular
circuit.
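The substituted-gate idea can be sketched in a few lines of plain
Python. This is only an illustration of the principle, not the actual
library : the `Gate` wrapper and the half-adder netlist are made up for
the example.

```python
from collections import Counter

class Gate:
    """Stand-in for a substituted gate: same logic function as the real
    cell, but it also records how it is exercised during simulation."""
    histogram = Counter()              # gate name -> number of evaluations

    def __init__(self, name, func, arity=2):
        self.name = name
        self.func = func
        self.arity = arity
        self.seen = set()              # distinct input patterns observed

    def __call__(self, *inputs):
        Gate.histogram[self.name] += 1
        self.seen.add(inputs)
        return self.func(*inputs)

    def coverage(self):
        """Fraction of all possible input patterns seen so far."""
        return len(self.seen) / (2 ** self.arity)

# A tiny "netlist": a half-adder built from two wrapped gates.
XOR1 = Gate("xor1", lambda a, b: a ^ b)
AND1 = Gate("and1", lambda a, b: a & b)

def half_adder(a, b):
    return XOR1(a, b), AND1(a, b)      # (sum, carry)

# Run a "test program" and look at the histogram and the coverage.
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    half_adder(a, b)

print(dict(Gate.histogram))              # how often each gate fired
print(XOR1.coverage(), AND1.coverage())  # 1.0 means fully exercised
```

A gate with coverage below 1.0 tells you exactly which input patterns
your test program never produced, which is what you need in order to
tune it.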
---oO0Oo---
Note on DFT methodology and tools :
My designs don't use "internal boundary scan" as LIP6 does, because it
creates more problems than it solves, particularly if I want to test or
try the design in a FPGA : increased stress on the clock network,
altered timings, a larger design, and a scan chain that can't run at
the full clock speed of the design (unless you pay a high price in
extra circuitry), among many other issues I have found.
My approach integrates the test circuits from the start of the design,
so I can test the tester along with the whole system; I can even create
the FPGA circuit that will test the finished design, mapped to another
FPGA. This ensures that I have confidence in the test rig when the
chips arrive from the fab.
Part of the problem with internal boundary scan is the clock and latch
enable signals, with their huge fanout and very tight timing
constraints, hence physical requirements that compete with the rest of
the circuit (and might decrease its overall performance).
My solution is to cut the problem in half and avoid high-constraint
signals.
Look at the end of this article, at part 6 :
https://connect.ed-diamond.com/GNU-Linux-Magazine/GLMF-218/Quelques-applications-des-Arbres-Binaires-a-Commande-Equilibree
https://connect.ed-diamond.com/sites/default/files/articles/gnu-linux-magazine/glmf-218/83759/img13_CircuitDebugSPI_large.jpg
- A first shift register sends data to key points of the circuit.
It can be made as a clock-only, low-frequency system with a rippling
clock, to avoid stressing the main clock network; the few serial pins
already limit the bandwidth anyway. The cells are simple DFFs, and
MUXes can be added where needed, controlled by nearby DFFs of the
chain.
- Data out is serialised by a large balanced multiplexer I have
designed. My "balanced control binary tree" makes this system scalable
beyond 64 bits of input.
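The serialising multiplexer can be pictured as a binary tree of 2:1
muxes; here is a minimal behavioural model in Python. This is a generic
mux tree, not the exact "balanced control binary tree" of the article,
whose control scheme is more elaborate.

```python
def mux_tree(select, inputs):
    """Route one of len(inputs) capture points to the serial output
    through a binary tree of 2:1 muxes. `select` lists the control
    bits, MSB first; len(select) == log2(len(inputs))."""
    if len(inputs) == 1:
        return inputs[0]
    half = len(inputs) // 2
    branch = inputs[half:] if select[0] else inputs[:half]
    return mux_tree(select[1:], branch)

# 8 internal capture points; sweeping the select word serialises the
# captured values, one per "clock". Depth grows as log2(N), which is
# why the scheme scales beyond 64 inputs.
taps = [0, 1, 1, 0, 1, 0, 0, 1]
word = [mux_tree([(i >> 2) & 1, (i >> 1) & 1, i & 1], taps)
        for i in range(8)]
print(word)   # the captured taps, read out in order
```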
Marie-Minerve has been made aware of this last year.
From the outside, the whole system appears as a modified, half-duplex
SPI port that I can test in pure VHDL and on FPGA. It can be driven by
a microcontroller, a RPi, or anything with a few reasonably fast GPIOs.
And my gates library lets me verify the effect of any given test vector
and the expected output.
The key to the design is where to place these elements, but it's
obvious for the architectures I design : hijacking the instruction
register and/or the memory ports gives full access to most of the
critical resources, because I can inject specifically crafted, made-up
instructions, on top of controlling the internal state machine and
other minimal circuitry behind the curtain. All with negligible impact
on the DUT's size and speed.
---oO0Oo---
Here is another circuit test approach :
Due to capacitive effects, faults can appear at full speed yet stay
hidden at low speed; and/or you want to "bin" your chips based on their
maximum speed and/or voltage and/or temperature and/or ...
So you want to find the speed at which the circuit starts to
malfunction.
You feed in a clock signal whose frequency is controlled by a test rig,
and then... you end up with a crazily huge database of test vectors to
shove into the chip at full speed. It won't work easily.
Instead, you can generate the test vectors on-chip with a LFSR, and all
the outputs get "mixed" into another, "disturbed" LFSR to generate a
serial signature. The output of that LFSR is driven at full speed onto
one pin of the chip and compared with the expected result. A high-speed
reference bitstream, probably coming out of a 25Q128 SPI Flash chip,
can easily be XORed against it, and you can run the test for 128
million cycles before the SPI chip loops back.
The properties of a LFSR ensure that if even ONE perturbation bit
(coming from the circuits under test) is NOT as expected, the rest of
the stream will be highly uncorrelated with the expected bitstream
(half of the bits will be wrong on average).
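This decorrelation property is easy to demonstrate in software. The
sketch below uses a 16-bit maximal-length Fibonacci LFSR (polynomial
x^16+x^14+x^13+x^11+1, chosen for the example) and flips a single
"perturbation" bit in one of two otherwise identical runs :

```python
def step(state, perturb=0):
    """One cycle of a 16-bit Fibonacci LFSR; `perturb` is a bit coming
    from the circuit under test, XORed into the feedback."""
    fb = ((state >> 15) ^ (state >> 13) ^ (state >> 12) ^ (state >> 10)
          ^ perturb) & 1
    return ((state << 1) | fb) & 0xFFFF

def signature(seed, n, flip_at=None):
    """Serial signature bitstream; optionally inject one wrong bit."""
    s, bits = seed, []
    for i in range(n):
        s = step(s, perturb=1 if i == flip_at else 0)
        bits.append(s & 1)
    return bits

good = signature(0xACE1, 1000)                # fault-free reference
bad  = signature(0xACE1, 1000, flip_at=100)   # one bad input bit
diff = sum(a != b for a, b in zip(good, bad))
print(diff)   # roughly half of the 900 bits after the fault differ
```

Because the LFSR is linear, the single-bit error propagates through the
feedback forever instead of fading out, so a plain XOR against the
reference stream catches it no matter where it occurred.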
It's cheap, fast, quite efficient, unintrusive, low-tech, rather
scalable, automatable, and uses few physical resources (a LFSR adds
negligible overhead to the chip). My library helps with designing the
LFSR by checking the coverage for a given test duration : for example,
you can tune the width of the LFSR, its polynomial and its reset value,
and run scripts overnight to compare coverage over a given design
space. Give it a try, GHDL is free and you can run as many instances as
you like, on as many computers as you like, at a time !
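Such an overnight sweep could look like the Python sketch below, using
"distinct LFSR states visited" as a toy stand-in for real gate
coverage. The candidate polynomials, seeds and figure of merit are made
up for the illustration; the real job would launch GHDL simulations
instead of this in-process model.

```python
from itertools import product

def states_visited(width, taps, seed, cycles):
    """Count distinct states a Fibonacci LFSR reaches: a crude proxy
    for how much of the design a given configuration can exercise."""
    mask = (1 << width) - 1
    s, seen = seed & mask, set()
    for _ in range(cycles):
        fb = 0
        for t in taps:                 # taps given as bit numbers, 1-based
            fb ^= (s >> (t - 1)) & 1
        s = ((s << 1) | fb) & mask
        seen.add(s)
    return len(seen)

# Tiny design space: two 8-bit polynomials x two reset values.
polys = [(8, 6, 5, 4),                 # maximal-length for 8 bits
         (8, 6, 4, 3)]                 # example alternative polynomial
seeds = [0x01, 0x5A]
results = {(p, s): states_visited(8, p, s, 255)
           for p, s in product(polys, seeds)}
print(results)   # a maximal LFSR visits all 255 nonzero states
```

Ranking the `results` dictionary then tells you which (polynomial,
reset value) pair gives the best coverage for the test duration you can
afford.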
The only thing it fails to do is pinpoint the cause of a fault, but
that's not needed for go/no-go chip tests while the dies are still on
the wafer.
Diagnostics require more sophisticated tests, which then help to
increase yield.
---oO0Oo---
I hope it helps !
- These methods have been published in the past. I know of no prior art.
- Before joining ASIME/LIP6 in 2001, I was a test engineer at
Mentor/Meta Systems and helped troubleshoot Celaro emulators.
I might not be the brightest of them all, but I have had 20 years to
think this through since then :-)
> greets,
> Staf.
All the best !
yg