Living with failure: Lessons from nature?

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The resources available on a chip continue to grow, following Moore's Law. However, the main mechanism by which the benefits of Moore's Law accrue, the continuing reduction in feature size, is predicted to bring disadvantages in device reliability and parameter variability. The scale of the problem is underlined by a prediction from an Intel commentator: within a decade we will see 100-billion-transistor chips. That is the good news. The bad news is that 20 billion of those transistors will fail in manufacture and a further 10 billion will fail in the first year of operation.
What does a 20-30% device failure rate mean for designers, and what does it mean for production test? As a designer, I have some idea how to design for very low device failure rates. Redundancy, fault tolerance and ECC can all cope with very low failure rates. The basic assumption is that faults are infrequent, so we only have to cope with one at a time. But a 20-30% failure rate clearly violates this assumption, and the bottom line is that I have no idea how even to begin to design useful circuits that can cope with this level of failure.
For an example of a functional device that can cope with this level of failure, we have to look to nature. Brains can cope with very high levels of neuron failure, but we have no idea how they work, let alone how they keep working after these failures. What might we be able to learn from biology about building systems that continue to function as components change and fail? Will manufacturing test change from being primarily about checking that every device on the chip works to checking that enough devices are working to ensure that the chip functions correctly, and is likely to continue to do so even after many more devices have changed or failed over the early operational life of the chip?
In this talk, I will describe a proposed chip multiprocessor system that is being developed primarily to help understand how the brain works, but which will also present the sorts of challenges that will increasingly dominate the future of production test. The chip does not need to be fully functional to be useful, so how can the production test establish that enough works for the chip to be useful, even after further early-life failure? © 2006 IEEE.
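The abstract's claim that conventional redundancy only copes with very low failure rates can be illustrated with a small calculation. The sketch below is not from the talk: it assumes triple modular redundancy (three copies of a device and a perfect majority voter) as a representative redundancy scheme, and it assumes devices fail independently with the same probability. The function name `k_of_n_failure` and the variable names are hypothetical, chosen only for this example. The device counts are the ones quoted in the abstract.

```python
from math import comb


def k_of_n_failure(p: float, n: int = 3, k: int = 2) -> float:
    """Probability that at least k of n devices fail.

    Assumes each device fails independently with probability p.
    With n=3, k=2 this is the chance that a majority vote over
    three copies gives the wrong answer (voter assumed perfect).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Figures quoted in the abstract.
total_transistors = 100e9
fail_in_manufacture = 20e9
fail_in_first_year = 10e9

rate_manufacture = fail_in_manufacture / total_transistors
rate_first_year = (fail_in_manufacture + fail_in_first_year) / total_transistors

# Compare a very low failure rate (an assumed 1e-6) with the quoted rates.
for p in (1e-6, rate_manufacture, rate_first_year):
    print(f"p = {p:g}: single device fails {p:.3g}, "
          f"3-way majority fails {k_of_n_failure(p):.3g}")
```

Under these assumptions, triplication at a per-device failure probability of 1e-6 reduces the failure probability to about 3e-12, whereas at 0.3 it only reduces it to 0.216. The "one fault at a time" assumption is what makes the first case work, and it is exactly what a 20-30% failure rate removes.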

Bibliographical metadata

Original language: English
Title of host publication: Proceedings - Eleventh IEEE European Test Symposium, ETS 2006 (Proc. Eleventh IEEE Eur. Test Symp.)
Publisher: IEEE Computer Society
Pages: 4-5
Number of pages: 1
Volume: 2006
ISBN (Print): 0769525660, 9780769525662
Publication status: Published - 2006
Event: 11th IEEE European Test Symposium, ETS 2006 - Southampton
Event duration: 1 Jul 2006 → …
http://dblp.uni-trier.de/db/conf/ets/ets2006.html#Furber06
http://dblp.uni-trier.de/rec/bibtex/conf/ets/Furber06.xml
http://dblp.uni-trier.de/rec/bibtex/conf/ets/Furber06

Conference

Conference: 11th IEEE European Test Symposium, ETS 2006
City: Southampton
Period: 1/07/06 → …