Gerald Dallmann
Aspects on failure modes and reliability assessment in automotive power microelectronics
Division Manager, SGS INSTITUT FRESENIUS GmbH
Outline
• Main field failure mechanisms from the perspective of a service provider
• Reliability assessment and limitations of current qualification procedure
• Conclusions and recommendations
Failure modes in power devices
Market trend per device type
• Downturn in 2012 due to PV
• Current stable growth due to:
• E-mobility (drives, steering…)
• Industry automation (I 4.0)
• Chargers, power supplies (IoT)
• Highest market share for IGBTs
Source: Yole Developpement
Silicon power devices. IGBTs
• Although IGBTs are just switches, they are complex and sensitive products with many more failure modes than logic ICs or the mechanical switches that IGBTs replace.
• Besides electrical issues (overstress: V, I, P, T, punch through, latch up, thermal runaway, dynamic issues L*dI/dt, …) and systematic system issues (cooling, parasitics, system oscillations…) there are many technology issues and process deviations which can’t be simulated.
A. Benmansour et al., Trench IGBT failure mechanisms evolution with temperature and gate resistance under various short-circuit conditions. Microelectronics Reliability, Elsevier, 2007, vol. 47, pp. 1730-1734.
Source: Infineon
Oscillation issues due to high stray inductance
Source: Infineon
See: FUJI IGBT MODULE APPLICATION MANUAL, https://www.fujielectric.com/.../igbt/application/box/.../REH984c..
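As a rough illustration of the L*dI/dt issue, the turn-off voltage overshoot across a stray inductance can be estimated as V = L * dI/dt. The sketch below assumes a 50 nH stray inductance, a current slope of 2 kA/µs and a 600 V DC link; these numbers and the function name are illustrative assumptions, not values from the slides.

```python
def inductive_overshoot(stray_inductance_h, di_dt_a_per_s):
    """Voltage induced across a stray inductance by a current slope: V = L * dI/dt."""
    return stray_inductance_h * di_dt_a_per_s


# Assumed example values (not taken from the slides).
L_STRAY = 50e-9        # 50 nH stray inductance of the commutation loop
DI_DT = 2e9            # 2 kA/us expressed in A/s
V_DC_LINK = 600.0      # DC link voltage in V

overshoot = inductive_overshoot(L_STRAY, DI_DT)
peak = V_DC_LINK + overshoot
print(f"overshoot: {overshoot:.0f} V, peak voltage at the switch: {peak:.0f} V")
```

With these assumed values the overshoot alone is about 100 V on top of the DC link voltage, which shows why the stray inductance has to be included when checking the voltage rating of the device.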
Example for failure mode root cause finding
ESD/EOS
• ESD (electrostatic discharge) is caused by storage or handling issues and leads to very small burn marks in the silicon.
It can be caused at each station in the supply chain.
• EOS (electrical overstress) is caused by V, I or P stress outside the SOA (safe operating area) and leads to severe thermal chip/module damage.
It is mostly caused in the final application or in tests at application conditions.
• The differentiation is fuzzy!
Figure: ESD vs. EOS. ESD of a power MOSFET and EOS of a freewheeling diode, each in top-down view and cross section (labels: Si chip, backside solder, trench, contact).
Examples of main failure mechanisms. Particles and patterning issues.
• Main root cause for current field failures.
• In-film particles and patterning issues lead to poly-Si and metal residues and can result in unstable shorts that reach the final customer.
• FA requires analysis of signal paths and networks to identify a potential short.
Figure: Poly-Si residue causing a tiny short between the poly-Si plate and the substrate contact (positions "a" and "b" marked). Panels: failure network analysis, OBIRCH failure localization, failure hypothesis.
Gate Oxide Defects
• FA can pinpoint the location of the defect, but the process root cause has to be found by other methods:
• Correlation to wafer process deviations, tools, plasma etch or other tool issues
• Reliability monitoring data
• Cooperation with the wafer manufacturer is required (bath contamination, charging)
Figure: GOX defect in the gate oxide area at the border to the source/drain region. Electrical field stress? Plasma damage? Local thinning?
Figure: MIM capacitor defect after Poly 2 removal; wafer-process-induced electrostatic damage. Localization by LIT, light microscopy after top layer removal.
BE Failure root cause: Bond pad surface and bond process
• Si wafers are manufactured in FAB, tested, stored, shipped
• At packaging sites wafers are sawn into chips. Chips are soldered, cleaned, bonded, tested.
• Power devices require thick Al wires (300µm), bonded to 5µm Al layer on the Chip. This leads to very narrow process windows (no sticking vs. mechanical damage of the chip).
• Wafer FAB process, storage, transport, cleaning after sawing and soldering and bond process issues are typical root causes for power device failures.
• They ultimately lead to a burned chip, which misleadingly points to EOS as the root cause.
Figure: SEM image of a cross section of an Al wire bond on a chip with Al metal layer (labels: Al bond wire, 300 µm Ø; Al layer of chip metallization, 5 µm thick).
Failure analysis of power devices
• Root cause analysis by failure localization and cross sectioning
• Result: EOS (Electrical Overstress), typically related to the application/customer
• Cracks due to short intense heating pulse
Figure: Burn mark close to a bond wire. Cross section showing the molten Si chip and cracks in the chip (labels: bond wire, Si chip, solder layer, Cu from DCB, molten Si chip and solder).
Failure analysis of power devices
• Root cause analysis: EOS ???
Figure: Crack in the ceramic layer of the DCB, also seen in the reference chip (labels: Cu layer, ceramic layer).
Failure analysis: Bond process (tool)
• The faulty device (leakage) has to be de-capsulated (incl. silicone gel!) and analyzed for failure localization.
Figure: Light microscopy image of a power device. Localization of the failure by LIT (Lock-in Thermography).
Failure analysis: Bond process (impurities, tool)
• The Al bond wire (300 µm) has been pressed completely through the Al metallization of the chip (5 µm thick).
• The bond force was too high. The high force had been needed because of surface impurities on the chip.
Figure: SEM of the bond and SEM of the cross section (labels: Si chip, Al bond wire, Al chip metallization). The bond tool was pressed onto the chip surface!
Failure analysis: Bond process (impurities, tool)
• Same Chip type with a burned fail, leading to molten Al and SiO2 spheres.
• The burn position slightly outside the bond could be coincidental, which leads to EOS as the (wrong) failure hypothesis.
Figure: SEM of the bond; SEM/EDX analysis of the molten area.
Failure root causes: bond surface contamination
• Al bond pad analysis by AES (Auger Electron Spectroscopy), left, and TOF-SIMS (time-of-flight Secondary Ion Mass Spectroscopy), right
• Pad contamination by residues of F, typical for plasma etch processes
Element   Al       C        O        F        Si
Content   24 at%   33 at%   22 at%   12 at%   9 at%
Analysis of surface cleanliness by TOF-SIMS (time-of-flight Secondary Ion Mass Spectroscopy)
Analysis beam: low dose, surface probing
+ Second (primary) beam: high dose (Cs, O2, …), for depth profiling
Parallel detection of masses
Sequential sputter erosion
High temp solder issues
• Au/Sn based solders are used for high temp applications (lighting)
• The temperature stability, void formation and barrier stability depend on precise control of process and material specifications
• FA requires in-depth knowledge of material properties
Figure: Ge carrier chip with Au/Sn solder on a Ni-plated base. Voids lead to thermal coupling issues.
High temp solder issues
• Phase formation in these systems is complex.
• Phases can change during ageing even at low temperatures!
• FA requires analysis on process windows and ref. samples.
H.G.Song et al., Au-Ni-Sn Intermetallic Phase Relationships…, JoEM, vol30, p.409
Au NiSn Au NiGe
New Material combinations
• New material combinations (Cu-Al, Au based high temperature solders, Ag, metal barrier layers like NiV) require in-depth understanding of material properties and failure mechanisms.
• A phase diagram is valid for steady state conditions! Phases can change even at room temperature.
• There are no phase diagrams for 4 or more elements!
M. Paulasto-Kröckel, CAM Workshop Halle, April 2015
Reliability Assessment
Common quality understanding. Some myths.
• "My product is qualified, so I'm safe"
Qualification is typically done for a new technology and one lead product (or very few products), not for every product.
Qualification doesn't consider every application condition, mostly just T and V. Real application happens in systems (voltage spikes, parasitics like inductances and capacitances, contact or screening issues, humidity, mobile ions).
Qualification is based on acceleration factors for ageing mechanisms (e.g. device degradation). Process issues (surface contamination, particles) are not included in these models.
Qualification is done with 77 parts for some tests. It is just a check for catastrophic failures and cannot demonstrate a low-ppm failure rate.
Qualification doesn't consider excursions. They are assessed by (manual) engineering judgement.
Power devices are often used outside specification (e.g. for unclamped inductive load).
The real quality level can only be assessed after qualification, by analysis of test failures and field returns.
• "The technology was not changed over years and stays as it was qualified"
Technology qualification is done when the technology is mostly ready. The yield is not yet at the final (mature) level. Yield improvement work requires process changes.
Electrical tests are introduced to catch bad/weak chips during qualification. Test costs are huge. Test reduction is required for cost savings. Quality has to be maintained all the time.
Each manufacturer is under severe pressure to reduce costs. Materials and suppliers are changed. Quality impact is assessed by engineering judgement or some requalification tests.
Technology has to be changed. Talk to your supplier, without focus on potential claims or issues!
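To make the "77 parts" point concrete, here is a minimal sketch of what a zero-fail test can actually demonstrate. It uses the exact binomial relation (1 - p)^n = 1 - C for zero observed failures. The 90 % confidence level, the 231-part case (three lots of 77) and the function name are assumptions for illustration.

```python
def zero_fail_upper_bound(n_parts, confidence):
    """Upper confidence bound on the failure fraction after n_parts passed with zero fails.

    Solves (1 - p)**n_parts = 1 - confidence for p (exact binomial, zero failures).
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_parts)


# 77 parts is the sample size named in the text; the 90 % confidence level
# and the 231-part case (three lots of 77) are assumptions for illustration.
for n in (77, 231):
    bound = zero_fail_upper_bound(n, 0.90)
    print(f"{n} parts, 0 fails: failure fraction below {bound * 1e6:,.0f} ppm at 90 % confidence")
```

Under these assumptions a clean 77-part test only bounds the failure fraction at roughly 3 %, i.e. tens of thousands of ppm, which is several orders of magnitude away from a low-ppm target.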
Reliability requirements for automotive applications
• Zero ppm failure rate
Typical acceptable failure rate for automotive is today <10ppm (for traction business <200ppm)
This leads to a typical number of fails of 1 to 2 cases per product and customer per year. Each case could be unique.
Higher losses with repeated fails lead to a Task Force operation mode.
• Zero Incident
No excursions, no process adjustments for entire shipment time frame.
Not realistic (changes of equipment base, suppliers, production excursions).
Limitations of current qualification and quality monitoring procedure
• The main technology and product qualification procedure is still following the AEC Q100 procedure
• "Components meeting these specifications are suitable for use in the harsh automotive environment without additional component level qualification testing." http://www.aecouncil.com/AECDocuments.html
• Issues
Very limited number of tests and samples (~500 in BE)
Fail rate assessment at end of life requires proven acceleration models for the Q100 qualification procedure! No models for new packaging types!
Field fails on qualified products
http://www.springerprofessional.de/aec-qualifikation-ist-kritisch/3683304.html
• To prove a 2 ppm failure rate, about 1 to 2 million devices would have to be tested, and that at each test!
• Degradation acceleration and models are required.
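The order of magnitude of the sample size can be checked with a short sketch. It assumes a zero-fail test evaluated with the exact binomial model and no acceleration. The confidence levels and the function name are assumptions, since the slide does not state which confidence level is meant.

```python
import math


def zero_fail_sample_size(target_ppm, confidence):
    """Smallest n such that n passes with zero fails demonstrate target_ppm at the given confidence."""
    p = target_ppm * 1e-6
    # (1 - p)**n <= 1 - confidence  =>  n >= ln(1 - confidence) / ln(1 - p)
    return math.ceil(math.log(1.0 - confidence) / math.log1p(-p))


# Confidence levels are assumed for illustration only.
for conf in (0.60, 0.90, 0.95):
    n = zero_fail_sample_size(2, conf)
    print(f"2 ppm at {conf:.0%} confidence: {n:,} devices with zero fails")
```

For 90 % to 95 % confidence the result lies between 1 and 2 million devices, in line with the figure quoted above. This is the reason why degradation acceleration and models are needed to bring the sample size down.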
Strategies to prove and improve reliability at low ppm level. Production part.
• Quality has to be produced. It cannot be achieved by testing!
Find a robust system design
• Application conditions, mission profile
• Process window lots (metal barriers, solder process, contamination)
• FMEAs. Check by FA whether the predicted failure modes actually occur.
• Prove the main critical system and device properties and materials by analysis in the concept phase.
• Requires FA, also after stress application.
Strategies to prove and improve reliability at low ppm level. Production part.
• Quality has to be produced. It cannot be achieved by testing!
Reduce fail root causes in production
• Reduce defects as one of the main failure root causes (particles and patterning issues).
FA to find the mechanisms of defects and address them.
Inline inspection by bright field pattern comparison.
Current focus: wafer bevel and backside inspection.
• Reduce excursions (tool malfunction, misprocessing, CL violation)
• Run stable and centered processes (SPC, Cp/Cpk)
• Check main and critical device properties by FA on a regular basis.
CA (construction analysis): measure the entire geometry (gate, contacts, metal, pad, pad opening, passivation, …). Check with trend charts.
Very valuable also for foundry customers (check critical dimensions).
• Feedback of FA of field returns to improve production quality.
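For the "stable and centered processes" item, the two capability indices can be computed as in the following sketch. The thickness readings, the specification limits of 4.5 µm and 5.5 µm and the function name are hypothetical and only serve to show the calculation.

```python
import statistics


def cp_cpk(samples, lsl, usl):
    """Return (Cp, Cpk) for measurements against lower/upper specification limits."""
    mean = statistics.mean(samples)
    sigma = statistics.stdev(samples)               # sample standard deviation
    cp = (usl - lsl) / (6 * sigma)                  # potential capability (spread only)
    cpk = min(usl - mean, mean - lsl) / (3 * sigma) # actual capability (spread and centering)
    return cp, cpk


# Hypothetical Al metallization thickness readings in um (not from the slides).
thickness = [5.02, 4.98, 5.05, 4.95, 5.00, 5.03, 4.97, 5.01]
cp, cpk = cp_cpk(thickness, lsl=4.5, usl=5.5)
print(f"Cp = {cp:.2f}, Cpk = {cpk:.2f}")
```

Cp only compares the process spread with the specification width, while Cpk also penalizes an off-center mean. A Cpk clearly below Cp therefore indicates a process that is stable but not centered.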
Strategies to prove and improve reliability at low ppm level. Test part.
• Extend SQ test (samples, time, conditions) to increase the chance of a fail, and combine relevant stress conditions
• Overtesting / test with tighter specs (leakage currents, UIL = unclamped inductive load for power MOSFETs)
• Burn-in: stress the devices (1st year of application) before test.
• FA of electrical parameter and functional test fails.
Every fail mechanism found there will also occur in the field!
• Analysis of field returns by FA and feedback to process.
• Reliability monitoring. Test of critical parameters on production lots. FA of detected failures.
• Analyse hardware after SQ tests by FA. Detect weaknesses.
Strategies to prove and improve reliability at low ppm level. Test part.
• Robustness validation, see http://www.zvei.org/Publikationen/Robustness-Validation-Semiconductor-2015.pdf
• Change from the AEC Q100 test-based approach to a predictive approach with failure mode assessment and degradation models
• Key is the knowledge of failure mechanisms and degradation models
• Intensive FA required to
• Find the mechanisms and degradation (based on stress tests)
• Analyse EoL failures
Strategies to prove and improve reliability at low ppm level. Test part.
• The robustness validation procedure requires finding the failure modes first in order to define ageing models.
• They can be defined only by FA!
• Example GOX hard BD (breakdown), contributing factors: surface roughness, contamination, ESD, lattice defects, charge trapping, local GOX thinning, variation of oxide thickness, mobile ions, dielectric defectivity
See: http://www.zvei.org/Publikationen/Robustness-Validation-Semiconductor-2015.pdf
Acknowledgement: The project eRamp is co-funded by grants from Germany, Austria, Slovakia and the ENIAC Joint Undertaking. It is coordinated by Infineon Technologies Dresden GmbH
Modelling for Verification of High Reliability
Increasing Requirements for Verification of High Reliability
• Zero failures cannot be verified statistically with finite sample size
• Consideration of reliability models required
• Partners: Infineon, JOANNEUM RESEARCH, CIS (consulting in industrial statistics)
Figure: risk of insufficient testing; required sample size vs. use of prior knowledge.
Required sample size, 2 extreme cases:
                                    Case 1: no knowledge / assumption                            Case 2: acceleration & ageing known
Reliability target                  1 – R(t) = 10 ppm                                            1 – R(t) = 10 ppm
Confidence level                    90 %                                                         90 %
Acceleration factor                 unknown, assumed as 1 (i.e., customer-equivalent testing)   15
Lifetime behaviour                  no assumption                                                Weibull with shape 3 (ageing)
Statistical sampling model          Binomial                                                     WeiBayes
Required samples for verification   > 200,000                                                    69
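The two sample sizes can be reproduced with a short zero-fail Weibayes sketch. It assumes that each part is stressed for the equivalent of one target lifetime multiplied by the acceleration factor, and that no failure occurs. This reading of the slide, like the function and parameter names, is an assumption.

```python
import math


def weibayes_zero_fail_samples(target_ppm, confidence, accel_factor, weibull_shape):
    """Parts needed in a zero-fail test to demonstrate 1 - R(t) = target_ppm.

    Assumes Weibull life R(t) = exp(-(t/eta)**shape) and a test duration equal
    to accel_factor target lifetimes per part (assumption for this sketch).
    """
    unreliability = target_ppm * 1e-6
    # Cumulative hazard allowed at the target lifetime: (t/eta)**shape = -ln R(t)
    hazard_target = -math.log1p(-unreliability)
    needed = -math.log(1.0 - confidence) / (accel_factor ** weibull_shape * hazard_target)
    return math.ceil(needed)


# Case 1: no acceleration; with accel_factor = 1 the shape has no influence
# and the result equals the zero-fail binomial sample size.
case1 = weibayes_zero_fail_samples(10, 0.90, accel_factor=1, weibull_shape=1)
# Case 2: acceleration factor 15 and Weibull shape 3 as given in the table.
case2 = weibayes_zero_fail_samples(10, 0.90, accel_factor=15, weibull_shape=3)
print(case1, case2)
```

Under these assumptions case 1 needs more than 200,000 parts and case 2 needs 69, matching the table. The reduction comes entirely from the factor 15^3 = 3375, so it is only as trustworthy as the acceleration factor and the Weibull shape, which have to be established by stress tests and FA.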
SGS INSTITUT FRESENIUS
AT A GLANCE
CRS Microelectronics & Special Analytics
Physical and chemical analytical techniques
D-SIMS: Cameca ims 7f
AES: Phi 670
XPS: Quantum 2000, Quantera II
TOF-SIMS: ION-TOF 5
• Materialography, microscopy
• Scanning electron microscopy (SEM)
• Atomic force microscopy (AFM)
• Transmission electron microscopy (TEM)
• Computed tomography (CT)
• X-ray photoelectron spectroscopy (XPS)
• Auger electron spectroscopy (AES/SAM)
• Electron probe micro analysis (EPMA – WDX)
• Secondary ion mass spectrometry (SIMS)
• Spreading resistance profiling (SRP)
• Infrared spectroscopy (FTIR, KBr-press technique)
• X-ray diffraction (XRD)
• Thermal analysis (DMA, DSC, DTA/TG)
• Gas-chromatography mass spectrometry (GC-MS, TDS, Pyrolysis)
• High-performance liquid chromatography (HPLC)
• Ion chromatography (IC)
• Element analysis (ICP OES)
• …
INSTITUT FRESENIUS GmbH, part of SGS
Summary
• In the case of power devices, the customer cannot find the root cause in front end and back end.
• This needs a close and open cooperation with the manufacturer.
• Financial claims inhibit every cooperation!
• In-depth failure analysis is required to detect and understand failure mechanisms and customer returns.
• Every known method for chip qualification needs expert know-how on failure modes and degradation models to predict a failure rate.
This has to be delivered by stress tests and subsequent FA.
• FA is key to understanding your materials, processes and products!