System and method for determining endpoint in etch processes using partial least squares discriminant analysis in the time domain of optical emission spectra

Abstract
The present invention is directed to a system, method and software product for creating a predictive model of the endpoint of etch processes using Partial Least Squares Discriminant Analysis (PLSDA). Calibration data is collected from a calibration wafer using optical emission spectroscopy (OES). The data may be non-periodic or periodic with time, and periodic signals may be sampled synchronously or non-synchronously. The OES data is arranged in a spectra matrix X having one row for each data sample. The OES data is processed depending upon whether or not it is synchronous; synchronous data is arranged in an unfolded spectra matrix X having one row for each period of data samples. A preview endpoint signal is plotted using wavelengths known to exhibit good endpoint characteristics. Regions of stable intensity values in the endpoint plot that are associated with either the etch region or the post-etch region are identified by sample number. An X-block is created from the processed OES data samples associated with the two regions of stable intensity values. Non-periodic OES data and asynchronously sampled periodic OES data are arranged in the X-block by one sample per row. Synchronously sampled periodic OES data are arranged in the X-block by one period per row. A y-block is created by assigning a discriminate variable value of “1” to OES samples associated with the class, i.e. the etch, and a discriminate value of “0” to all samples not in the class, i.e. the post-etch. A b-vector is regressed from the X- and y-blocks using PLS and is used with the appropriate algorithm to process real-time OES data from a production etch process to detect an endpoint.
29 Claims
1. A method of creating a predictive model of an event in an etch process using partial least squares discriminant analysis (PLSDA) comprising:
collecting calibration data for an etch process;
using at least a portion of the calibration data, identifying a feature associated with the event;
creating a predictor matrix, wherein the predictor matrix includes data for the feature associated with the event;
creating a response matrix, wherein the response matrix is comprised of a first discriminate variable value for the feature associated with the event; and
finding a predictive model for the event by regressing the response matrix and the predictor matrix using PLS regression.
16. A method of creating a predictive model of an endpoint of an etch process using partial least squares discriminant analysis (PLSDA) comprising:
collecting data samples from a calibration wafer using optical emission spectroscopy (OES), each OES data sample having intensity values for a plurality of simultaneously sampled discrete wavelengths;
plotting an endpoint signal from the OES data samples using a plotting method for identifying a location of an endpoint transition;
identifying an etch region of stable intensity values and a post-etch region of stable intensity values on the endpoint signal;
creating a predictor matrix by arranging OES data samples associated with the etch region and the post-etch region by one OES sample in each row;
creating a response matrix with a first discriminate variable value for the OES data samples associated with the etch region and a second discriminate variable value for the OES data samples associated with the post-etch region;
finding a predictive model for the endpoint by regressing the response matrix and the predictor matrix using PLS regression; and
filtering real-time data from a production wafer using the predictive model.
17. A method of creating a predictive model of an endpoint of an etch process using partial least squares discriminant analysis (PLSDA) comprising:
collecting data samples from a calibration wafer using optical emission spectroscopy (OES), each OES data sample having intensity values for a plurality of simultaneously sampled discrete wavelengths;
finding a time derivative of the optical emission spectroscopy (OES) data samples;
plotting an endpoint signal from the OES data samples using a plotting method for identifying a location of an endpoint transition;
identifying an etch region of stable intensity values, a transition region of evolving intensity values and a post-etch region of stable intensity values on the endpoint signal;
creating a predictor matrix by arranging OES sample derivative values associated with the etch region, the transition region and the post-etch region by one OES sample derivative value in each row;
creating a response matrix with a first discriminate variable value for OES sample derivative values associated with the transition region and a second discriminate variable value for all other OES sample derivative values;
finding a predictive model for the endpoint by regressing the response matrix and the predictor matrix using PLS regression; and
filtering real-time data from a production wafer using the predictive model.
18. A method of creating a predictive model of an endpoint of an etch process using partial least squares discriminant analysis (PLSDA) comprising:
collecting data samples from a calibration wafer using optical emission spectroscopy (OES), each OES data sample having intensity values for a plurality of simultaneously sampled discrete wavelengths, wherein the OES data is synchronously sampled with a period of intensity data;
removing ambiguous and misleading OES data samples;
plotting an endpoint signal from the OES data samples using a plotting method for identifying a location of an endpoint transition;
identifying an etch region of stable intensity values and a post-etch region of stable intensity values on the endpoint signal;
creating a predictor matrix by arranging OES data samples associated with the etch region and the post-etch region by one period of OES samples in each row;
creating a response matrix with a first discriminate variable value for the OES data samples associated with the etch region and a second discriminate variable value for the OES data samples associated with the post-etch region;
finding a predictive model for the endpoint by regressing the response matrix and the predictor matrix using PLS regression; and
filtering synchronously sampled real-time periodic data from a production wafer using the predictive model.
19. A method of creating a predictive model of an event in an etch process using partial least squares discriminant analysis (PLSDA) comprising:
collecting data samples from a calibration wafer, each data sample having a plurality of simultaneously recorded readings, said event occurring in the data samples from the calibration wafer;
previewing an event plot from the data samples using a previewing method for identifying a location of an event transition;
identifying the event transition region on the event plot;
creating a predictor matrix by arranging simultaneously recorded readings associated with the event transition region and simultaneously recorded readings not associated with the event transition region, by one simultaneously recorded reading in each row;
creating a response matrix with a first discriminate variable value for the simultaneously recorded readings associated with the event transition region and a second discriminate variable value for the simultaneously recorded readings not associated with the event transition region;
finding a predictive model for the event by regressing the response matrix and the predictor matrix using PLS regression; and
filtering real-time data from a production wafer using the predictive model.
20. A method of creating a predictive model of an event in a chamber cleaning process using partial least squares discriminant analysis (PLSDA) comprising:
collecting calibration data for a chamber cleaning process;
using at least a portion of the calibration data, identifying a feature associated with the event;
creating a predictor matrix, wherein the predictor matrix includes data for the feature associated with the event;
creating a response matrix, wherein the response matrix is comprised of a first discriminate variable value for the feature associated with the event; and
finding a predictive model for the event by regressing the response matrix and the predictor matrix using PLS regression.
27. A method of creating a predictive model of a fault event in a semiconductor fabricating process using partial least squares discriminant analysis (PLSDA) comprising:
collecting calibration data for a semiconductor fabricating process;
using at least a portion of the calibration data, identifying a feature associated with the fault event;
creating a predictor matrix, wherein the predictor matrix includes data for the feature associated with the fault event;
creating a response matrix, wherein the response matrix is comprised of a first discriminate variable value for the feature associated with the fault event; and
finding a predictive model for the fault event by regressing the response matrix and the predictor matrix using PLS regression.
Specification
1. Field of the Invention
The present invention relates to semiconductor processing, and more particularly to the etch process by which semiconductor material is removed to leave well-defined features. Still more particularly, the present invention relates to a system, method and software program product for accurately determining the changes in a signal that are indicative of an endpoint of the etching process.
2. Description of Related Art
There are many steps involved in the processing of a wafer, and etching is one of the most crucial. Etching is a process whereby a selected area of a wafer surface is removed so as to form a desired pattern on the surface. Plasma etching can accurately remove patterns of very small dimension from the surface of a semiconductor wafer. Reactive Ion Etching (RIE) is a plasma etching technique in which radio-frequency (RF) energy applied to a low-pressure gas ionizes the gas and dissociates the gas molecules into more reactive species.
FIG. 1 is a cross-sectional illustration of an etcher in which the RIE process may be performed. FIG. 1 is intended only to aid in describing etching principles useful in understanding the present invention and is not intended to faithfully represent any actual etcher. Etcher 100 is a plasma etch reactor in which the RIE process is confined within chamber 102. In operation, plasma 118 is produced in chamber 102 when etching gas 122 enters reaction chamber 102 and is ionized by an electric field established between cathode 110 and anode 112. As etching gas 122 is ionized into plasma 118, the velocities of the electrons and ions differ significantly because of the difference in their masses. A typical etcher 100 holds anode 112 at ground potential, with cathode 110 connected to RF generator 114 and biased with respect to ground. Wafer 130 is placed on the platen. Positive ions from plasma 118 are accelerated toward cathode 110 and onto the substrate surface of wafer 130 by the potential difference across the electrodes. This bombardment sputters atoms of the substrate, which usually also react chemically with etching gas 122, thereby removing the top layer of material on wafer 130. The newly formed gases 126 are removed from chamber 102 by vacuum system 125 through exhaust port 124.
The etching step is an integral part of semiconductor manufacturing; equally important to accurate etching, however, is detecting the precise point in time at which the etching process has ended, i.e. the “endpoint.” The endpoint of the etching process is reached when all etched feature patterns are fully delineated and undercutting of the substrate is held to a minimum. Typically, endpoint detection mechanisms determine the endpoint of an etch process either by measuring the thickness of the layer being etched or by monitoring optical emission. A laser interferometer (not shown in the figure) reflects laser radiation off wafer 130 during processing, and the thickness of the etched layer is determined from the interference of the reflected light. Alternatively, the optical emissions of the reaction products of the etching process are used to determine an endpoint.
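The sketch below illustrates how interferometric endpoint detection turns a fringe count into a thickness. It uses the standard normal-incidence relation: each full cycle of reflected intensity corresponds to removing λ/(2n) of a film with refractive index n. The function name, the HeNe wavelength and the film index are illustrative assumptions, not values taken from this description.

```python
def thickness_removed_nm(fringe_count: float, wavelength_nm: float, n_film: float) -> float:
    """Film thickness removed after `fringe_count` full interference cycles.

    Assumes normal incidence: one full cycle of the reflected intensity means the
    round-trip optical path changed by one wavelength, i.e. the film thinned by
    wavelength / (2 * n_film).
    """
    return fringe_count * wavelength_nm / (2.0 * n_film)


# Hypothetical example: 632.8 nm laser, film index 1.46 (roughly that of SiO2).
print(round(thickness_removed_nm(3, 632.8, 1.46), 1))
```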
Emission collection and processing mechanism 150 receives and preprocesses light emitted by the plasma into a form usable by an endpoint detection mechanism (not shown). Initially, collimating and focusing optics 152 receive light 154 from inside chamber 102 and transmit it, usually through an optical fiber, to optical emission analyzer 156. One type of optical emission analyzer 156, a monochromator, monitors the intensity of a single wavelength of light 154 from the exhaust gases and outputs signal 158 based on the intensity of the light at the wavelength being monitored. Generally, it is expected that the intensity of the light at this wavelength will change at the endpoint of the etching process, and thus the output signal 158 indicates that transition.
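A minimal sketch of single-wavelength (monochromator) endpoint detection on signal 158 follows. It assumes a simple relative-change threshold on a moving average. The baseline length, window and threshold are hypothetical tuning parameters, and the function name is illustrative. Because some species' emission rises at endpoint while others' falls, the sketch flags a change in either direction.

```python
def detect_endpoint_single_wavelength(intensity, baseline_len=20, window=5, rel_change=0.10):
    """Return the first sample index whose `window`-sample moving average differs
    from the early-etch baseline by more than `rel_change` (a fraction), else None.

    A rise or a fall both count, since some species brighten and others dim
    at endpoint.
    """
    if len(intensity) < baseline_len + window:
        return None
    baseline = sum(intensity[:baseline_len]) / baseline_len
    for i in range(baseline_len, len(intensity) - window + 1):
        avg = sum(intensity[i:i + window]) / window
        if abs(avg - baseline) > rel_change * abs(baseline):
            return i
    return None


signal_158 = [1.0] * 50 + [0.0] * 50   # toy trace: emission drops sharply at sample 50
print(detect_endpoint_single_wavelength(signal_158))  # 46
```

The returned index is the start of the first window that crosses the threshold. It can therefore precede the actual transition by up to `window - 1` samples.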
Accurately detecting the endpoint of an etch process by analyzing the optical emission of the reaction products depends on identifying an optimal discrete wavelength, usually associated with a reactant species, that exhibits a quantifiable intensity change at the endpoint transition. Although the light intensity from many reactant species decreases at the etching process endpoint, the light from some other species increases in intensity. Many factors degrade signal detection and must be compensated for, e.g. low amplitude of transmitted light energy 154 due to a dirty view port, spurious noise from plasma fluctuations that masks the endpoint signal, unsteady electrical fields from the electrodes, electronics malfunctions, or inaccurate optical measurements.
As fine-line patterning becomes more prevalent and the percentage of area to be etched becomes smaller, accurate endpoint detection becomes more difficult. As feature sizes decrease, the percentage of wafer open area also decreases, requiring plasma etch endpoint detection systems to be more sensitive and accurate. Traditional endpoint detection systems that monitor one or two wavelengths do not generate enough information for successful endpoint determination when the open area drops below a nominal percentage of the wafer surface.
In another alternative, collimating and focusing optics 152 receive light from reactant species inside chamber 102 and feed light 154, spanning a broad range of wavelengths, to optical emission analyzer 156. Here, optical emission analyzer 156 is a spectrometer capable of monitoring multiple discrete wavelengths. Using processing functionality in optical emission analyzer 156, the operator can then select the most appropriate sets of wavelengths for the reactant species associated with a particular etching process and generate output 158 from the intensities of the selected wavelengths. Monitoring the intensities of multiple wavelengths gives the operator increased flexibility; the disadvantage is that endpoint detection becomes more complex.
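The step of generating output 158 from the intensities of several selected wavelengths can be sketched as follows. The sketch assumes the simplest combination: summing the intensities inside each chosen wavelength band. The band limits and the function name are hypothetical; a practical system might instead weight the bands or take their ratio.

```python
def band_signal(wavelengths_nm, spectra, bands):
    """Collapse each spectrum to one value: the sum of intensities at wavelengths
    that fall inside any selected (low_nm, high_nm) band, inclusive."""
    cols = [j for j, wl in enumerate(wavelengths_nm)
            if any(lo <= wl <= hi for lo, hi in bands)]
    return [sum(spectrum[j] for j in cols) for spectrum in spectra]


wavelengths = [300.0, 310.0, 320.0, 330.0]
spectra = [[1.0, 2.0, 3.0, 4.0],     # sample 1
           [5.0, 6.0, 7.0, 8.0]]     # sample 2
print(band_signal(wavelengths, spectra, bands=[(305.0, 325.0)]))  # [5.0, 13.0]
```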
FIG. 1 further depicts etcher 100 as having magnets 142 on rotating magnet housing 140 for magnetically enhanced RIE etching. Magnets 142 revolve around chamber 102, causing plasma 118 to follow the magnetic field associated with magnets 142. This produces a more homogeneous etch of the surface. However, magnets 142 induce one or more bright spots in plasma 118 that revolve with magnet housing 140. Optical emission analysis of the etch process becomes more challenging because the intensity of light 154 is now periodic with the rotation of a mechanical device and is no longer based solely on the reactant processes. The operator must therefore have expert knowledge of plasma chemistry and/or spectroscopy, and must also understand how the rotating magnets degrade the endpoint signal. How to perform this optical emission analysis to find the endpoint using multiple sets of wavelengths from the output of a spectrometer has heretofore been unknown.
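The effect of the rotating magnets can be illustrated with a toy model, assumed here for illustration only. The model is a constant emission level plus a sinusoidal component that repeats once per magnet revolution, sampled R = 28 times per period (the value of R used in FIG. 15D). Averaging over one full period cancels the modulation, whereas averaging over part of a period does not. This is one reason why sampling synchronized to the magnet rotation is useful. The function names are hypothetical.

```python
import math


def modulated_intensity(n_samples, base, amplitude, samples_per_period):
    """Toy model (assumption): steady emission plus a bright spot that passes
    the view port once per magnet revolution, modeled as a sinusoid."""
    return [base + amplitude * math.sin(2 * math.pi * k / samples_per_period)
            for k in range(n_samples)]


def mean_over(signal, start, length):
    """Average of `length` consecutive samples beginning at `start`."""
    chunk = signal[start:start + length]
    return sum(chunk) / len(chunk)


R = 28  # spectra per magnet period, as in FIG. 15D
sig = modulated_intensity(4 * R, base=1.0, amplitude=0.2, samples_per_period=R)
print(round(mean_over(sig, 5, R), 6))       # full period: modulation cancels -> 1.0
print(round(mean_over(sig, 0, R // 4), 3))  # quarter period: biased upward
```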
The present invention is directed to a system, method and software product for creating a predictive model of the endpoint of etch processes using Partial Least Squares Discriminant Analysis (PLSDA). Initially, intensity readings for discrete wavelengths in a spectrum are collected from a calibration wafer using optical emission spectroscopy (OES). Intensity values in the OES data may represent a signal that is non-periodic or periodic with time, and periodic signals may be sampled synchronously or non-synchronously with the period of the signal. The OES data is first arranged in a spectra matrix X having one row for each data sample.
The OES data is processed to remove transients that occur during the start-up and shutdown of the etch process. Wavelength regions with desirable endpoint transition qualities, such as sharpness, are selected, and wavelengths with saturated intensity values are removed entirely from the processed OES data.
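The preprocessing described in the preceding paragraph can be sketched as follows. The sketch assumes spectra are stored one sample per row (spectra matrix X). It also assumes that the start-up and shutdown transients are trimmed by specifying the first and last sample numbers to keep. The saturation level (65535, as for a 16-bit detector) and the function name are assumptions for illustration.

```python
def preprocess_oes(spectra, wavelengths_nm, first_sample, last_sample,
                   saturation_level=65535.0):
    """Trim transients and drop saturated wavelengths (hypothetical helper).

    spectra: folded spectra matrix X, one list of intensities per sample.
    Keeps samples first_sample..last_sample (inclusive), then removes every
    wavelength column that reaches saturation_level in any kept sample.
    """
    kept = spectra[first_sample:last_sample + 1]
    good_cols = [j for j in range(len(wavelengths_nm))
                 if all(row[j] < saturation_level for row in kept)]
    X = [[row[j] for j in good_cols] for row in kept]
    return X, [wavelengths_nm[j] for j in good_cols]


raw = [[9.0, 9.0, 9.0],        # start-up transient
       [1.0, 2.0, 65535.0],    # 320 nm saturates
       [3.0, 4.0, 5.0],
       [6.0, 7.0, 8.0],
       [9.0, 9.0, 9.0]]        # shutdown transient
X, wl = preprocess_oes(raw, [300.0, 310.0, 320.0], first_sample=1, last_sample=3)
print(X)   # [[1.0, 2.0], [3.0, 4.0], [6.0, 7.0]]
print(wl)  # [300.0, 310.0]
```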
A preview endpoint signal is plotted using the selected wavelength regions and/or principal component analysis (PCA) of spectra matrix X. Regions of stable intensity values on the endpoint plot that are associated with either the etch region or the post-etch region are identified by sample number. An X-block is created from the processed OES data samples associated with the two regions of stable intensity values. Non-periodic OES data and asynchronously sampled periodic OES data are arranged in the X-block by one sample per row. Synchronously sampled periodic OES data are arranged in the X-block by one period per row. A y-block for classifying the features of the etch process is created using binary partitioning; here, these features are the regions of stable intensity values associated with the etch. The y-block is created by assigning a discriminate variable value of “1” to OES samples in the class of the etch region and a discriminate value of “0” to samples not in that class. A b-vector is obtained by PLS regression of the X- and y-blocks and is validated using the processed OES data in spectra matrix X. Various other vectors are derived from the validated b-vector and are used with the appropriate algorithm to process real-time OES data from a production etch process to detect the endpoint.
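A minimal sketch of the PLS-DA step follows, written in plain Python. Rows from the etch region get the discriminate value 1 and rows from the post-etch region get 0. A b-vector is then regressed with the NIPALS form of single-response PLS (PLS1) on mean-centered data. The centering, the intercept term b0, the number of latent components, the toy spectra and all function names are assumptions. This description does not specify those details, nor exactly how the derived vectors are applied to production data.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def build_plsda_blocks(X, etch_range, postetch_range):
    """Hypothetical helper: X-block rows and y-block values (1 = etch, 0 = post-etch).

    Ranges are inclusive (first, last) sample numbers of the two stable regions;
    samples outside both regions (e.g. the transition) are left out.
    """
    x_block, y_block = [], []
    for (first, last), label in ((etch_range, 1.0), (postetch_range, 0.0)):
        for s in range(first, last + 1):
            x_block.append(X[s])
            y_block.append(label)
    return x_block, y_block


def pls1_nipals(X, y, n_components):
    """PLS1 via NIPALS on mean-centered data; returns (b, b0) so y_hat = b0 + x . b."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # X residuals
    f = [yi - y_mean for yi in y]                               # y residuals
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]  # E^T f
        norm = dot(w, w) ** 0.5
        if norm == 0.0:
            break  # nothing left in y to explain
        w = [wj / norm for wj in w]
        t = [dot(row, w) for row in E]                                   # scores
        tt = dot(t, t)
        p_a = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]  # loadings
        q = dot(f, t) / tt
        E = [[E[i][j] - t[i] * p_a[j] for j in range(p)] for i in range(n)]  # deflate X
        f = [f[i] - q * t[i] for i in range(n)]                               # deflate y
        W.append(w)
        P.append(p_a)
        Q.append(q)
    # b = W (P^T W)^-1 q; P^T W is upper triangular in NIPALS PLS1, so back-substitute.
    a = len(Q)
    c = [0.0] * a
    for i in reversed(range(a)):
        rest = sum(dot(P[i], W[k]) * c[k] for k in range(i + 1, a))
        c[i] = (Q[i] - rest) / dot(P[i], W[i])
    b = [sum(W[k][j] * c[k] for k in range(a)) for j in range(p)]
    b0 = y_mean - dot(x_mean, b)
    return b, b0


def plsda_score(b, b0, x):
    """Predicted discriminate value for one raw spectrum x."""
    return b0 + dot(x, b)


# Toy 3-wavelength calibration run: samples 0-2 etch, 3 transition, 4-6 post-etch.
spectra = [[10.0, 5.0, 1.0], [10.2, 5.1, 1.1], [9.8, 4.9, 0.9],
           [8.0, 5.0, 2.0],
           [6.0, 5.0, 3.0], [6.2, 5.1, 3.1], [5.8, 4.9, 2.9]]
x_block, y_block = build_plsda_blocks(spectra, (0, 2), (4, 6))
b, b0 = pls1_nipals(x_block, y_block, n_components=1)
print([round(plsda_score(b, b0, s), 2) for s in spectra])
```

Scores near 1 indicate etch-like spectra and scores near 0 indicate post-etch-like spectra. A threshold such as 0.5 is therefore one plausible (assumed) way to flag the transition in real-time data.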
The novel features believed characteristic of the present invention are set forth in the appended claims. The invention itself, however, as well as a preferred mode of use, further objectives and advantages thereof, will be best understood by reference to the following detailed description of an illustrative embodiment when read in conjunction with the accompanying drawings wherein:
FIG. 1 is a cross-sectional diagram of an etcher in which the RIE process may be performed;
FIG. 2 is a diagram of a plasma etcher configured for data collection in accordance with an exemplary embodiment of the present invention;
FIG. 3 is an illustration of an exemplary spectrum of intensity values taken at a sample time T_{i}, as might be measured by one of spectrographs 156A or 156B during an etch process;
FIG. 4 illustrates the signal intensity of an exemplary 325 nm wavelength over a time interval corresponding to sampling times for samples 0-1300;
FIG. 5 is an illustration of the intensity signal shown in FIG. 4 over the identical time interval for the 325 nm wavelength, with the vertical scale magnified to 0.0172-0.0191 arbitrary units;
FIG. 6 is another illustration of the intensity signal shown in FIG. 4 for the 325 nm wavelength, displayed at approximately the same intensity scale as in FIG. 5 but over a time interval of less than one hundred samples;
FIG. 7 is an illustration that compares the spectral intensity values of sample 409 and sample 419;
FIG. 8 is a plot of an exemplary endpoint signal using wavelength regions with desirable endpoint transition qualities and data from a contact etch of a production wafer with 4% open area;
FIG. 9 is a plot of the derivative of the endpoint signal depicted in FIG. 8;
FIG. 10 is a diagram illustrating the dimensions of class matrix Y, spectra matrix X and regression matrix B;
FIG. 11A depicts each row of a folded matrix X with W columns and M rows;
FIG. 11B depicts a folded spectra matrix X, wherein each row contains intensity values I_{ij} for a unique spectrum s_{t};
FIG. 11C depicts a folded matrix X for a more general case of measurements, wherein each row contains readings G_{ij} for generalized spectra s_{i};
FIG. 12A depicts data samples s_{1} . . . s_{M} of measurement M in a folded matrix X;
FIG. 12B depicts unfolded matrix X, wherein each row contains all R spectra that were taken in a unique period of the signal;
FIG. 12C depicts matrix X as unfolded by period, wherein each row contains all intensity values I_{ij} taken in each spectrum s_{t} for a unique period p_{k};
FIG. 13A is an illustration of endpoint signal 1300 for an etch, where the stable intensity values for the etch process can be clearly distinguished from other stable intensity values not in the class of stable values;
FIG. 13B shows a one-class response vector y having values that are selected for one class of high intensity values;
FIG. 13C shows a C-class response matrix Y having values that are selected for C classes of high intensity values;
FIG. 14A is an illustration of the derivative 1400 of the data for endpoint signal 1300;
FIG. 14B shows a one-class response vector y having values that are selected for one class of derivative values;
FIG. 14C shows a C-class response matrix Y for derivative values;
FIG. 15A is a diagram of exemplary b-vector 1500 having indices for wavelengths from 200 nm to 700 nm;
FIG. 15B is a diagram illustrating the b_{hi} vector 1510, consisting of the components of b-vector 1500 in the intervals Δλ_{2}, Δλ_{3} and Δλ_{6} that are above some cutoff limit N_{hi}, and the b_{lo} vector 1512, consisting of the components of b-vector 1500 in the intervals Δλ_{1}, Δλ_{4} and Δλ_{5} that are below some cutoff limit N_{lo};
FIG. 15C is a diagram illustrating the b_{hi-peak} vector 1520, consisting of the components of b in the intervals Δλ_{2}, Δλ_{3} and Δλ_{5} that are manually selected from positive peaks of b-vector 1500, and the b_{lo-peak} vector 1522, consisting of the components of b in the intervals Δλ_{1} and Δλ_{4} that are manually selected from negative peaks of b-vector 1500;
FIG. 15D is a diagram illustrating b-vector 1560, which is derived from synchronously sampled periodic OES data in accordance with an exemplary embodiment of the present invention, where R OES spectra (R=28) are taken each period and each of the R spectra corresponds to a magnet position on the housing of a magnetically enhanced etcher;
FIG. 16 is a high-level flowchart depicting a process that uses partial least squares discriminant analysis (PLSDA) to create a predictive model of an etch process and to determine the endpoint of the process in accordance with an exemplary embodiment of the present invention;
FIG. 17 is a detailed flowchart depicting a process for processing the OES data from the calibration wafer in accordance with an exemplary embodiment of the present invention;
FIG. 18 is a detailed flowchart depicting a process for creating the X-block and y-block and using PLS regression to find a b-vector in accordance with an exemplary embodiment of the present invention; and
FIG. 19 is a flowchart depicting an expanded process for processing synchronously detected, periodic OES data from the calibration wafer in accordance with an exemplary embodiment of the present invention.
Other features of the present invention will be apparent from the accompanying drawings and from the following detailed description.
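FIG. 15B describes b_{hi} and b_{lo} as the components of the b-vector above a cutoff N_{hi} and below a cutoff N_{lo}. A minimal sketch of that split follows. Keeping the full vector length and zeroing the other components is an assumption, made so that either part can still be dotted with a full spectrum. The function name and example values are hypothetical. How the two parts are then combined for endpoint detection is not shown here.

```python
def split_b_vector(b, n_hi, n_lo):
    """Return (b_hi, b_lo): components above n_hi / below n_lo, others set to 0."""
    b_hi = [bj if bj > n_hi else 0.0 for bj in b]
    b_lo = [bj if bj < n_lo else 0.0 for bj in b]
    return b_hi, b_lo


b = [0.5, -0.3, 0.05, -0.01, 0.4]
b_hi, b_lo = split_b_vector(b, n_hi=0.1, n_lo=-0.1)
print(b_hi)  # [0.5, 0.0, 0.0, 0.0, 0.4]
print(b_lo)  # [0.0, -0.3, 0.0, 0.0, 0.0]
```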
FIG. 2 is a diagram of an exemplary multi-chamber etcher and is intended only to aid in describing etching principles useful in understanding the present invention. Although multi-chamber etcher 200 is not intended to accurately depict an actual etcher, multi-chamber etchers such as the P5000 Etcher available from Applied Materials in Santa Clara, Calif., are well known in the art. Multi-chamber etcher 200, as depicted, has two exemplary etch chambers, 100A and 100B. Elements associated with a particular etch chamber are identified by the suffix A or B, respectively. Etch chamber 100A performs RIE, while etch chamber 100B supports magnetically enhanced RIE using rotating magnets 140B. Chambers 100A-100B have cathodes 110A-110B, electrically connected to RF generators 114A-114B by way of blocking capacitors. Light 154A-154B is captured in each of etch chambers 100A-100B and transmitted to spectrographs 156A-156B. The rotational position of rotating magnets 140B associated with etch chamber 100B is sensed by Hall-effect sensor 247B from changes in the magnetic flux of the magnets mounted thereon. Hall-effect sensor 247B is in turn electrically coupled to rotation sensor control 248B, which decodes the signals from sensor 247B. Alternatively, the orientation of rotating magnets 140B may be tracked directly by magnet rotation sensor 249B, using any means that measures the magnets' position, coupled likewise to rotation sensor control 248B. Rotation sensor control 248B provides orientation information to spectrograph 156B, which also receives spectral light intensities 154B from etch chamber 100B. Spectrograph 156B uses the orientation information to synchronize the light-intensity data samples with the orientation of rotating magnets 140B.
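Once each spectrum is tied to a magnet position, synchronously sampled data can be unfolded so that each row of X holds all R spectra from one magnet period, as FIG. 12B and FIG. 12C describe. The sketch assumes that the first retained spectrum is taken at the reference magnet position and that an incomplete trailing period is simply discarded. The function name is hypothetical.

```python
def unfold_by_period(spectra, samples_per_period, start=0):
    """Concatenate each run of R synchronized spectra into one row of X.

    spectra: one list of wavelength intensities per sample (folded X).
    start:   index of the first spectrum taken at the reference magnet position.
    A trailing partial period is dropped.
    """
    r = samples_per_period
    n_periods = (len(spectra) - start) // r
    unfolded = []
    for k in range(n_periods):
        row = []
        for spectrum in spectra[start + k * r: start + (k + 1) * r]:
            row.extend(spectrum)  # wavelengths of spectrum 1, then spectrum 2, ... R
        unfolded.append(row)
    return unfolded


folded = [[i, 10 * i] for i in range(7)]  # 7 spectra, 2 wavelengths each
print(unfold_by_period(folded, samples_per_period=3))
# [[0, 0, 1, 10, 2, 20], [3, 30, 4, 40, 5, 50]]
```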
Synchronous detection with CCD detectors in spectrograph 156B has been demonstrated to improve the signal-to-noise ratio, as discussed in “System to phase lock a self-scanning photodiode array to an external signal” by Harvey, et al., Rev. Sci. Instrum. 63 (3), March 1992, pp. 1991-1998, incorporated by reference herein in its entirety. Spectrograph 156B samples light 154B at discrete wavelengths and passes the results to data processor 260B as an intensity value for each discrete wavelength. The spectrograph may be of any type that has a high-grade sensor array and can interface with a data processor, such as the SD1024 Smart Detector Spectrograph available from Verity Instruments, Inc. in Carrollton, Tex.
In operation, both RIE etch chambers 100A-100B are loaded with wafers and evacuated. Rotating magnets 140B are set in motion for chamber 100B. Etch gases are introduced into chambers 100A-100B and the etching pressure is established. Spectrographs 156A-156B sample the emitted light by optical emission spectroscopy (OES). During calibration, which we discuss below, spectrographs 156A-156B send OES data to data processors 260A-260B for storage. During production, spectrographs 156A-156B may likewise send OES data to data processors 260A-260B for storage, and the data processors may send real-time endpoint data to chamber controllers 262A-262B for endpoint control of the etch process.
FIG. 3 is an illustration of an exemplary spectrum of intensity values as might be measured by spectrograph 156A or 156B during an etch process. FIG. 4 illustrates, for magnetically enhanced RIE, the signal intensity at an exemplary wavelength of 325 nm over a time interval corresponding to samples 0-1300. Variations in signal intensity can be clearly seen in FIG. 4. FIG. 5 shows the intensity signal of FIG. 4 over the identical time interval for the 325 nm wavelength, magnified by scaling the intensity axis between 0.0170 and 0.0191 arbitrary signal units, which makes the variations seen in FIG. 4 more apparent. FIG. 6 is another illustration for the 325 nm wavelength, displayed at approximately the same intensity scale as FIG. 5, but only samples 390 through 465 are depicted.
FIG. 7 compares the spectral intensity values of the 409th sample and the 419th sample. Note that the intensity values at some wavelengths changed between the 409th and 419th sample times. However, it cannot be determined whether this change results from emissions from the etch process, from periodic noise caused by the rotating magnets, or from a combination of the two. It should also be apparent that the change between the 409th and 419th samples is greater in some wavelength regions of the spectra than in others. Endpoint transitions are more difficult to detect in wavelength regions that lack desirable endpoint transition properties, such as a sharp transition. Moreover, even if wavelength regions with desirable endpoint transition properties are selected, the transition might be masked by periodic noise during the endpoint transition. The challenge is to select wavelength regions with desirable endpoint transition properties from the spectra, even if the regions are not contiguous, and to diminish the effect of periodic noise on endpoint detection. FIG. 8 is a plot of an exemplary endpoint signal computed from wavelength regions with desirable endpoint transition properties; the endpoint transition can clearly be seen between 70 and 90 seconds. FIG. 9 is a plot of the derivative of the endpoint signal of FIG. 8, showing the region of change associated with the endpoint at approximately the same time interval.
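FIG. 9 locates the endpoint through the derivative of the endpoint signal. As a minimal sketch, assuming a uniformly sampled endpoint signal held in a Python list, a central-difference derivative and the sample with the largest slope magnitude can be computed as below. The helper names are hypothetical, and this simple "steepest slope" rule is an illustration rather than the detection rule of the present invention; a real signal would usually need smoothing first.

```python
def central_difference(signal, dt=1.0):
    """Derivative of a uniformly sampled signal: central differences inside,
    one-sided differences at the two ends."""
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")
    d = [0.0] * n
    d[0] = (signal[1] - signal[0]) / dt
    d[-1] = (signal[-1] - signal[-2]) / dt
    for i in range(1, n - 1):
        d[i] = (signal[i + 1] - signal[i - 1]) / (2.0 * dt)
    return d


def steepest_change(signal, dt=1.0):
    """Index of the sample where the derivative has the largest magnitude."""
    d = central_difference(signal, dt)
    return max(range(len(d)), key=lambda i: abs(d[i]))


# A falling step: stable etch region, transition, stable post-etch region.
print(steepest_change([1, 1, 1, 1, 0.5, 0, 0, 0]))  # 4
```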
The exemplary embodiments described below were selected and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated. The particular embodiments described below are in no way intended to limit the scope of the present invention as it may be practiced in a variety of variations and environments without departing from the scope and intent of the invention. Thus, the present invention is not intended to be limited to the embodiment shown, but is to be accorded the widest scope consistent with the principles and features described herein.
The present invention is directed to a method for creating a predictive model of the endpoint of etch processes using Partial Least Squares Discriminant Analysis (PLSDA). PLSDA is a technique of statistical pattern recognition in which samples are classified. The present process involves collecting calibration data, creating a predictive model using the PLSDA algorithm, and applying the predictive model to data from an etch process for real-time endpoint detection. The method uses the calibration data, which are intensity measurements from an etch process on a calibration wafer, to identify the appropriate wavelengths for calculating an endpoint signal for a particular etch process. A suitable calibration set must contain an identifiable endpoint for the etch process. A training data set is identified in the calibration data set by dividing the calibration data samples into classes based on their association with endpoint features. The training data set is then arranged in a matrix, which is the predictor block in the PLS calculation of a predictive model. The response matrix is created by assigning binary variable values to represent the classes. The predictive model can be verified by generating an endpoint signal from the full calibration data set and comparing the predicted endpoint with the known endpoint. Finally, the predictive model can make real-time endpoint predictions for etch processes that are similar to the etch process used for the calibration wafer.
The present process for creating a predictive model using PLSDA is adaptable: real-time endpoint detection can be done for a variety of data and data collection conditions by modifying the calibration process. Signals measured during an etch process may be periodic or nonperiodic. Periodic signals have intensity values that repeat over some regular time interval, $T_p$, while nonperiodic signals do not repeat over any regular time interval. The time interval $T_p$ is the period of the signal. For the purposes of discussing the present invention, it will be assumed that the intensity of the periodic signal results from two things: an etch reaction and the carrier gas. The source or nature of the periodic behavior is relatively unimportant for application of the present invention; however, it is assumed that the period of the intensity remains constant.
Periodic signals may be detected either synchronously or nonsynchronously. Synchronous detection may be done with the signal itself, in a phase-lock technique, or with a timing signal that has the period of the signal. Since the frequency and amplitude of the synchronously detected periodic component are both stable, the portion of each intensity value attributable to the periodic component is the same for every sample taken at the same position within the period. With nonsynchronous detection, successive samples fall at varying positions within the period, so the periodic component cannot be separated by position and instead appears as noise. Therefore, nonsynchronous detection of a periodic signal has intrinsically more noise than synchronous detection. The present PLSDA method will be discussed below with regard to periodic signals, sampled synchronously or asynchronously, and nonperiodic signals.
The PLSDA algorithm incorporates both Partial Least Squares regression and Discriminant Analysis. Partial Least Squares (PLS) is a projection method for calculating regression models for multivariate data; it is sometimes called Projections to Latent Structures. Discriminant Analysis divides samples into classes based on certain features of the samples and associates discriminant variables with the classes in a Y-block. For endpoint detection, PLSDA is an improvement over existing methods that use Principal Component Analysis (PCA). In the present application, the classes of interest are regions of stable intensity values associated with different regions of the etch. Transition regions could also be used, with the corresponding class variable set appropriately between the extremes.
In PCA, the variation of the data is described by a few principal components, or latent variables, that are orthogonal to each other, which can reduce the dimensionality of the data. PCA can follow process dynamics or detect faults. In PCA, the latent variables of greatest variance are found. However, when the noise sources are large or the signal is small, the latent variables of greatest variance may be noise sources with large variance. In an etch process, this noise can obscure the desired endpoint signal. PCA can also find a transition in the signal that is not the true endpoint, because it responds to any type of change, not just changes associated with the endpoint. This results in ambiguous interpretations of endpoint changes or slow endpoint transitions. Thus, PCA is often unsuitable for endpoint detection.
In the present description of an exemplary embodiment, the following notation conventions are observed. Normal (non-bold) italic characters, including upper case characters, are scalars. They can refer to single data elements or can have a special meaning as explained in the text; examples are vector elements, e.g. $v_i$, matrix elements, e.g. $x_{ij}$, and indices, e.g. $i$, $j$ and $k$. Capital non-bold characters are index limits, e.g. $K$ and $N$. Bold lower case characters are vectors, e.g. $\mathbf{x}$ and $\mathbf{y}$; the elements of $\mathbf{x}$ may be $x_i$, $i = 1, 2, \ldots, N$. Bold upper case characters are two-way arrays or matrices, e.g. $\mathbf{X}$ and $\mathbf{Y}$. Vectors may be expressed in terms of scalars. A $1 \times N$ row vector $\mathbf{u}$, composed of scalars $u_i$, is written $\mathbf{u} = [u_1, u_2, \ldots, u_N]$. An $N \times 1$ column vector $\mathbf{v}$, composed of scalars $v_i$, is written $\mathbf{v} = [v_1; v_2; \ldots; v_N]$. In other words, objects in a row are separated by commas, and objects in a column are separated by semicolons. Matrices may be expressed in terms of scalars, vectors, or other matrices. An $N \times K$ matrix $\mathbf{M}_1$, composed of $1 \times K$ row vectors $\mathbf{u}_i$, is written $\mathbf{M}_1 = [\mathbf{u}_1; \mathbf{u}_2; \ldots; \mathbf{u}_N]$. A $K \times N$ matrix $\mathbf{M}_2$, composed of $K \times 1$ column vectors $\mathbf{v}_i$, is written $\mathbf{M}_2 = [\mathbf{v}_1, \mathbf{v}_2, \ldots, \mathbf{v}_N]$. A $2 \times 12$ matrix $\mathbf{M}_3$, composed of $2 \times 3$ matrices $\mathbf{P}_i$, is written $\mathbf{M}_3 = [\mathbf{P}_1, \mathbf{P}_2, \mathbf{P}_3, \mathbf{P}_4]$. An $8 \times 3$ matrix $\mathbf{M}_4$, composed of $2 \times 3$ matrices $\mathbf{P}_i$, is written $\mathbf{M}_4 = [\mathbf{P}_1; \mathbf{P}_2; \mathbf{P}_3; \mathbf{P}_4]$.
PLS was developed by H. Wold in Quantitative Sociology, ed. by H. M. Blalock, et al., Academic Press Inc., New York, 307-357 (1975) and extended to chemistry by S. Wold, et al., in Matrix Pencils, ed. by A. Ruhe, et al., Springer-Verlag, Heidelberg, 286-293 (1983), which are incorporated herein by reference in their entirety. It is an extension of PCA for regression analysis. The objectives of PLS are to find how a predictor matrix X is related to a response matrix Y by a regression matrix B and to create a predictive model:
PLS finds the latent structure in the predictor matrix X and in the responses Y; this is a dimensionality reduction of the data matrix X. It chooses latent variables of X and Y whose covariance is maximal. PLS has an advantage over PCA in that Y may be specified so as to describe the data more physically. A numerical algorithm for PLS is SIMPLS, which was developed by S. de Jong in “SIMPLS: An Alternative Approach to Partial Least Squares Regression,” Chemometrics and Intelligent Laboratory Systems, 18 (1993) 251-263, which is incorporated herein by reference in its entirety. Instances of Y are predicted from the product of measured vectors in X with the regression matrix B:
In some cases, Multiple Linear Regression (MLR) could estimate B. However, OES data exhibit a significant amount of correlation in X, which can make the system rank deficient or nearly rank deficient and result in a poor or unstable estimate of B. Also, if there are more variables than samples, which is typical for OES data, then the matrix X is rank deficient and MLR is not an option. PLS is therefore preferable: it regresses on a small number of latent variables instead of inverting $\mathbf{X}^{\mathsf{T}}\mathbf{X}$, so it can handle rank-deficient systems.
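The rank argument can be made concrete with standard linear algebra (not taken from the source). For a single class vector y, the MLR estimate requires inverting $\mathbf{X}^{\mathsf{T}}\mathbf{X}$, whose rank can never exceed the number of samples N:

```latex
\hat{\mathbf{b}}_{\mathrm{MLR}} = \left(\mathbf{X}^{\mathsf{T}}\mathbf{X}\right)^{-1}\mathbf{X}^{\mathsf{T}}\mathbf{y},
\qquad
\operatorname{rank}\!\left(\mathbf{X}^{\mathsf{T}}\mathbf{X}\right) = \operatorname{rank}(\mathbf{X}) \le \min(N, V).
```

For example, with N = 100 spectra of V = 1000 wavelengths (illustrative numbers only), $\mathbf{X}^{\mathsf{T}}\mathbf{X}$ is a 1000 × 1000 matrix of rank at most 100, so it is singular and the inverse does not exist.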
Discriminant Analysis (DA) divides samples into classes based on certain features of the samples and associates variables with the classes. The purpose of creating classification or pattern recognition models is to detect these features in measurement samples and assign the samples to the correct classes.
PLSDA is a supervised classification method that uses regression for multivariate problems. PLSDA was first described by M. Sjostrom, et al., in Proceedings of PARC in Practice, Amsterdam, June 19-21, Elsevier Science Publishers B.V., North-Holland (1986), and “PLS discrimination plots,” in Pattern Recognition in Practice II, Elsevier, Amsterdam, 1986, pp. 461-470, and by L. Stahle, et al., in Journal of Chemometrics 1, 185-196 (1987), which are incorporated herein by reference in their entirety. PLSDA is a synthesis of PLS and DA. An inverse least squares model can be used for one-class data:

$$\mathbf{y} = \mathbf{X}\mathbf{b} \qquad (3)$$
where X is the spectra matrix of the calibration data and y is the class vector which describes the class membership of the observations. The regression vector b is the direction in X that is parallel to y and orthogonal to other types of variations. This is important since there can be a significant amount of variance in X that is not associated with endpoint. Therefore, the regression analysis identifies wavelengths that directly relate to the endpoint transition.
Most embodiments described below involve regression analysis that identifies wavelengths contributing to a single endpoint. The measurement samples will then contain only a single feature, resulting in a single class, so b and y will be vectors. However, the present PLSDA algorithm is flexible enough to create predictive models for accurately detecting C endpoints, where C > 1. In those cases, the measurement samples will contain C features, resulting in C classes, and the regression analysis identifies wavelengths that directly relate to the C endpoints. Thus, in general, the matrix X is an N×V matrix, the regression matrix B is a V×C matrix and the class matrix Y is an N×C matrix. The dimensions of class matrix Y, regression matrix B and spectra matrix X in equation (2) are pictorially represented in FIG. 10. Alternatively, if there are C classes, one may create a separate single-class model for each of the C classes, yielding C different b-vectors $\mathbf{b}_i$, $i = 1, \ldots, C$.
PLSDA finds the direction in X space that best describes the difference between the classes. The N×V matrix X is the predictor block in the calculations of a calibration model for a known time variation Y using PLS. After collecting the calibration data, such as OES data, the intensities of the spectra that are measured are placed in the spectra matrix, X. These intensities are the descriptors of the system.
The spectra are measured at M equally spaced times $T_i$, with indices $i = 1, \ldots, M$. The time $T_s$ between measurements is called the sample interval. It is assumed that the time required for measuring the intensities at the various wavelengths in a spectrum, the integration time $T_\sigma$, is small compared to $T_s$, i.e. $T_s \gg T_\sigma$. However, the present process performs equally well where the integration time $T_\sigma$ is not small compared to the sample interval $T_s$, i.e. where $T_s$ is comparable to $T_\sigma$. A measurement $\mathbf{M}$ consisting of M spectra $\mathbf{s}_i$ is written $\mathbf{M} = [\mathbf{s}_1; \mathbf{s}_2; \ldots; \mathbf{s}_M]$. FIG. 11A depicts such an M×W matrix, where each row holds the data from one sample time. Each spectrum $\mathbf{s}_i$ has intensities $I_{ij}$, such that $\mathbf{s}_i = [I_{i1}, I_{i2}, I_{i3}, \ldots, I_{iW}]$, and each intensity reading $I_{ij}$ corresponds to a discrete wavelength $\lambda_j$, $j = 1, 2, \ldots, W$.
Depending upon whether the data is periodic or nonperiodic, and whether the periodic data is detected synchronously or asynchronously, the data in measurement M are arranged in the matrix X in one of two ways in preparation for PLS regression.
In the first way, each row of the matrix X holds a single spectrum taken at a unique sample time, so matrix X has M rows and W columns. This matrix is described as folded and will be referred to as such hereinafter. FIG. 11B depicts a folded spectra matrix X, wherein each row contains one spectrum $\mathbf{s}_i$, which consists of intensity values $I_{i1}, I_{i2}, \ldots, I_{iW}$ taken at discrete wavelengths $\lambda_1, \lambda_2, \ldots, \lambda_W$.
Creating a predictive model for endpoint detection, or for detection of any other type of event, need not be based solely on OES intensity data; the model may use readings other than spectral intensity values. Matrix X may be created from any type of measurement in which each sample $\mathbf{s}_i$ of measurement $\mathbf{M}$ comprises a plurality of readings $G_{ij}$ taken simultaneously, where $\mathbf{s}_i = [G_{i1}, G_{i2}, \ldots, G_{iK}]$. Of course, a valid predictive model requires that the signal obtained from calibration data have some feature that correlates with the event to be detected using the predictive model. FIG. 11C depicts matrix X as folded for this more general case.
As should be apparent from the discussion of the folded matrix X, each row of the folded matrix X consists of all simultaneously recorded sample readings taken at time $T_i$. Therefore, the training data may be arranged in a folded matrix X whether or not the calibration data is periodic. However, as will be discussed below, noise must be filtered from a periodic signal before the folded matrix X is created and regressed.
In the second way, a superior predictive model may be created from unfiltered, synchronously sampled periodic data by arranging the training data in the matrix X so that each row holds data taken during a unique period. The data should be sampled synchronously with the period of the signal. In the measurement $\mathbf{M}$, the spectra $\mathbf{s}_i$ may be grouped by periods $\mathbf{P}_i$ (see FIG. 12A). For the first period, $\mathbf{P}_1 = [\mathbf{s}_1; \mathbf{s}_2; \ldots; \mathbf{s}_R]$, where there are R spectra in a period. In general, $\mathbf{P}_i = [\mathbf{s}_{(i-1)R+1}; \mathbf{s}_{(i-1)R+2}; \ldots; \mathbf{s}_{iR}]$. In this way, $\mathbf{M} = [\mathbf{P}_1; \mathbf{P}_2; \ldots; \mathbf{P}_P]$, where P is the number of periods in the measurement. Each $\mathbf{P}_i$ may be rearranged into a row vector $\mathbf{p}_i$, so that $\mathbf{p}_1 = [\mathbf{s}_1, \mathbf{s}_2, \ldots, \mathbf{s}_R]$ and, in general, $\mathbf{p}_i = [\mathbf{s}_{(i-1)R+1}, \mathbf{s}_{(i-1)R+2}, \ldots, \mathbf{s}_{iR}]$. The X matrix is then rewritten in unfolded form as $\mathbf{X} = [\mathbf{p}_1; \mathbf{p}_2; \ldots; \mathbf{p}_P]$ (see FIG. 12B). In this arrangement, each row is a series of R spectra for one period of the signal, and each column of X holds the samples that were measured at the same position in the period of the signal. Matrix X has P rows and U columns, where $U = W \cdot R$. This matrix is described as unfolded and will be referred to as such hereinafter. Although not depicted in the figures, the number of variables U is usually greater than the number of periods P. FIG. 12C is an illustration of the unfolded matrix X that explicitly shows the intensity values $I_{ij}$.
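The folded and unfolded arrangements can be sketched in a few lines of plain Python. This is a minimal illustration, assuming each spectrum is a list of W intensities and that the first stored spectrum starts a period; the function names are hypothetical, and dropping a trailing partial period is a choice of this sketch, not something the text specifies.

```python
def fold(measurement):
    """Folded X: one spectrum (W intensities) per row, so X is M x W."""
    return [list(spectrum) for spectrum in measurement]


def unfold(measurement, R):
    """Unfolded X: row i is the concatenation of R consecutive spectra.

    Returns P = M // R rows of length U = W * R. Trailing spectra that do
    not fill a whole period are dropped (an assumption of this sketch).
    """
    if R < 1:
        raise ValueError("R (spectra per period) must be at least 1")
    P = len(measurement) // R
    X = []
    for i in range(P):
        row = []
        for spectrum in measurement[i * R:(i + 1) * R]:
            row.extend(spectrum)  # p_i = [s_(i-1)R+1, ..., s_iR] side by side
        X.append(row)
    return X


# Five 2-wavelength spectra, R = 2 spectra per period.
M = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
print(unfold(M, 2))  # [[1, 2, 3, 4], [5, 6, 7, 8]]
```

With 0-based indexing, column c of an unfolded row holds wavelength c % W at period position c // W, which is why every entry in a given column was measured at the same phase of the signal.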
Recall also that matrix X may be created from any type of measurement in which each sample $\mathbf{s}_i$ of measurement $\mathbf{M}$ comprises a plurality of readings $G_{ij}$ taken simultaneously, where $\mathbf{s}_i = [G_{i1}, G_{i2}, \ldots, G_{iK}]$. If the samples $\mathbf{s}_i$ are taken synchronously, the readings may be arranged so that matrix X is unfolded. Thus, superior results in identifying appropriate measurements for event detection using the PLSDA algorithm of the present invention may be achieved by organizing the reading data in an unfolded matrix X in the same manner as described above for intensity values. As before, a valid predictive model requires that the calibration signal have some feature that correlates with the event to be detected.
In either arrangement, the measured spectra have a number of highly correlated measurements that describe the behavior of the process. The challenge is dealing with this large number of correlated variables. As mentioned above, the PCA algorithm is unsuitable for this task when the noise sources are large or the signal is small, which is often the case.
Discriminant Analysis is employed to divide samples into classes based on features of the samples and to associate variables with the classes. Generally, the class matrix Y is composed of N rows of C-dimensional vectors, where C is the number of classes and N is the number of time samples in the problem (see FIG. 10). For a sample, or observation, there are several ways to assign discriminant variables to the classes. In one way, the discriminant variable of the first class has the value “1” if the sample is in the first class and “0” if it is not; the discriminant variable of the second class has the value “1” if the sample is in the second class and “0” if it is not, and so forth through the C-th class. Thus, the k-th component of the vector is 1 if the sample belongs to the k-th class and 0 if it does not; in this case, the cutoff limit is 0.5. In another way, the k-th component of the vector is “1” if the sample belongs to class k and “−1” otherwise; the cutoff value here is “0”. In yet another way, the discriminant variables may be given values so that the column vectors of matrix Y have a mean of zero.
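The three coding schemes can be illustrated with a small helper that builds an N × C class matrix from integer class labels. This is a sketch under stated assumptions: labels run from 0 to C − 1, and the names `class_matrix`, `in_class` and the scheme strings are hypothetical. The cutoffs 0.5 and 0 are the ones given in the text; no cutoff is stated for the zero-mean coding, so none is defined here.

```python
CUTOFFS = {"binary": 0.5, "signed": 0.0}  # decision limits stated in the text


def class_matrix(labels, C, scheme="binary"):
    """Build an N x C class matrix Y from integer labels 0..C-1.

    "binary":   1 if the sample is in class k, else 0
    "signed":   1 if the sample is in class k, else -1
    "centered": binary coding with every column shifted to zero mean
    """
    if scheme not in ("binary", "signed", "centered"):
        raise ValueError(f"unknown scheme: {scheme}")
    off = -1.0 if scheme == "signed" else 0.0
    Y = []
    for lab in labels:
        if not 0 <= lab < C:
            raise ValueError(f"label {lab} outside 0..{C - 1}")
        Y.append([1.0 if lab == k else off for k in range(C)])
    if scheme == "centered":
        N = len(Y)
        means = [sum(row[k] for row in Y) / N for k in range(C)]
        Y = [[row[k] - means[k] for k in range(C)] for row in Y]
    return Y


def in_class(value, scheme="binary"):
    """Assign a predicted class variable to the class if it exceeds the cutoff."""
    return value > CUTOFFS[scheme]


print(class_matrix([0, 0, 1], 2))            # [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(class_matrix([0, 0, 1], 2, "signed"))  # [[1.0, -1.0], [1.0, -1.0], [-1.0, 1.0]]
```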
The assignment of discriminant variables in the class matrix Y is better understood with reference to FIGS. 13A-13C and FIGS. 14A-14C. FIG. 13A is an illustration of endpoint signal 1300 at a wavelength for which the stable intensity values during the etch can be clearly distinguished from the stable intensity values outside that class. Samples at the depicted wavelength show higher intensity values in the etch region (samples $N_1$ to $N_2$) and lower intensity values in the post-etch region (samples $N_3$ to $N_4$). Other wavelengths may not show the exact character of signal 1300 but might instead show lower intensity values in the etch region and higher intensity values in the post-etch region. FIG. 14A is an illustration of derivative 1400 of the data for endpoint signal 1300. It shows low derivative values in the transition between the etch and post-etch regions (samples $N_2$ to $N_3$) and higher derivative values in both the etch region (samples $N_1$ to $N_2$) and the post-etch region (samples $N_3$ to $N_4$). In either case, a one-class matrix Y could be constructed by dividing the derivative data into classes based on the relative values of the derivative data.
FIG. 13B shows the Y values for one class, selected to mark the high intensity values; with regard to signal 1300, this class represents the etch region. FIG. 14B shows the Y values for one class, selected to mark the low derivative values; with regard to derivative 1400, this class represents the transition region. The response matrix Y may be extended to C classes, as illustrated in FIGS. 13C and 14C, by adding column vectors for the additional classes. These new classes represent additional events, such as the etch endpoints of other layers.
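For the one-class case of FIGS. 13A and 13B, the training X-block and y-block can be assembled from the two stable regions, as the abstract describes: etch samples get y = 1, post-etch samples get y = 0, and the transition samples in between are left out. A minimal sketch, assuming 0-based, inclusive sample ranges such as (N1, N2) and (N3, N4); the function name is hypothetical.

```python
def training_blocks(X_all, etch, post_etch):
    """Return (X_block, y_block) built from two stable regions.

    X_all:      list of processed samples (spectra or unfolded periods)
    etch:       inclusive (first, last) sample indices of the etch region
    post_etch:  inclusive (first, last) sample indices of the post-etch region
    Samples outside both ranges (e.g. the transition) are not used.
    """
    X_block, y_block = [], []
    for (first, last), value in ((etch, 1.0), (post_etch, 0.0)):
        if not 0 <= first <= last < len(X_all):
            raise ValueError(f"bad sample range {(first, last)}")
        for i in range(first, last + 1):
            X_block.append(list(X_all[i]))
            y_block.append(value)  # 1 = in the class (etch), 0 = not (post-etch)
    return X_block, y_block


X_all = [[float(i)] for i in range(10)]
Xb, yb = training_blocks(X_all, (1, 3), (6, 8))
print(yb)  # [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```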
In any of the cases described above, once a predictor matrix X has been created from the calibration data and a class matrix Y has been constructed for the features in the samples, a regression matrix B is found by PLS regression on the X and Y matrices. In most cases that we observed, one latent variable is sufficient for the PLS regression to give good results; however, more than one latent variable can be used. The regression matrix B is then validated by comparing the resulting endpoint signal, produced from the inner product of a test matrix X and the regression matrix B, with an endpoint signal that is known to be appropriate for identifying the endpoint of the particular etch process. In practice, the entire calibration data set may be used as the test matrix X.
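Since one latent variable is usually sufficient, the regression step can be sketched as single-component PLS for one class vector y. This is a minimal pure-Python illustration, not the original implementation: it assumes mean-centered X and y (the usual convention for SIMPLS; whether the original centers the data is not stated), and the function names are hypothetical. With one component, NIPALS and SIMPLS yield the same regression vector: the weight w is proportional to Xᵀy, the score is t = Xw, and b = w (tᵀy)/(tᵀt).

```python
def pls1_one_lv(X, y):
    """One-latent-variable PLS regression of y on X (lists of lists).

    Returns (b, x_mean, y_mean). Works even when X has more columns than rows.
    """
    N, V = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / N for j in range(V)]
    y_mean = sum(y) / N
    Xc = [[row[j] - x_mean[j] for j in range(V)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: direction in X space of maximal covariance with y.
    w = [sum(Xc[i][j] * yc[i] for i in range(N)) for j in range(V)]
    norm = sum(v * v for v in w) ** 0.5
    if norm == 0:
        raise ValueError("X has no covariance with y")
    w = [v / norm for v in w]
    t = [sum(Xc[i][j] * w[j] for j in range(V)) for i in range(N)]  # scores
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    b = [wj * q for wj in w]
    return b, x_mean, y_mean


def predict(x0, b, x_mean, y_mean):
    """Predicted class variable for one new sample."""
    return sum((x - m) * bj for x, m, bj in zip(x0, x_mean, b)) + y_mean


# Wavelength 0 drops at endpoint; wavelength 1 carries only unrelated variation.
X = [[1.0, 0.0], [1.0, 1.0], [0.0, 0.0], [0.0, 1.0]]
y = [1.0, 1.0, 0.0, 0.0]
b, xm, ym = pls1_one_lv(X, y)
print(b)  # [1.0, 0.0]
```

In the example, the regression vector puts all of its weight on the wavelength that tracks the class and none on the wavelength whose variation is unrelated to it, which is the behavior the text attributes to b.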
The ultimate goal is to obtain a stable model that gives reliable endpoint predictions. For one-class data, the response vector y is the endpoint signal. The vector $\mathbf{x}_0$ may be a single measurement of a spectrum, or it may be the unfolded data for a single magnet rotation. According to S. de Jong in SIMPLS, the usual expression for predicting new observations of the endpoint signal $y_0$ is:

$$y_0 = \mathbf{x}_0\,\mathbf{b} \qquad (4)$$
This is known as the inner product, or simply the IP, method for determining an endpoint signal $y_0$. The regression vector b was produced from calibration data for an etch process identical to the one that generated the data in vector $\mathbf{x}_0$; therefore, the endpoint signal $y_0$ corresponds to the endpoint signal of the calibration data. A b-vector 1500 resulting from the regression of a predictor spectra matrix X and a response y-block is illustrated in FIG. 15A. The b-vector 1500 has indices for wavelengths from 200 nm to 700 nm.
For synchronous signals, the b-vector obtained from calibration can be used to calculate an endpoint signal for each period. However, the endpoint signal can also be calculated at each sample time. To do so, the x-vector is formed from one period of sampled data, starting with the last sample and going back one period. The elements of the b-vector are then rearranged so that their phase matches the phase of the data samples in the x-vector. This is done in the code for the endpoint algorithms that follow. An analogous calculation is done when B and X are matrices.
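The per-sample calculation amounts to rotating the phase blocks of the unfolded b-vector. A minimal sketch, assuming the unfolded b-vector is stored as R consecutive blocks of W weights (phase positions 0 to R − 1), sample indices are 0-based, and production sample 0 falls at phase position 0 exactly as in calibration; the function names are hypothetical.

```python
def aligned_b(b, R, k):
    """Reorder the R phase blocks of an unfolded b-vector so that block m
    matches the m-th spectrum of the window ending at sample k."""
    if R < 1 or len(b) % R:
        raise ValueError("len(b) must be a positive multiple of R")
    W = len(b) // R
    blocks = [b[p * W:(p + 1) * W] for p in range(R)]
    # The window starts at sample k - R + 1, whose phase is (k + 1) mod R.
    shift = (k + 1) % R
    out = []
    for m in range(R):
        out.extend(blocks[(shift + m) % R])
    return out


def endpoint_at_sample(spectra, b, R, k):
    """IP endpoint value from the last R spectra up to and including sample k."""
    if k + 1 < R or k >= len(spectra):
        raise IndexError("need a full period of history ending at sample k")
    x = []
    for spectrum in spectra[k - R + 1:k + 1]:
        x.extend(spectrum)
    bk = aligned_b(b, R, k)
    return sum(xi * bi for xi, bi in zip(x, bk))


# One wavelength, R = 2: weight 10 at phase 0, weight 1 at phase 1.
print(endpoint_at_sample([[1], [2], [3]], [10, 1], 2, 2))  # 32
```

Rotating b, rather than reordering the incoming data, leaves the stored window in time order and touches only the R blocks of weights.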
The following is code for implementing the IP method described by Equation (4) above for endpoint calculations.
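A minimal pure-Python sketch of the IP calculation of Equation (4), written for illustration and not the original implementation. The names `ip_endpoint` and `ip_trace` and the optional mean arguments are assumptions: Equation (4) itself has no offsets, but if b came from a mean-centered regression, passing the calibration means puts $y_0$ back on the class-variable scale.

```python
def ip_endpoint(x0, b, x_mean=None, y_mean=0.0):
    """Inner-product endpoint value y0 = x0 . b for one spectrum or one unfolded period."""
    if len(x0) != len(b):
        raise ValueError("x0 and b must have the same length")
    if x_mean is None:
        x_mean = [0.0] * len(b)  # plain Equation (4): no centering
    return sum((x - m) * bj for x, m, bj in zip(x0, x_mean, b)) + y_mean


def ip_trace(samples, b, x_mean=None, y_mean=0.0):
    """Endpoint signal over time: one y0 per incoming sample."""
    return [ip_endpoint(x0, b, x_mean, y_mean) for x0 in samples]


print(ip_endpoint([1, 2, 3], [1, 0, -1]))  # -2.0
```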
For simple cases such as FIG. 13A, with class vectors such as FIG. 13B, we interpret the positive values of b as corresponding to wavelengths whose intensities decrease at endpoint, and the negative values of b as corresponding to wavelengths whose intensities increase at endpoint. The interpretation of b should be done with caution, and an explicit interpretation is not necessary to use the method. A second expression for the endpoint signal $y_0$ is:
where iihi is a vector of the indices of the b-vector where the components of b are above some cutoff limit, $N_{hi}$, and iilo is a vector of the indices where the components of b are below a second cutoff limit, $N_{lo}$. This is the sum of the inner products ratio method, or IPRatio method, for determining an endpoint signal $y_0$. There is noise in any measurement system, so the regression vector b is subject to error; the cutoff limits eliminate some of this noise. In the $\mathbf{b}_{hi}$ vector, wavelengths where the components of b are greater than the positive cutoff limit $N_{hi}$ are kept and all others are set to zero. Likewise, wavelengths where the components of b are less than the negative cutoff $N_{lo}$ are kept in the $\mathbf{b}_{lo}$ vector. Also, by taking the ratio of the inner products, systematic drift is compensated to some extent, and the signal is increased at endpoint since the numerator is increasing and the denominator is decreasing. FIG. 15B illustrates the $\mathbf{b}_{hi}$ vector 1510, consisting of the components of b-vector 1500 (FIG. 15A) for wavelengths $\Delta\lambda_2$, $\Delta\lambda_3$ and $\Delta\lambda_6$ that are above the cutoff limit $N_{hi}$, and the $\mathbf{b}_{lo}$ vector 1512, consisting of the components of b-vector 1500 for wavelengths $\Delta\lambda_1$, $\Delta\lambda_4$ and $\Delta\lambda_5$ that are below the cutoff limit $N_{lo}$. This second expression is not PLSDA but is directed by the results of PLSDA.
Below is code for implementing the IP Ratio method described by Equation (5) above for endpoint calculations.
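A minimal sketch of the IPRatio calculation, with hypothetical function names. The cutoff selection follows the text; the exact form of Equation (5) is assumed here to be the inner product over iihi divided by the inner product over iilo, with components outside each index list treated as zero, as in $\mathbf{b}_{hi}$ and $\mathbf{b}_{lo}$. Because the $\mathbf{b}_{lo}$ components are negative, this form returns a negative value; flipping the sign or taking the magnitude does not change when the transition occurs.

```python
def cutoff_indices(b, n_hi, n_lo):
    """iihi: indices where b exceeds n_hi; iilo: indices where b is below n_lo."""
    iihi = [j for j, v in enumerate(b) if v > n_hi]
    iilo = [j for j, v in enumerate(b) if v < n_lo]
    return iihi, iilo


def ip_ratio(x0, b, iihi, iilo):
    """Assumed IPRatio form: (x0[iihi] . b[iihi]) / (x0[iilo] . b[iilo])."""
    num = sum(x0[j] * b[j] for j in iihi)
    den = sum(x0[j] * b[j] for j in iilo)
    if den == 0:
        raise ZeroDivisionError("inner product over iilo is zero")
    return num / den


b = [0.5, -0.4, 0.05, -0.02, 0.3]
iihi, iilo = cutoff_indices(b, 0.1, -0.1)
print(iihi, iilo)  # [0, 4] [1]
```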
A third method for determining an endpoint signal, the ratio of sums of the intensities or the Ratio method, is calculated as the ratio of the sum of intensities that increase to the sum of intensities that decrease:
where iihi is a vector of the indices of the b-vector where the components of b are above some cutoff limit, $N_{hi}$, and iilo is a vector of the indices of the b-vector where the components of b are below a second cutoff limit, $N_{lo}$. This third method is not PLSDA but is directed by the results of PLSDA.
Below is code for implementing the Ratio method described by Equation (6) above for endpoint calculations.
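A minimal sketch of the Ratio calculation, with a hypothetical function name. It takes the index lists of rising and falling intensities directly, because which of iihi and iilo plays each role depends on how the class vector was coded: under the interpretation given for FIG. 13B coding, negative-b wavelengths (iilo) rise at endpoint and positive-b wavelengths (iihi) fall.

```python
def ratio_of_sums(x0, inc_idx, dec_idx):
    """Sum of intensities that rise at endpoint over the sum of those that fall."""
    num = sum(x0[j] for j in inc_idx)
    den = sum(x0[j] for j in dec_idx)
    if den == 0:
        raise ZeroDivisionError("sum of decreasing intensities is zero")
    return num / den


# FIG. 13B-style coding: rising = iilo = [1], falling = iihi = [0, 4].
print(ratio_of_sums([2, 5, 1, 1, 4], [1], [0, 4]))  # 5/6, about 0.8333
```

Unlike the IPRatio form, the intensities here are summed without weighting by b, so b only decides which wavelengths enter the ratio.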
A final method for determining an endpoint signal, the ratio of manually identified sums of the peak intensities or PeakRatio method, is calculated as the ratio of the sum of peak intensities that increase to the sum of peak intensities that decrease:
where iipeakhi is a vector of the manually selected indices of the b-vector components that form a peak in the positive direction; those components of b are put in the $\mathbf{b}_{hipeak}$ vector. Similarly, iipeaklo is a vector of the manually selected indices of the b-vector components that form a peak in the negative direction; those components of b are put in the $\mathbf{b}_{lopeak}$ vector. The PeakRatio method is an extension of the Ratio method. Because the PeakRatio method uses parts of the b-vector without regard to their magnitude, the effective cutoff limits vary from part to part, so the parts must be identified manually on the b-vector. FIG. 15C illustrates the $\mathbf{b}_{hipeak}$ vector 1520, consisting of the components of b with the indices for wavelengths $\Delta\lambda_2$, $\Delta\lambda_3$ and $\Delta\lambda_5$ that are manually selected positive parts, and the $\mathbf{b}_{lopeak}$ vector 1522, consisting of the components of b with the indices for wavelengths $\Delta\lambda_1$ and $\Delta\lambda_4$ that are manually selected negative parts of b-vector 1500 (FIG. 15A). This last method is not PLSDA but is directed by the results of PLSDA.
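Because the PeakRatio indices are chosen by hand, one convenient way to record the selection is as wavelength windows read off a plot of the b-vector. A minimal sketch, assuming the manual choices are given as (low, high) windows in nm; the window representation and the function names are hypothetical, not taken from the source.

```python
def indices_in_windows(wavelengths, windows):
    """Indices whose wavelength lies inside any manually chosen (lo, hi) window."""
    return [j for j, lam in enumerate(wavelengths)
            if any(lo <= lam <= hi for lo, hi in windows)]


def peak_ratio(x0, wavelengths, inc_windows, dec_windows):
    """Sum of peak intensities that rise over the sum of peak intensities that fall."""
    inc = indices_in_windows(wavelengths, inc_windows)  # plays the role of one peak list
    dec = indices_in_windows(wavelengths, dec_windows)  # plays the role of the other
    if not inc or not dec:
        raise ValueError("each set of windows must select at least one wavelength")
    den = sum(x0[j] for j in dec)
    if den == 0:
        raise ZeroDivisionError("sum of decreasing peak intensities is zero")
    return sum(x0[j] for j in inc) / den


wl = [300, 310, 320, 330, 340]
print(peak_ratio([1, 2, 3, 4, 5], wl, [(305, 315)], [(325, 345)]))  # 2/9, about 0.2222
```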
FIG. 16 is a high-level flowchart depicting a process for using partial least squares discriminant analysis (PLSDA) to create a predictive model of an etch process and determine the endpoint of the process in accordance with an exemplary embodiment of the present invention. The process depicted in FIG. 16 illustrates only the general case, which is applicable to any OES data set, and will be expanded for the specific cases described in the embodiments below. The PLSDA process of the present invention will be described in greater detail with specific regard to particular exemplary embodiments described below. Initially, the process begins by processing data for calibration (step 1602). In accordance with an exemplary embodiment of the present invention, the processing typically involves collecting optical emission spectroscopy (OES) data that is generated during the etch of a calibration wafer. It should be understood that the present process is equally adaptable to generating predictive models for etch processes and to determining events other than the endpoint of an etch process, and that types of data other than OES data may be used for calibration. However, in accordance with an exemplary embodiment of the present invention, OES sampling of etch processes for endpoint determinations will be described in detail.
The calibration OES data must have a large, unambiguous endpoint signal, and the data must accurately reflect the production process. An exemplary calibration wafer can be a production wafer with a large open area or a specially prepared wafer. In general, the larger the open area of the calibration wafer, the more sensitive the present method will be in detecting endpoint in production wafers with a small open area. Occasionally, the calibration wafers require special processing. Etching calibration wafers that have no resist on their surfaces, such as blanket wafers, gives spectra for a 100% open area and produces, in our experience, the most useful calibration data. Wafers with less than 100% open area can also be used. In general, it is important in the construction of a regression model that all the variations that will be present during the prediction are present in the calibration.
Additionally, the calibration wafers should have the same etch film over the same stop layer as the production wafers. Both films should be made of the same materials as the films in the production wafers and deposited in the same way. Significant differences in the chemical nature of the etch film or stop film will degrade or void the calibration. The films should be on a smooth, unpatterned wafer surface. However, a production wafer that has not been patterned may be used for a calibration wafer if the topology is smooth enough. Culled production wafers are often used for obtaining calibration data if they provide an endpoint signal. Both the etch layer and stop layer should be sufficiently thick to produce stable etching for a minimum of 30 seconds before endpoint and 30 seconds after endpoint. Additionally, the films should have uniformity across the wafer so that the etch film clears uniformly across the wafer surface. Finally, it is good practice to acquire at least two successful etches for each type of calibration wafer in order to check the raw calibration data sets against each other for unexpected anomalies.
With respect to the data itself, the calibration data should be stored under the same data storage conditions as the production data, i.e., using compatible software products with identical option parameter settings. Additionally, slight variations in the physical components and/or construction of different etch chambers may alter the OES data. Therefore, calibration data should be taken with an etch tool identical to the one that will be used in production. The plasma chemistry should also be the same in the calibration process as in the production process, and the process itself should be stable over time. If the data is periodic and is detected synchronously, a minimum of four samples per period must be chosen to capture the variation of the signal. However, significantly more samples per period are often desirable to avoid regions where there are large amounts of noise.
It is expected that the OES data from the calibration wafer is arranged in a folded spectra matrix X, i.e. one row for each sample in the OES data. If not, a folded spectra matrix X can be created prior to creating the training data set.
Processing the OES data also includes removing deceptive or ambiguous data, depending on whether the OES data is nonperiodic or, if periodic, whether it was sampled synchronously or asynchronously with the period of the signal. A derivative in time may be taken of the spectra matrix X and saved separately if the analysis is to be performed on the derivative of the OES data set.
Next, the OES data is processed and an endpoint signal is previewed (step 1604). The PLS regression requires identifying the location of the endpoint transition, and the previewed endpoint signal is used to identify the samples (or sample times) associated with the features to be classified. The processed data set of spectra matrix X can be viewed by PCA, a two-wavelength ratio, a moving average filter, or some other appropriate method. Generally, the previewed endpoint signal is taken from a wavelength in the OES data that has a well-defined endpoint. A typical endpoint signal is depicted in FIG. 13A as signal 1300.
The processed OES data is then calibrated using PLSDA (step 1606). The training data set, which contains the OES data samples that characterize the endpoint transition, is used in the X-block. For example, with reference to FIG. 13A, the transition region of the endpoint is located between two stable intensity regions that can be identified on endpoint signal 1300. These stable intensity regions, between N_{1} and N_{2} and between N_{3} and N_{4} on the previewed endpoint signal, correspond to OES data samples taken during the etch process and after the etch process has stopped, respectively.
Using the OES data samples taken between N_{1} and N_{2} and between N_{3} and N_{4}, as identified from the previewed endpoint signal, an X-block is created together with a corresponding class matrix Y, the y-block. A regression b-vector is then found by applying PLS to the X- and y-blocks (step 1606). The X-block may be folded or unfolded depending on whether the OES data is periodic and sampled synchronously with its period. Synchronously sampled periodic OES data are arranged in the X-block one period of samples per row, so that the X-block is unfolded. Nonperiodic OES data and asynchronously sampled periodic OES data are arranged in the X-block one sample per row, so that it is folded. The y-block, which classifies the features of the samples, is created by binary partitioning. Generally, a class is defined as a region of stable higher or lower intensity values. A typical one-class vector y is shown in FIG. 13B, and FIG. 13C illustrates a typical C-class matrix Y.
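The text does not spell out the PLS algorithm used to regress the b-vector in step 1606. The following is a minimal sketch, assuming single-response PLS (PLS1) computed with the NIPALS algorithm on mean-centered X- and y-blocks; the function name `pls1_nipals` and the choice of the number of latent variables are illustrative assumptions, not the patent's implementation. It is written in plain Python so it runs without dependencies.

```python
import math

def pls1_nipals(X, y, n_components):
    """Sketch of PLS1 (single-column y) via NIPALS on mean-centered data.

    Returns (b, x_mean, y_mean); a new spectrum x is predicted as
    y_mean + sum((x_j - x_mean_j) * b_j).
    """
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(p)] for row in X]  # X residual
    f = [v - y_mean for v in y]                                 # y residual
    R, P, Q = [], [], []  # rotated weights, X-loadings, y-loadings
    for _ in range(n_components):
        w = [sum(E[i][j] * f[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # y residual is exhausted; stop adding components
        w = [v / norm for v in w]
        t = [sum(E[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        p_a = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q_a = sum(f[i] * t[i] for i in range(n)) / tt
        # r_a = w_a - sum_k (p_k . w_a) r_k, so that b = sum_a q_a r_a
        r_a = list(w)
        for r_k, p_k in zip(R, P):
            c = sum(pk * wk for pk, wk in zip(p_k, w))
            r_a = [ri - c * rk for ri, rk in zip(r_a, r_k)]
        # Deflate both residuals by the component just extracted.
        E = [[E[i][j] - t[i] * p_a[j] for j in range(p)] for i in range(n)]
        f = [f[i] - t[i] * q_a for i in range(n)]
        R.append(r_a)
        P.append(p_a)
        Q.append(q_a)
    b = [sum(q * r[j] for q, r in zip(Q, R)) for j in range(p)]
    return b, x_mean, y_mean
```

With as many latent variables as the centered X-block has independent columns, PLS1 reproduces ordinary least squares; in practice fewer components are usually kept so that the b-vector is not fitted to noise.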
In FIG. 16, the regression b-vector is then validated (step 1608). Validation applies the results of calibration and shows that they give correct results. In this case, the data used for calibration is also used to validate the b-vector: the raw calibration data is processed using the regression b-vector to give an endpoint signal. All of the signal processing steps, such as smoothing and elimination of saturated signals, must be applied, although this of course does not include the deletion of segments of the data. The method for calculating endpoint signal y_{0} should be the one that will be used for the real-time data. If endpoint signal y_{0} does not compare favorably to the previewed endpoint signal, then the N_{1}, N_{2}, N_{3} and N_{4} samples were probably chosen incorrectly and should be chosen again, i.e., step 1606 is performed again. If endpoint signal y_{0} compares favorably to the previewed endpoint signal, then the regression b-vector can be relied on for processing real-time data using one of the IP, IP-Ratio, Ratio or Peak Ratio methods (step 1610).
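The validation step compares y_{0} with the previewed endpoint signal by appearance. A minimal sketch of an automated version, assuming y_{0} is the inner product of each processed calibration spectrum with the b-vector; the Pearson-correlation criterion, the 0.9 cut-off, and the function names are hypothetical choices, not part of the described method.

```python
import math

def endpoint_signal(X, b):
    """y0 for each sample: inner product of the processed spectrum with b."""
    return [sum(x_j * b_j for x_j, b_j in zip(row, b)) for row in X]

def correlation(a, c):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(a)
    ma, mc = sum(a) / n, sum(c) / n
    cov = sum((u - ma) * (v - mc) for u, v in zip(a, c))
    va = sum((u - ma) ** 2 for u in a)
    vc = sum((v - mc) ** 2 for v in c)
    if va == 0 or vc == 0:
        return 0.0
    return cov / math.sqrt(va * vc)

def compares_favorably(X, b, preview, min_abs_corr=0.9):
    """Hypothetical acceptance test: |corr(y0, preview)| >= min_abs_corr.
    The absolute value allows y0 to be inverted relative to the preview
    (the etch class is coded 1 even if the previewed intensity rises)."""
    return abs(correlation(endpoint_signal(X, b), preview)) >= min_abs_corr
```

A failing check corresponds to re-choosing N_{1} through N_{4} and repeating step 1606.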
The present process will now be described for several specific cases: nonperiodic signals, periodic signals sampled synchronously with the period of the signal (synchronous data), and periodic signals not sampled synchronously with the period of the signal (nonsynchronous data).
In accordance with one exemplary embodiment of the present invention, a predictive model of the endpoint of etch processes using Partial Least Squares Discriminant Analysis (PLSDA) is created using data from nonperiodic signals. By definition, nonperiodic signals cannot be sampled synchronously because they have no repeating period. The process will be described with reference to the high-level flowchart depicted in FIG. 16 and described above. Deviations from the general case for the specific embodiments will be emphasized in the description below. The process begins by collecting raw OES data from a calibration wafer for a particular etch process and processing that data (step 1602). Generally, the OES data is arranged in a folded spectra matrix X with one sample per row.
FIG. 17 is a detailed flowchart of a process for processing the OES data from the calibration wafer in accordance with an exemplary embodiment of the present invention. The process of FIG. 17 is an expanded description of step 1602 in FIG. 16. Calibration begins by loading data for an appropriate calibration wafer from storage (step 1702). The data may be on any convenient storage medium, such as optical media (Compact Disk (CD), Digital Video Disk (DVD), etc.), magnetic media (hard drive, tape, etc.), or any other media capable of holding the necessary amount of data. Next, startup and shutdown transients are removed from the data (step 1704). Wavelength regions with desirable endpoint transition quality, such as sharpness, are selected (step 1706), and saturated wavelengths are wholly removed (step 1708). Next, the vectors of spectral intensities, which are the rows of spectra matrix X, are normalized to a length of one (step 1710), and additional smoothing of the data in the wavelength dimension may be applied (step 1712). If the processed data is not yet arranged in a folded spectra matrix X, the spectra matrix X is created at this point (step 1714). If desired, Savitzky-Golay smoothing is applied on the time axis to obtain smoother endpoint curves (step 1716). Finally, a derivative in time is taken of spectra matrix X if the analysis is to be performed on the derivative (step 1718).
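Steps 1704 through 1710 and step 1718 can be sketched as follows, assuming the raw data is a list of spectra (one row per sample, one column per wavelength), that the transient boundaries and a saturation level are supplied by the user, and that a first difference stands in for the time derivative. The function names and parameters are hypothetical; the optional wavelength and Savitzky-Golay smoothing steps are omitted.

```python
import math

def preprocess(raw, first, last, saturation, keep_cols=None):
    """Sketch of steps 1704-1710: trim transients, select wavelengths,
    drop saturated wavelengths, and normalize each spectrum to unit length.

    first, last: inclusive, 0-based sample range kept after removing the
    startup and shutdown transients (user-chosen; an assumption).
    keep_cols: optional wavelength indices with good endpoint transitions.
    Returns (X, cols) where cols are the surviving wavelength indices.
    """
    rows = raw[first:last + 1]
    cols = list(range(len(rows[0]))) if keep_cols is None else list(keep_cols)
    # Wholly remove any wavelength that reaches saturation in any sample.
    cols = [j for j in cols if all(r[j] < saturation for r in rows)]
    X = []
    for r in rows:
        v = [r[j] for j in cols]
        norm = math.sqrt(sum(x * x for x in v))
        X.append([x / norm for x in v] if norm > 0 else v)
    return X, cols

def time_derivative(X):
    """First difference along the time axis (one fewer row than X)."""
    return [[b - a for a, b in zip(prev, cur)] for prev, cur in zip(X, X[1:])]
```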
Returning to the process depicted in FIG. 16, an endpoint signal is previewed from the processed spectra matrix X using dynamic PCA, a two-wavelength ratio, a moving average filter, or some other appropriate method (step 1604). A typical endpoint plot is shown as endpoint signal 1300 in FIG. 13A. The location of the endpoint transition is necessary for PLS regression and is selected based on the locations of regions of stable intensity values on the previewed endpoint plot. A single class of intensity values is identified for the etch process. This class corresponds to one of the regions of stable intensity values and may have higher or lower intensity values than the other region of stable intensity values, which is attributable to the post-etch process. Sample numbers N_{1} and N_{2} are chosen to locate the stable region before endpoint while excluding the startup transient and the transition region. Sample numbers N_{3} and N_{4} are chosen to locate the stable region after endpoint while excluding the transition region and the shutdown transient. By default, the region of unstable intensity values between the two stable regions, i.e., between sample N_{2} and sample N_{3}, is identified as the transition region. It is shown as the portion of endpoint signal 1300 where the intensity values change at endpoint, between samples N_{2} and N_{3}.
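Two of the preview options named above, a two-wavelength ratio and a moving average filter, can be sketched as below. The wavelength indices and window length are user choices (assumptions), the denominator wavelength is assumed never to be zero, and N_{1} through N_{4} are still picked by inspecting the resulting plot.

```python
def two_wavelength_ratio(X, i, j):
    """Preview signal: intensity at wavelength index i divided by index j."""
    return [row[i] / row[j] for row in X]

def moving_average(signal, window):
    """Trailing moving average; the first window-1 points use a shorter window."""
    out = []
    for k in range(len(signal)):
        seg = signal[max(0, k - window + 1):k + 1]
        out.append(sum(seg) / len(seg))
    return out
```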
The processed OES data is then calibrated using the PLSDA process (step 1606). FIG. 18 is a detailed flowchart depicting a process for creating the X-block and y-block and using PLS regression to find a b-vector in accordance with an exemplary embodiment of the present invention. The process of FIG. 18 is an expanded description of calibration step 1606 in FIG. 16. The calibration process begins by reading the processed data in spectra matrix X (step 1802). Sample numbers N_{1}, N_{2}, N_{3} and N_{4} are picked from a previewed endpoint plot, such as endpoint signal 1300 shown in FIG. 13A (step 1804). Sample numbers N_{1} and N_{2} are selected from the stable region before endpoint, and sample numbers N_{3} and N_{4} are selected from the stable region after endpoint.
The present PLSDA algorithm is extremely flexible, enabling a b-vector to be regressed from either the processed OES data in spectra matrix X or the time derivative of the processed OES data. This particular embodiment is concerned with the processed OES data (step 1806). The X- and y-blocks are then created for the processed nonperiodic OES data in spectra matrix X (step 1808). A folded X-block, having one OES sample per row, is created from the data located in the regions of stable intensity values, e.g., between sample numbers N_{1} and N_{2} and between sample numbers N_{3} and N_{4}. The y-block is then created for one class of stable intensity values using binary partitioning. Recall that the X-block was created from OES samples corresponding to two regions of stable intensity values on the previewed endpoint plot. Thus, for one-class PLSDA processing of the data in spectra matrix X, either the etch region or the post-etch region could be represented with a discriminate variable of “1” in the y-block, so a convention for assigning discriminate values to the classes should be chosen before the y-block is created. In the one-class embodiment described herein, the etch region is assigned a discriminate variable of “1,” and all other regions are assigned a discriminate variable of “0.” Thus, the y-block is created by binary partitioning such that a discriminate variable value of “1” is assigned to the region for the etch process, corresponding to the N_{1} to N_{2} samples, and a discriminate value of “0” is assigned to all intensity values not in the class, corresponding to the N_{3} to N_{4} samples.
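The folded X-block and binary y-block for the one-class case can be sketched as follows, assuming sample numbers are 0-based and inclusive (an indexing convention not stated in the text) and using the convention above that the etch region is coded “1.” The function name is hypothetical.

```python
def build_one_class_blocks(X, n1, n2, n3, n4):
    """Folded X-block (one sample per row) and binary y-block for the
    one-class case: etch samples N1..N2 -> 1, post-etch samples N3..N4 -> 0.
    Sample numbers are treated as 0-based and inclusive (an assumption)."""
    if not (0 <= n1 <= n2 < n3 <= n4 < len(X)):
        raise ValueError("expected 0 <= N1 <= N2 < N3 <= N4 < number of samples")
    etch = X[n1:n2 + 1]        # stable region before endpoint
    post_etch = X[n3:n4 + 1]   # stable region after endpoint
    x_block = etch + post_etch  # transition samples N2+1..N3-1 are excluded
    y_block = [1] * len(etch) + [0] * len(post_etch)
    return x_block, y_block
```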
The data in the X-block and y-block are scaled, typically by mean centering (step 1812), and PLS regression is then performed on the X- and y-blocks, resulting in a b-vector (step 1814). Next, N_{lo} and N_{hi} are selected for the b-vector (step 1816), and a b_{lo} vector is found by limiting OES intensity values to less than N_{lo} and a b_{hi} vector by limiting OES intensity values to greater than N_{hi}. Additionally, a b_{hi-peak} vector can be selected by manually selecting indices of the b-vector that form some positive peak(s), and a b_{lo-peak} vector by manually selecting indices of the b-vector that form some negative peak(s). The b, b_{lo}, b_{hi}, b_{lo-peak} and b_{hi-peak} vectors found in step 1818 are then saved for use in processing real-time OES data for endpoint detection (step 1820).
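One reading of the b_{lo}, b_{hi} and peak vectors is sketched below. It assumes that "limiting" means keeping only the b-vector components below N_{lo} (or above N_{hi}) and zeroing the rest, so each derived vector keeps the full length of b and can still be applied to a full spectrum; that interpretation and the function names are assumptions.

```python
def limit_below(b, n_lo):
    """b_lo sketch: keep components of b below N_lo, zero the rest."""
    return [v if v < n_lo else 0.0 for v in b]

def limit_above(b, n_hi):
    """b_hi sketch: keep components of b above N_hi, zero the rest."""
    return [v if v > n_hi else 0.0 for v in b]

def keep_indices(b, indices):
    """b_hi-peak / b_lo-peak sketch: keep manually chosen indices only."""
    chosen = set(indices)
    return [v if k in chosen else 0.0 for k, v in enumerate(b)]
```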
Returning now to FIG. 16, the regression b-vector is then validated (step 1608). If endpoint signal y_{0}, produced from the inner product of the b-vector and spectra matrix X, differs in appearance from the previewed endpoint plot, then the locations of the N_{1}, N_{2}, N_{3} and N_{4} samples were probably chosen incorrectly and should be chosen again and reprocessed from step 1606. If endpoint signal y_{0} compares favorably to the previewed endpoint signal, then any of the b, b_{lo}, b_{hi}, b_{lo-peak} and b_{hi-peak} vectors can be relied upon for processing real-time nonperiodic OES data using the IP, IP-Ratio, Ratio or Peak Ratio method (step 1610).
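The IP, IP-Ratio, Ratio and Peak Ratio methods are defined elsewhere in the specification. The sketch below assumes only what is stated here, that y_{0} is the inner product of a real-time spectrum x_{0} with a validated b-vector. The threshold, the run-length rule, and the function name are hypothetical; the threshold would have to be chosen from the validated calibration signal, and the real-time spectra must be preprocessed exactly as the calibration data were.

```python
def detect_endpoint(spectra, b, threshold, consecutive=3):
    """Hypothetical inner-product detector: y0 = x0 . b for each incoming
    spectrum; the endpoint is the first sample of the first run of
    `consecutive` samples with y0 below `threshold`, or None if none occurs."""
    run = 0
    for k, x0 in enumerate(spectra):
        y0 = sum(x_j * b_j for x_j, b_j in zip(x0, b))
        run = run + 1 if y0 < threshold else 0
        if run >= consecutive:
            return k - consecutive + 1
    return None
```

Requiring several consecutive samples is one simple way to keep a single noisy spectrum from triggering a false endpoint.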
In accordance with one exemplary embodiment of the present invention, a predictive model of the endpoint of etch processes using Partial Least Squares Discriminant Analysis (PLSDA) is created using derivatives of nonperiodic OES data. Using the derivative has an advantage over using the OES data itself in that wavelengths with a faster transition at endpoint are preferentially selected over others. The present process is essentially identical to that described above, except for the creation of the X- and y-blocks. Instead of the processed nonperiodic OES data in spectra matrix X used in the prior embodiment, the time derivative of the processed OES data in spectra matrix X is used. Therefore, a derivative in time must be taken of the processed OES data contained in spectra matrix X, if it was not already prepared in step 1718 of the process depicted in FIG. 17 above. The X- and y-blocks are then created from the derivative data in accordance with step 1810 of the expanded calibration process depicted in FIG. 18. Since the PLS regression is now performed on the derivatives of the OES data, the class must represent some evolving or changing feature on the previewed endpoint plot. The only feature apparent on a plot of the derivative values is the transition region (see again FIG. 14A). Therefore, derivative data from all sample numbers between N_{1} and N_{4} are arranged one sample per row to create a folded X-block. The y-block is then created for one class of unstable or evolving intensity values using binary partitioning. Since the class is now the transition region, a discriminate variable of “1” is assigned to the transition region in the y-block and a discriminate variable of “0” elsewhere. The X- and y-blocks can then be regressed using PLS and validated exactly as described above for raw OES data.
The resulting b, b_{lo}, b_{hi}, b_{lo-peak} and b_{hi-peak} vectors can be relied upon for processing real-time nonperiodic OES data using the IP, IP-Ratio, Ratio or Peak Ratio method.
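For the derivative embodiment, the blocks change as sketched below: every derivative row from N_{1} through N_{4} enters the folded X-block, and only the transition samples are coded “1.” This assumes 0-based inclusive sample numbers, that row k of the derivative matrix is attributed to sample k, and that N_{2} and N_{3} themselves belong to the stable regions; all three are assumptions, and the function name is hypothetical.

```python
def build_derivative_blocks(dX, n1, n2, n3, n4):
    """One-class blocks on time-derivative data: all samples N1..N4 form the
    folded X-block; samples strictly between N2 and N3 (the transition)
    are coded 1 and all others 0 (0-based, inclusive indexing assumed)."""
    if not (0 <= n1 <= n2 < n3 <= n4 < len(dX)):
        raise ValueError("expected 0 <= N1 <= N2 < N3 <= N4 < number of rows")
    x_block = dX[n1:n4 + 1]
    y_block = [1 if n2 < k < n3 else 0 for k in range(n1, n4 + 1)]
    return x_block, y_block
```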
Although the above-described method for using Partial Least Squares Discriminant Analysis on derivatives of OES data was described with respect to the one-class case, it is equally applicable to the C-class case. It is also applicable to periodic OES data as well as nonperiodic OES data, regardless of whether or not the periodic data was synchronously sampled. The C-class case and the cases for handling periodic signals are described below.
In accordance with one exemplary embodiment of the present invention, a predictive model of the C endpoints of etch processes using Partial Least Squares Discriminant Analysis (PLSDA) is created using data from nonperiodic signals. The present process is similar to that described above for the one-class case, but involves identifying the transition regions associated with multiple endpoints.
When one layer is etched away, the etch process erodes the next layer, and so on until the C^{th} etch layer is etched away. Therefore, the calibration OES data must have C unambiguous endpoint transitions that accurately reflect all of the endpoints. The OES data may be arranged in a folded spectra matrix X and processed according to FIG. 17 above.
C transition regions, associated with the C endpoints, must be located on the previewed endpoint signal. In step 1706, wavelength regions must be selected with desirable endpoint transition quality for each of the C endpoints. To preview an endpoint plot for picking the 2C+1 samples that define the C transition regions, a single endpoint plot may be produced from wavelength segments using dynamic PCA, a two-wavelength ratio, a moving average filter, or some other appropriate method (step 1604). Alternatively, C separate wavelengths may be selected, one for each of the C endpoints, each with desirable endpoint transition quality for its respective endpoint.
Regardless of how the endpoints are previewed, the locations of the C transition regions must be selected as required by step 1804 of the expanded calibration process depicted in FIG. 18. The method for locating these regions is an extension of the one-class case: C+1 regions of stable intensity values, with C transition regions between them, are located on the previewed endpoint plot. The first region of stable intensity values (either the highest or the lowest values) is located before the first transition region of evolving intensity values and corresponds to an etch or post-etch region, e.g., between sample N_{1} and sample N_{2}. Next, regions of stable intensity values (with intensity values increasingly higher or decreasingly lower than the first stable region) are located between pairs of transition regions of evolving intensity values and correspond to etch/post-etch region(s), e.g., between each of the N_{3} and N_{4} samples through the N_{2C−1} and N_{2C} samples. A last region (the (C+1)^{th} region) of stable intensity values (either higher or lower than the highest or lowest intensity values of the (C−1) intermediate stable regions) is located after the C^{th} transition region of evolving intensity values and corresponds to either the etch or post-etch region, e.g., between sample N_{2C+1} and sample N_{2C+2}. Finally, the C regions of evolving intensity values are located between each of the N_{2} and N_{3} samples through the N_{2C} and N_{2C+1} samples.
Next, the X- and Y-blocks are created in accordance with step 1808 of the expanded calibration process depicted in FIG. 18. The X-block is created from the OES samples in spectra matrix X between each of the N_{1} to N_{2} samples through the N_{2C+1} to N_{2C+2} samples. The X-block therefore excludes OES data in the transition regions, i.e., between each of the N_{2} and N_{3} samples through the N_{2C} and N_{2C+1} samples. The data is arranged one sample per row in a folded X-block. The Y-block is then created by binary partitioning: for each column of the Y-block, a discriminate variable value of “1” is assigned to the region of stable intensity values associated with the etch region for the particular etch classified by that column, and a discriminate value of “0” everywhere else. An exemplary C-class Y-block is depicted in FIG. 13C.
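The C-class blocks can be sketched from the list of 2C+2 sample numbers. FIG. 13C is not reproduced here, so the column layout is an assumption: stable region r (for r less than C) is taken to be the etch region classified by column r, and the final (C+1)^{th} region is coded “0” in every column. Sample numbers are treated as 0-based and inclusive, and the function name is hypothetical.

```python
def build_c_class_blocks(X, bounds):
    """C-class X- and Y-blocks from sample numbers [N1, N2, ..., N_{2C+2}].

    Stable region r spans bounds[2r]..bounds[2r+1] (0-based, inclusive).
    Assumption: region r (r < C) is the etch region for column r, and the
    last region is 0 in every column. Transition samples are excluded.
    """
    if len(bounds) < 4 or len(bounds) % 2:
        raise ValueError("need 2C+2 sample numbers with C >= 1")
    n_classes = len(bounds) // 2 - 1
    x_block, y_block = [], []
    for region in range(n_classes + 1):
        start, stop = bounds[2 * region], bounds[2 * region + 1]
        for k in range(start, stop + 1):
            x_block.append(X[k])
            y_block.append([1 if c == region else 0 for c in range(n_classes)])
    return x_block, y_block
```

Regressing such a C-column Y-block yields one b-vector per column, i.e., the C columns of the regression B matrix.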
The data in the X-block is scaled (step 1812), and the X- and Y-blocks are regressed using PLS regression, resulting in a regression B matrix (step 1814). Because the Y-block is a C-column matrix, the regression B matrix also has C columns, i.e., C separate b-vectors, one for each endpoint. N_{lo} and N_{hi} are then selected for each b-vector (step 1816), and for each b-vector a b_{lo} vector is determined by limiting OES intensity values to less than N_{lo} and a b_{hi} vector by limiting OES intensity values to greater than N_{hi}. Additionally, a b_{hi-peak} vector may be found for each b-vector by manually selecting indices of that b-vector that form some positive peak(s), and a corresponding b_{lo-peak} vector by manually selecting indices that form some negative peak(s). The b, b_{lo}, b_{hi}, b_{lo-peak} and b_{hi-peak} vectors associated with each b-vector are then saved for use in processing real-time OES data for endpoint detection (step 1820).
The regression B matrix is then validated by obtaining an endpoint signal y_{0} from the inner product of each regression b-vector and a vector x_{0} (step 1608). If the endpoint signals y_{0} for the etch transition endpoints, produced from the inner products of the b-vectors and spectra matrix X, differ in appearance from the previewed endpoint plots, then the locations of the N_{1} to N_{2C+1} samples were probably chosen incorrectly and should be chosen again and reprocessed in step 1606. If the endpoint signals compare favorably to the previewed endpoint signals, then the regression B matrix can be relied on for processing real-time data using one of the IP, IP-Ratio, Ratio or Peak Ratio methods to determine the respective C endpoints (step 1610).
Although the abovedescribed method for using Partial Least Squares Discriminant Analysis of OES data was described with respect to the nonperiodic data, it is applicable to periodic OES data, regardless of whether or not the periodic data was synchronously sampled. Cases for handling periodic signals are described below.
In accordance with one exemplary embodiment of the present invention, a predictive model of the endpoint of etch processes using Partial Least Squares Discriminant Analysis (PLSDA) is created using nonsynchronously sampled data from periodic signals. In this case, the signal has a repeating periodic component, but the data has not been sampled synchronously with the period. The periodic component then appears as noise on a plot of the endpoint signal, masking the locations of the regions that are necessary for PLS regression. However, the PLSDA analysis of asynchronous OES data is exactly the same as that of the nonperiodic OES data described above, except that spectra matrix X is filtered using an appropriate method to remove the noise (actually the periodic component) from the OES data. The process then proceeds as described above with respect to FIGS. 16-18. Because the b-vector was created from filtered OES data from the calibration wafer, the real-time process OES data must likewise be filtered.
In accordance with another exemplary embodiment of the present invention, the b-vector may be created from unfiltered asynchronous OES data in exactly the same manner as described above for nonperiodic OES data. In this case, the previewed endpoint plot must be filtered to remove the periodic component from the endpoint signal in order to select N_{1}, N_{2}, N_{3} and N_{4}. The b-vector is validated by filtering the response vector y_{0} (actually the endpoint signal) with the same filter to remove the periodic component for comparison with the previewed endpoint signal. Then, during real-time processing using one of the IP, IP-Ratio, Ratio or Peak Ratio methods, the resulting values of y_{0} are filtered with the same filter to remove the periodic component, and the endpoint is chosen from the filtered real-time endpoint signal.
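The text leaves the filter open ("an appropriate method"). One simple choice, shown here as an assumption rather than the described method, is a moving average whose window spans exactly one period: a component that repeats every P samples averages to its mean and cancels. With truly asynchronous sampling the period is generally not an integer number of samples, so this filter only attenuates the periodic component. Per the text, the same filter must be applied to the preview, to y_{0} during validation, and to the real-time y_{0}.

```python
def remove_periodic(signal, samples_per_period):
    """Moving average over exactly one period ('valid' mode: the output has
    len(signal) - samples_per_period + 1 points). A component repeating every
    samples_per_period samples is cancelled; a non-integer period is only
    attenuated."""
    P = samples_per_period
    return [sum(signal[k:k + P]) / P for k in range(len(signal) - P + 1)]
```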
In accordance with one exemplary embodiment of the present invention, a predictive model of the endpoint of etch processes using Partial Least Squares Discriminant Analysis (PLSDA) is created using data from periodic signals that are sampled synchronously with the period of the signal. Essentially, the present process extends the method for one-class PLSDA analysis of nonperiodic OES data by processing OES samples with respect to their position in the period of the signal and then arranging the processed periodic OES data in an unfolded spectra matrix X and an unfolded X-block, rather than in the folded matrices used for nonperiodic OES data.
The present process will be described with respect to the general case which is applicable for any OES data set as depicted in FIG. 16 and supplemented by flow charts depicting expansions of the general case that are necessary for understanding PLSDA analysis of synchronously detected periodic OES data. The process begins by collecting raw OES data from a calibration wafer for a particular etch process and processing that data (step 1602). Generally, the OES data from the calibration wafer is arranged in a folded spectra matrix X with one sample per row for processing.
FIG. 15D is a diagram of an exemplary b-vector 1560 derived from synchronously sampled periodic OES data in accordance with an exemplary embodiment of the present invention. The b-vector 1560 contains a wavelength component for every wavelength in each spectrum in a period. Since the period corresponds to one revolution of the magnet housing of a magnetically enhanced etcher, the samples correspond to the 28 positions on the housing at which the samples were taken. The b-vector 1560 may therefore be thought of as 28 separate b-vectors, one for each of the 28 sample positions on the housing, combined sequentially into a single b-vector. It thus contains a wavelength component for every wavelength in the sampled spectrum at each of the 28 sample positions. Of course, b-vector 1560 is exemplary, and the actual number of wavelength components in a b-vector depends on the actual number of magnet positions on the etcher and on the number of discrete wavelengths in the spectrum.
FIG. 19 is a flowchart depicting an expanded process for processing synchronously detected periodic OES data from the calibration wafer in accordance with an exemplary embodiment of the present invention. The process of FIG. 19 is an expanded description of step 1602 in FIG. 16. It is similar to the method for processing data for calibration depicted in FIG. 17, but adds steps that process samples with respect to their position in the period of the signal. Processing begins by loading data for an appropriate calibration wafer from storage (step 1902). Startup and shutdown transients are first removed from the data (step 1904), and the data positions within a period are examined for large or anomalous noise. In magnetically enhanced etch processes, this noise may be associated with the particular magnet positions that created it, and the data samples associated with those positions should be removed from the data (step 1906). Wavelength regions with desirable endpoint transition quality, such as sharpness, are selected (step 1908), and saturated wavelengths are wholly removed (step 1910).
Next, the data set is rounded to an integral number of rotations by removing data corresponding to partial rotations at the beginning and end of the etch (step 1912), so that only data from complete periods are used in spectra matrix X. The vectors of spectral intensities can be normalized to a length of one if desired (step 1914), and additional smoothing of the data in the wavelength dimension can be applied (step 1918).
An unfolded spectra matrix X is then created from the processed OES data (step 1920). Each period of sample data in the folded spectra matrix X is unfolded into a single row vector by placing each successive row of spectra for that period to the right of the previous one. For example, if the data for a period P_{t} is a 12 by 1201 matrix (12 samples per period, 1201 wavelengths), then P_{t} is unfolded into a 1 by 14412 row vector. Data from subsequent periods in spectra matrix X are placed below this row vector. The unfolding removes variance due to the periodicity. If desired, Savitzky-Golay smoothing is applied on the time axis to obtain smoother endpoint curves (step 1922). Finally, a derivative in time is taken of spectra matrix X if the analysis is to be performed on the derivative (step 1924).
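Steps 1912 and 1920 can be sketched together: discard partial periods, then place the spectra of each period side by side to form one row per period. The `offset` parameter (the index of the first sample of the first complete period) is hypothetical and assumed known from the period reference; the function name is also hypothetical. The check reproduces the 12 by 1201 to 1 by 14412 example above.

```python
def unfold_periods(folded, samples_per_period, offset=0):
    """Round to whole periods and unfold: one row per period, formed by
    placing the spectra of that period side by side.

    folded: one spectrum per row (samples x wavelengths).
    offset: hypothetical index of the first sample of the first complete
    period. Partial periods at the start (before offset) and end are dropped.
    """
    P = samples_per_period
    usable = folded[offset:]
    n_periods = len(usable) // P
    unfolded = []
    for t in range(n_periods):
        row = []
        for spectrum in usable[t * P:(t + 1) * P]:
            row.extend(spectrum)  # next spectrum goes to the right
        unfolded.append(row)
    return unfolded
```

In this layout, segment k of an unfolded row (and of the b-vector regressed from such rows) belongs to sample position k within the period, which matches the description of b-vector 1560 as sequentially combined per-position b-vectors.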
Returning to the process depicted in FIG. 16, an endpoint signal, such as endpoint signal 1300 in FIG. 13A, is previewed from the processed data in spectra matrix X using dynamic PCA, a two-wavelength ratio, a moving average filter, or some other appropriate method (step 1604). The location of the endpoint transition is identified for PLS regression based on the locations of the regions of stable intensity values in the previewed endpoint plot, as described above, and samples N_{1}, N_{2}, N_{3} and N_{4} are selected from that plot.
The processed OES data is then calibrated using the PLSDA process (step 1606), following the process of FIG. 18 for creating the X-block and y-block and using PLS regression to find a b-vector in accordance with an exemplary embodiment of the present invention. The calibration process begins by reading the processed data in spectra matrix X (step 1802) using sample numbers N_{1}, N_{2}, N_{3} and N_{4} (step 1804).
Since this particular embodiment is concerned with the processed OES data (step 1806), the X- and y-blocks are created for the processed periodic OES data in the unfolded spectra matrix X (step 1810). The training data set is arranged in an unfolded X-block, having one signal period of OES spectra per row, created from the data located in the regions of stable intensity values, e.g., between sample numbers N_{1} and N_{2} and between sample numbers N_{3} and N_{4}. The y-block is then created for the class of stable intensity values using binary partitioning. The etch region is assigned a discriminate variable of “1,” and regions other than the etch region are assigned a discriminate variable of “0.” Thus, a discriminate variable value of “1” is assigned to the region of intensity values for the etch process, corresponding to the N_{1} to N_{2} samples, and a discriminate value of “0” is assigned to all intensity values not in the class, corresponding to the N_{3} to N_{4} samples.
The data in the X-block and y-block are scaled (step 1812), and PLS regression is then performed on the X- and y-blocks, resulting in a b-vector (step 1814). Next, the b_{lo}, b_{hi}, b_{lo-peak} and b_{hi-peak} vectors are determined from the b-vector using the N_{lo} and N_{hi} selections (step 1816) or by manually selecting indices of the b-vector. The b, b_{lo}, b_{hi}, b_{lo-peak} and b_{hi-peak} vectors are then saved for use in processing real-time OES data for endpoint detection (step 1820).
As described above, before the regression b-vector is used for processing real-time data, it is validated by processing the calibration data with it to produce an endpoint signal y_{0}. If endpoint signal y_{0} differs in appearance from the previewed endpoint plot, then the locations of the N_{1}, N_{2}, N_{3} and N_{4} samples were probably chosen incorrectly and should be chosen again and reprocessed from step 1606. If endpoint signal y_{0} compares favorably to the previewed endpoint signal, then any of the b, b_{lo}, b_{hi}, b_{lo-peak} and b_{hi-peak} vectors can be relied upon for processing real-time data using one of the IP, IP-Ratio, Ratio or Peak Ratio methods (step 1610).
In addition to using partial least squares discriminant analysis for endpoint determination of a wafer in a production etch process, PLSDA may be used for endpoint determination of chamber clean processes in accordance with other exemplary embodiments of the present invention. Chemical Vapor Deposition (CVD) is a technique for film deposition in semiconductor manufacturing. During this process, the film that is deposited on the wafer is also deposited on the interior walls of a CVD chamber. When the film on the walls becomes too thick, it will flake off and cause particle contamination of the wafers that are processed. To prevent this, CVD tool manufacturers include a chamber cleaning step for cleaning the interior of the CVD chamber after a specified number of wafers are processed. The cleaning step involves using a reactive plasma in the chamber. Film on the interior chamber walls is removed by a reaction with one or more components in the plasma, as it is in a wafer etching process.
Optical emission from the plasma is monitored to detect the endpoint of the cleaning step in a way similar to the method for detecting the etch endpoint. The calibration data will show a stable post-etch region after all surfaces have cleared, so the $N_3$-to-$N_4$ region may be identified as before. During a chamber clean, however, there may be multiple surfaces of different sizes, film thicknesses and locations. These surfaces may clear at different times, so there may be no stable etch region. If there is a stable etch region, the $N_1$-to-$N_2$ region may be identified as described above. Otherwise, $N_1$ and $N_2$ must be chosen in a novel way that characterizes the etch region: either a single spectrum or a region where the etch is changing may be chosen. In the latter case, an average is obtained over this $N_1$-to-$N_2$ region for the analysis. The regression for the b-vector is performed as described above, and the endpoint may be detected by Equations 4, 5, or 7. One of ordinary skill in the art would readily understand from the present invention that other variations are possible.
In accordance with still other exemplary embodiments of the present invention, PLSDA is used for fault detection during, for example, a semiconductor fabrication process. In real time, or from wafer to wafer, we wish to know whether a process or a process tool is running as expected. Further, if a failure is found, we wish to know which part of the tool or process has failed so that it can be repaired quickly. PLSDA can do both by observing many sensors on a process tool in real time. The real-time sensor readings $G_{ij}$ from a semiconductor process tool can be measured at equal time intervals during the process run. These are measurements of process parameters such as gas pressure, gas flow rate, throttle valve position, RF tuning capacitor position, RF power and temperature. If appropriate, the intensity $I_{ij}$ of a spectrum is also measured. Some parameters that affect the process, including any of those just mentioned, may not be measured. To calibrate, C failure classes are identified for the tool or process. Calibration data are generated by measuring the parameters from a process while changing the values of the parameters associated with the C failure modes, either by changing their magnitudes or by turning them off. For instance, if one of the failure modes is associated with a decrease in the flow rate of a gas, the flow rate of that gas is decreased during the calibration. Parameters associated with all C classes may be changed in one calibration set, or one parameter may be changed in each of C calibration sets. The calibration data then produce a $V \times C$ B-matrix, where V is the number of measured variables (the columns of X), such that $Y_{ij} = \sum_k X_{ik} B_{kj}$. Failure class j is then monitored with $y_i = \sum_k X_{ik} b_k$, where $b_k = B_{kj}$ is the j-th column of B. In each case, PLSDA identifies the sensors that are sensitive to drift of a given process setting and generates a trend chart that changes when that process parameter drifts out of its normal setting.
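A minimal sketch of the multi-class calibration and monitoring, under stated assumptions: each calibration row is labeled with the index of its failure class (or -1 for a normal run), the data are mean-centered, and, to keep the example short, B is built by fitting one PLS1 model per failure class and stacking the C b-vectors as columns. That per-class variant is a swapped-in simplification; a PLS2 fit would instead share latent variables across the classes. The sensor values and the helpers `fit_fault_model` and `class_scores` are hypothetical.

```python
import numpy as np

def pls1_b(Xc, yc, n_components):
    """NIPALS PLS1 on mean-centered data; returns the regression vector b."""
    Xr, yr = Xc.copy(), yc.copy()
    W, P, Q = [], [], []
    for _ in range(n_components):
        w = Xr.T @ yr
        w /= np.linalg.norm(w)          # weight vector
        t = Xr @ w                      # scores
        tt = t @ t
        p = Xr.T @ t / tt               # X loadings
        q = (yr @ t) / tt               # y loading
        Xr -= np.outer(t, p)            # deflate X and y
        yr -= q * t
        W.append(w); P.append(p); Q.append(q)
    W, P = np.column_stack(W), np.column_stack(P)
    return W @ np.linalg.solve(P.T @ W, np.array(Q))

def fit_fault_model(X, labels, n_classes, n_components):
    """Build a V x C B-matrix from calibration rows X (hypothetical helper)."""
    Y = np.zeros((X.shape[0], n_classes))
    for i, lab in enumerate(labels):
        if lab >= 0:                    # -1 marks a normal (no-fault) run
            Y[i, lab] = 1.0
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc = X - x_mean
    B = np.column_stack([pls1_b(Xc, Y[:, j] - y_mean[j], n_components)
                         for j in range(n_classes)])
    return B, x_mean, y_mean

def class_scores(X_new, B, x_mean, y_mean):
    """Column j holds the monitored signal for failure class j: (X - x_mean) @ B + y_mean."""
    return (X_new - x_mean) @ B + y_mean

# Synthetic calibration: sensors are [pressure, flow, power].
rng = np.random.default_rng(0)
nominal = np.array([10.0, 50.0, 300.0])
normal = nominal + rng.normal(0, 0.1, (10, 3))
flow_drop = nominal + [0.0, -10.0, 0.0] + rng.normal(0, 0.1, (10, 3))
power_drop = nominal + [0.0, 0.0, -50.0] + rng.normal(0, 0.1, (10, 3))
X = np.vstack([normal, flow_drop, power_drop])
labels = [-1] * 10 + [0] * 10 + [1] * 10

B, x_mean, y_mean = fit_fault_model(X, labels, n_classes=2, n_components=3)
scores = class_scores(np.array([[10.0, 40.0, 300.0]]), B, x_mean, y_mean)
```

In this toy setup a new reading with reduced gas flow should score close to 1 in column 0 (the flow-drop class) and close to 0 in column 1, which is the kind of class-specific trend the text describes.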
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. While the present method has been described primarily in terms of synchronously detected periodic signal data, it may also be applied to other cases of data acquisition as described herein, including nonperiodic signals, and asynchronously detected periodic signals.
The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.