|
4. Process Modeling
4.3. Data Collection for Process Modeling
|
|||
| Output from Process Model is Fitted Mathematical Function |
The output from process modeling is a fitted mathematical function
with estimated coefficients. For example, in modeling resistivity
Y as a function of dopant density X,
an analyst may suggest the function
$$Y = A_0 + A_1 X + A_2 X^2$$
The functional form is the above quadratic, and the coefficients to be estimated are A0, A1, and A2. Even for a given functional form, there are infinitely many coefficient values that could be used. Each set of coefficient values will in turn yield predicted values. |
||
| What are Good Coefficient Values? | Poor values of the coefficients are those for which the resulting predicted values are considerably different from the observed raw data Y. Good values of the coefficients are those for which the resulting predicted values are close to the observed raw data Y. The best values of the coefficients are those for which the resulting predicted values are close to the observed raw data Y, and the statistical uncertainty connected with each coefficient is small. | ||
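One common way to make "close to the observed raw data" concrete is the sum of squared differences between the observed and predicted values. The sketch below scores two candidate coefficient sets for the quadratic form. The (X, Y) values and both coefficient sets are made up purely for illustration, and the function names are hypothetical.

```python
# Illustrative (made-up) data, NOT real resistivity measurements.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [2.1, 5.9, 12.2, 19.8, 30.1]


def predict(coeffs, x):
    """Predicted value of the quadratic A0 + A1*x + A2*x^2."""
    a0, a1, a2 = coeffs
    return a0 + a1 * x + a2 * x * x


def sum_sq_residuals(coeffs, xs, ys):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - predict(coeffs, x)) ** 2 for x, y in zip(xs, ys))


# Two arbitrary candidate coefficient sets (assumptions for this sketch).
poor = (10.0, -3.0, 0.5)
good = (0.0, 1.0, 1.0)

ss_poor = sum_sq_residuals(poor, X, Y)
ss_good = sum_sq_residuals(good, X, Y)
print(f"poor coefficients: sum of squares = {ss_poor:.2f}")
print(f"good coefficients: sum of squares = {ss_good:.2f}")
```

A smaller sum of squares means the predicted values track the observed Y values more closely. It says nothing yet about the uncertainty of each coefficient, which is the second part of what makes a coefficient set "best".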
There are two considerations that are useful for the generation
of "best" coefficients: the least squares criterion and design of
experiment principles.
|
|||
| Least Squares Criterion |
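For a given data set (e.g., 10 (X,Y)
pairs), the most common procedure for obtaining the
coefficients of the chosen functional form is the least squares criterion, which selects the coefficient values that minimize the sum of squared differences between the observed Y values and the predicted values. The overwhelming majority of regression programs today use the least squares criterion for estimating the model coefficients. Least squares estimates are popular because, for a model that is linear in its coefficients with uncorrelated, constant-variance errors, they have the smallest variance among linear unbiased estimators, and because they can be computed in closed form.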
|
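As a minimal sketch of the least squares criterion, the code below uses the closed-form solution for the straight-line case (chosen here for brevity; the quadratic case needs a small linear solve instead). The data values and the helper names `fit_line` and `sse` are assumptions made for this example.

```python
def fit_line(xs, ys):
    """Closed-form least squares estimates (a0, a1) for y = a0 + a1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    a1 = sxy / sxx
    a0 = y_bar - a1 * x_bar
    return a0, a1


def sse(a0, a1, xs, ys):
    """Sum of squared differences between observed and predicted values."""
    return sum((y - (a0 + a1 * x)) ** 2 for x, y in zip(xs, ys))


# Illustrative (made-up) data.
X = [0.0, 1.0, 2.0, 3.0, 4.0]
Y = [1.1, 2.9, 5.2, 6.8, 9.0]

a0, a1 = fit_line(X, Y)
best = sse(a0, a1, X, Y)

# Moving either coefficient away from its least squares value
# should never reduce the sum of squares.
nudges = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]
perturbed = [sse(a0 + d0, a1 + d1, X, Y) for d0, d1 in nudges]
print(f"a0 = {a0:.3f}, a1 = {a1:.3f}, SSE = {best:.4f}")
```

The perturbation check is only a spot check on four nearby points, not a proof of optimality.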
||
| Design of Experiment Principles | As to what values should be used for the X's, we look to established experimental design principles for guidance. | ||
| Principle 1: Minimize Coefficient Estimation Variation |
The first principle of experimental design is to
control the values within the X vector such that
after the Y data are collected, the subsequent model
coefficients are as good, in the sense of having the smallest
variation, as possible.
The key point for design of experiments in process modeling is that, although the least squares criterion may yield optimal (minimal variation) estimators for a given set of X values (in simple (X,Y) fitting, for example), some choices of X values yield better (smaller variation) coefficient estimates than others. If the analyst can specify the values in the X vector, then he or she may be able to substantially reduce the noisiness of the subsequent least squares coefficient estimates. |
||
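A small simulation can illustrate how the choice of X values alone changes the noisiness of the estimates. The sketch below repeatedly generates Y data from an assumed true line with Gaussian noise and compares the spread of the least squares slope for two hypothetical X vectors. The true coefficients, noise level, X ranges, and function names are all assumptions for this example.

```python
import random


def ls_slope(xs, ys):
    """Least squares slope estimate for y = a0 + a1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx


def slope_variance(xs, n_sims=2000, a0=2.0, a1=0.5, sigma=1.0, seed=0):
    """Sample variance of the slope estimate over simulated data sets."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(n_sims):
        ys = [a0 + a1 * x + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(ls_slope(xs, ys))
    mean = sum(slopes) / n_sims
    return sum((s - mean) ** 2 for s in slopes) / (n_sims - 1)


# Two hypothetical X vectors with ten points each.
narrow = [4.0 + 2.0 * k / 9 for k in range(10)]  # squeezed into [4, 6]
wide = [10.0 * k / 9 for k in range(10)]         # spread over [0, 10]

var_narrow = slope_variance(narrow)
var_wide = slope_variance(wide)
print(f"slope variance, narrow X range: {var_narrow:.4f}")
print(f"slope variance, wide X range:   {var_wide:.4f}")
```

Both runs use the same estimator and the same noise level; only the placement of the X values differs.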
| Five Designs |
To see the effect of experimental design on process modeling,
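consider the following simplest case of fitting a line:
$$Y = A_0 + A_1 X$$
Suppose the analyst can afford 10 observations (that is, 10 (X,Y) pairs) for the purpose of determining optimal (that is, minimal variation) estimates of A0 and A1. What 10 X values should be used for the purpose of collecting the corresponding 10 Y values? Colloquially, where should the 10 X values be sprinkled along the horizontal axis so as to minimize the variation of the least squares estimated coefficients for A0 and A1? Five candidate designs, referred to below as designs #1 through #5, are compared. They include ten equi-spaced values across the range of interest (design #1), half of the values at each extreme of the range (design #3), and one value at each extreme with the remaining values at the mid-range (design #4).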
For each of the above five experimental designs, there will of course be Y data collected, followed by the generation of least squares estimates for A0 and A1, and so each design will in turn yield a fitted line. |
||
| Are the Fitted Lines Better for Some Designs? |
But are the fitted lines, i.e., the fitted process models, better
for some designs than for others? Are the coefficient estimate
variances smaller for some designs than for others? For given
estimates, are the resulting predicted values better (that is,
closer to the observed Y values) than for other designs? The
answer to all of the above is YES. It DOES make a difference.
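The most popular answer to the above question about which
design to use for linear modeling is design #1 with ten
equi-spaced points. This answer, however, is not correct. It can
be shown that, for errors with constant variance $\sigma^2$, the
variance of the least squares slope estimate is
$$\mathrm{Var}(\hat{A}_1) = \frac{\sigma^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}$$
So to obtain minimum variance estimators, one maximizes the denominator on the right. To maximize the denominator, it is (for an arbitrarily fixed $\bar{X}$) best to place the X values as far from $\bar{X}$ as possible, that is, half of them at the lower extreme of the range and the other half at the upper extreme. This is design #3. |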
||
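The denominator in question is the sum of squared deviations of the X values from their mean, which appears in the standard variance formula for a least squares slope. The sketch below computes it for three ten-point layouts on an assumed range of 0 to 10. The range, the exact count of eight mid-range points, and the helper name `sxx` are assumptions for this example.

```python
def sxx(xs):
    """Sum of squared deviations of the X values from their mean."""
    x_bar = sum(xs) / len(xs)
    return sum((x - x_bar) ** 2 for x in xs)


LOW, HIGH = 0.0, 10.0       # assumed range of interest
MID = (LOW + HIGH) / 2.0

# Three ten-point layouts; the 1-8-1 split is an assumed reading of
# "almost all of the data at the mid-range".
designs = {
    "ten equi-spaced": [LOW + (HIGH - LOW) * k / 9 for k in range(10)],
    "half at each extreme": [LOW] * 5 + [HIGH] * 5,
    "one at each extreme, eight at mid-range": [LOW] + [MID] * 8 + [HIGH],
}

denominators = {name: sxx(xs) for name, xs in designs.items()}

# Larger denominator means smaller slope variance for the same noise level.
for name, value in sorted(denominators.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {value:.1f}")
```

Because the slope variance is inversely proportional to this quantity, the ratio of two denominators gives the ratio of slope variances directly, with no need to know the noise level.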
| What is the Worst Design? | What is the worst design in the above case? Of the five designs, the worst design is the one that has maximum variation. In the mathematical expression above, it is the one that minimizes the denominator, and so this is design #4 above, for which almost all of the data are located at the mid-range. Clearly the estimated line in this case is going to chase the solitary point at each end and so the resulting linear fit is intuitively inferior. | ||
| Designs 1, 2, and 5 |
What about the other 3 designs? Designs 1, 2, and 5 are
useful only for the case when we think the model may be
linear, but we are not sure, and so we allow additional
points that permit fitting a line if appropriate, but build
into the design the "capacity" to fit beyond a line (e.g.,
quadratic, cubic, etc.) if necessary. In this regard, the
ordering of the designs would be
|
||
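The "capacity" to fit beyond a line can be checked from the X values alone: a polynomial of degree d has uniquely determined least squares coefficients only when the design contains at least d + 1 distinct X values. The sketch below applies that rule to three hypothetical ten-point layouts; the layouts and the function name are assumptions for this example.

```python
def max_polynomial_degree(xs):
    """Highest polynomial degree whose coefficients are all estimable
    from a design with these X values (distinct levels minus one)."""
    return len(set(xs)) - 1


# Hypothetical ten-point layouts on an assumed range of 0 to 10.
two_level = [0.0] * 5 + [10.0] * 5
three_level = [0.0] * 4 + [5.0] * 2 + [10.0] * 4
equi_spaced = [10.0 * k / 9 for k in range(10)]

for name, xs in [("two-level", two_level),
                 ("three-level", three_level),
                 ("equi-spaced", equi_spaced)]:
    print(f"{name}: supports up to degree {max_polynomial_degree(xs)}")
```

A design with points only at the two extremes is efficient for a line but cannot estimate a quadratic term at all, which is the trade-off that adding intermediate points addresses.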