|
4. Process Modeling
4.5. Use and Interpretation of Process Models
4.5.1. What types of predictions can I make using the model?
|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Step 1: Plug Predictors Into Estimated Function | Once a model that gives a good description of the process has been developed, it can be used to make a variety of different types of predictions about the process. To estimate the average response of the process, or, equivalently, the value of the regression function, for any particular combination of predictor variable values, the values of the predictor variables are simply evaluated in the estimated regression function itself. These estimated function values are often loosely called predicted values, along with other types of estimates from regression models. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Pressure / Temperature Example |
For example, in the Pressure/Temperature process, which is well described by a
straight line model relating pressure ($y$) to temperature ($x$), the estimated regression function is $\hat{y} = 7.74899 + 3.93014x$, obtained by substituting the estimated parameter values (given below) into the functional part of the model. Then, to estimate the average pressure at a temperature of 65, the predictor value of interest is evaluated in the estimated regression function, yielding an estimated pressure of 263.21. This estimation process works analogously for nonlinear models, LOESS models, and all other types of functional process models. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Output from Straight-Line Fit to Pressure / Temperature Data |
LEAST SQUARES POLYNOMIAL FIT
SAMPLE SIZE N = 40
DEGREE = 1
NO REPLICATION CASE
   PARAMETER ESTIMATES   (APPROX. ST. DEV.)   T VALUE
1  A0    7.74899         ( 2.354     )         3.292
2  A1    3.93014         (0.5070E-01)          77.51
RESIDUAL STANDARD DEVIATION = 4.299098
RESIDUAL DEGREES OF FREEDOM = 38
|
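The plug-in step for the straight-line example can be sketched in a few lines of Python. The parameter values are copied from the fit output above; the function name `predicted_pressure` is only an illustrative choice.

```python
# Parameter estimates copied from the straight-line fit output above.
A0 = 7.74899  # estimated intercept
A1 = 3.93014  # estimated slope


def predicted_pressure(temperature: float) -> float:
    """Evaluate the estimated regression function at one temperature."""
    return A0 + A1 * temperature


if __name__ == "__main__":
    # Estimated average pressure at a temperature of 65.
    print(f"{predicted_pressure(65):.2f}")  # prints 263.21
```

The same pattern applies to any functional model: substitute the estimated parameters, then evaluate the function at the predictor values of interest.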
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Polymer Relaxation Example |
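Based on the output from fitting the stretched exponential model in time and temperature, the estimated regression function is obtained in the same way, by substituting the estimated parameter values into the functional part of the model. Evaluating that function at the time and temperature of interest then gives the estimated torque on a polymer sample, which is 5.26 under the particular conditions considered in this example. |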
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Uncertainty Needed |
Knowing that the average pressure is 263.21 at a temperature of 65, or that the average torque on a polymer sample under
particular conditions is 5.26, however, is not enough information to make scientific or engineering decisions about the
process. This is because the pressure value of 263.21 is only an estimate of the average pressure at a temperature of 65.
Because of the random error in the data, there is also random error in the estimated regression parameters, and in the
values predicted using the model. To use the model correctly, therefore, the uncertainty in the prediction must also be
quantified. For example, if the safe operational pressure of a particular type of gas tank that will be used at a
temperature of 65 is 300, different engineering conclusions would be drawn from knowing that the average actual pressure in the
tank is likely to lie in a narrow range around 263.21 versus knowing that it could lie in a range wide enough to reach 300. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Confidence Intervals |
In order to provide the necessary information with which to make engineering or scientific decisions, predictions from
process models are usually given as intervals of plausible values that have a probabilistic interpretation. In particular,
intervals that specify a range of values that will contain the value of the regression function with a pre-specified
probability are often used. These intervals are called confidence intervals. The probability with which the interval will
capture the true value of the regression function is called the confidence level, and is most often set by the user
to be 0.95, or 95% in percentage terms. Any value between 0% and 100% could be specified, though it would almost never make
sense to consider values outside a range of about 80% to 99%. The higher the confidence level is set, the more likely
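the true value of the regression function is to be contained in the interval. The trade-off for high confidence
is long intervals, although the average length of the intervals typically decreases as the sample size is increased, for
any fixed confidence level. The confidence level of an interval is usually denoted symbolically using the notation
$1-\alpha$, where $\alpha$ is the user-specified probability, called the significance level, that the interval will not capture the true value of the regression function. A significance level of 5% therefore corresponds to a confidence level of 95%. |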
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Computing Confidence Intervals | Confidence intervals are computed using the estimated standard deviations of the estimated regression function values and a coverage factor that controls the confidence level of the interval and accounts for the variation in the estimate of the residual standard deviation. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| The standard deviations of the predicted values of the estimated regression function depend on the standard deviation of the random errors in the data, the experiment design used to collect the data and fit the model, and the values of the predictor variables used to obtain the predicted values. These standard deviations are not simple quantities that can be read off the output summarizing the fit of the model, but they can often be obtained from the software used to fit the model. This is the best option, if available, because a variety of numerical issues can arise when the standard deviations are calculated directly from typical theoretical formulas, and carefully written software should minimize those problems. If necessary, however, matrix formulas that can be used to compute these values directly are given in texts such as Neter, Wasserman, and Kutner. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
|
The coverage factor used to control the confidence level of the intervals depends on the distributional assumptions about
the data and the amount of information available to estimate the residual standard deviation of the fit.
For procedures that depend on the assumption that the random errors have a normal distribution, the coverage factor is
typically a cut-off value from the Student's t distribution at the user's
pre-specified confidence level and with the same number of degrees of freedom as used to estimate the residual standard
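deviation in the fit of the model. Tables of the t distribution (or functions in software) may be indexed by the
confidence level ($1-\alpha$) or by the significance level ($\alpha$). Because these intervals are two-sided, half of the probability $\alpha$ is usually assigned to each side of the interval, so the proper entry in a table or software function may instead be labeled $\alpha/2$ or $1-\alpha/2$. |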
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
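As an illustration of looking up the coverage factor, the sketch below computes a two-sided Student's t cut-off using only the Python standard library. It integrates the t density numerically and inverts the result by bisection, a simple approach chosen here for self-containment rather than one taken from the handbook. In practice a statistics library routine such as `scipy.stats.t.ppf` would normally be used instead. All function names are illustrative.

```python
import math


def t_pdf(x: float, df: float) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: float, steps: int = 2000) -> float:
    """CDF via Simpson's rule on [0, |x|], using symmetry about zero."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: float) -> float:
    """Invert the CDF by bisection (assumes 0.5 <= p < 1)."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:  # expand until the quantile is bracketed
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def coverage_factor(confidence: float, df: int) -> float:
    """Two-sided coverage factor t_{1 - alpha/2, df}."""
    alpha = 1 - confidence
    return t_quantile(1 - alpha / 2, df)


# 95% confidence with the 38 residual degrees of freedom of the straight-line fit.
print(round(coverage_factor(0.95, 38), 3))  # approximately 2.024
```

Note that the function is called with the confidence level and converts it internally to the one-sided probability $1-\alpha/2$, which is the indexing detail the paragraph above warns about.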
|
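The estimated values of the regression function, their standard deviations, and the coverage factor are combined using the
formula
$$\hat{y} \pm t_{1-\alpha/2,\nu} \cdot \hat{\sigma}_f$$
where $t_{1-\alpha/2,\nu}$ is the coverage factor, indexed by a function of the significance level $\alpha$ and by its degrees of freedom $\nu$, and $\hat{\sigma}_f$ is the estimated standard deviation of $\hat{y}$. |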
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
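For a straight-line fit, the pieces of the interval can be assembled by hand, as in the sketch below. It relies on the standard textbook expression for the standard deviation of the fitted line at a point $x_0$, namely $s\sqrt{1/n + (x_0-\bar{x})^2/S_{xx}}$, which is an assumption brought in from general regression theory rather than a formula stated in this section. The data are invented for illustration, the function name is hypothetical, and the coverage factor is passed in as a number. Fitting software that reports these standard deviations directly remains the preferred route.

```python
import math


def mean_response_interval(x, y, x0, coverage):
    """Confidence interval for the regression function at x0 (straight-line fit).

    coverage is the t coverage factor for len(x) - 2 degrees of freedom.
    Returns (lower, estimate, upper).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar

    # Residual standard deviation with n - 2 degrees of freedom.
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))

    # Standard deviation of the estimated regression function at x0.
    sd_fit = s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)

    y_hat = intercept + slope * x0
    half_width = coverage * sd_fit
    return y_hat - half_width, y_hat, y_hat + half_width


# Hypothetical data, invented for illustration only.
temps = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65]
pressures = [88.1, 104.9, 127.3, 144.0, 166.2, 183.5, 206.9, 222.4, 245.8, 261.7]

# 2.306 is the 0.975 quantile of Student's t with 10 - 2 = 8 degrees of freedom.
lower, estimate, upper = mean_response_interval(temps, pressures, 65, 2.306)
print(f"{estimate:.2f} in [{lower:.2f}, {upper:.2f}]")
```

The interval is narrowest at the mean of the predictor values and widens as $x_0$ moves away from it, which reflects the dependence on the experiment design and on the predictor values mentioned above.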
| Confidence Intervals for the Example Applications |
Computing confidence intervals for the average pressure in the
Pressure/Temperature example, for temperatures of 25, 45 and 65 and for the
average torque on specimens from the polymer relaxation example at different
times and temperatures gives the results listed in the tables below. Note: the number of significant digits shown in the
tables below is larger than would normally be reported. However, as many significant digits as possible should be carried
throughout all calculations and results should only be rounded for final reporting. If reported numbers may be used in
further calculations, then they should not be rounded even when finally reported. A useful rule for rounding final results
that will not be used for further computation is to round all of the reported values to one or two significant digits in the total
uncertainty, that is, the coverage factor multiplied by the estimated standard deviation of the predicted value. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Pressure / Temperature Example |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Polymer Relaxation Example |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Interpretation of Confidence Intervals | As mentioned above, confidence intervals capture the true value of the regression function with a user-specified probability, the confidence level, using the estimated regression function and the associated estimates of the random error in the data. Simulation of many sets of data from a process model provides a good way to get a detailed understanding of the probabilistic nature of these intervals. The advantage of using simulation is that the true model parameters are known, which is never the case for a real process. This allows direct comparison of how confidence intervals constructed from a limited amount of data relate to the true values that are being estimated. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| The plot below shows 95% confidence intervals computed using 50 independently generated data sets that follow the same model as the data in the Pressure/Temperature example. Random errors from the normal distribution with a mean of zero and a known standard deviation are added to each set of true temperatures and true pressures that lie on a perfect straight line to obtain the simulated data. Then the noisy data in each data set are used to compute a confidence interval for the average pressure at a temperature of 65. The dashed reference line marks the true value of the average pressure at a temperature of 65. | |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Confidence Intervals Computed from 50 Sets of Simulated Data |
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
| Confidence Level Specifies Long-Run Interval Coverage | From the plot it is easy to see that not all of the intervals contain the true value of the average pressure. Data sets 16, 26, and 39 all produced intervals that did not cover the true value of the average pressure at a temperature of 65. Sometimes an interval fails to cover the true value because the estimated pressure is unusually high or low as a result of the random errors in the data set. In other cases, the variability in the data is underestimated, leading to an interval that is too short to cover the true value. However, for 47 out of 50 data sets, or 94%, close to the nominal 95%, the confidence intervals did cover the true average pressure. When the number of data sets was increased to 5000, confidence intervals computed for 4723, or 94.46%, of the data sets covered the true average pressure. Finally, when the number of data sets was increased to 10000, 95.12% of the confidence intervals computed covered the true average pressure. Thus, the simulation shows that although any particular confidence interval might not cover its associated true value, in repeated experiments this method of constructing intervals produces intervals that cover the true value at the rate specified by the user as the confidence level. Unfortunately, when dealing with real processes with unknown parameters, it is impossible to know whether or not a particular confidence interval that has been computed does contain the true value. It is reassuring, however, that the error rate can be controlled, and can be set so that it is far more likely than not that each interval produced does contain its true value. | ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
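A coverage experiment of this kind can be sketched as follows. The true intercept, slope, error standard deviation, and temperature design are all hypothetical values chosen only to resemble the Pressure/Temperature example, not the settings used to produce the results quoted above. The constant 2.0244 is the 0.975 quantile of Student's t with 38 degrees of freedom.

```python
import math
import random


def interval_at(x, y, x0, coverage):
    """Straight-line fit and confidence interval for the mean response at x0."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    half = coverage * s * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    y_hat = intercept + slope * x0
    return y_hat - half, y_hat + half


def coverage_rate(n_sets, seed=0):
    """Fraction of simulated 95% intervals that cover the true mean at 65."""
    rng = random.Random(seed)
    # Hypothetical true model and design (assumptions, not from the source).
    beta0, beta1, sigma = 7.75, 3.93, 4.3
    temps = [20 + i for i in range(40)]  # 40 temperatures: 20, 21, ..., 59
    t_975_38 = 2.0244                    # t quantile for 40 - 2 = 38 d.f.
    x0 = 65
    truth = beta0 + beta1 * x0
    hits = 0
    for _ in range(n_sets):
        # Add normal random errors to responses that lie on the true line.
        ys = [beta0 + beta1 * t + rng.gauss(0, sigma) for t in temps]
        lo, hi = interval_at(temps, ys, x0, t_975_38)
        hits += lo <= truth <= hi
    return hits / n_sets


print(coverage_rate(50), coverage_rate(5000))
```

With only 50 data sets the observed rate can differ noticeably from 0.95, while with several thousand it should settle close to the nominal level, which mirrors the pattern described in the text.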
| Interpretation Summary |
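To summarize the interpretation of the probabilistic nature of confidence intervals in words: in independent, repeated
experiments, a fraction of the intervals equal to the confidence level, $100(1-\alpha)\%$, will cover the true values, provided that the assumptions needed for the construction of the intervals hold. |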
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||