4. Process Modeling
4.4. Data Analysis for Process Modeling
Model validation is possibly the most important step in the model building
sequence. It is also one of the most overlooked. Often the validation of
a model seems to consist of nothing more than quoting the
|
||
| Main Tool: Graphical Residual Analysis |
There are many statistical tools for model validation, but the primary tool
for most process modeling applications is graphical residual analysis.
Different types of plots of the residuals (see definition
below) from a fitted
model provide information on the adequacy of different aspects of the model.
Numerical methods for model validation, such as the |
||
| Numerical Methods' Forte | Numerical methods do play an important role as confirmatory methods for graphical techniques, however. For example, the lack-of-fit test for assessing the correctness of the functional part of the model can aid in interpreting a borderline residual plot. There are also a few modeling situations in which graphical methods cannot easily be used. In these cases, numerical methods provide a fallback position for model validation. One common situation in which numerical validation methods take precedence over graphical methods occurs when the number of parameters being estimated is relatively close to the size of the data set. In this situation, residual plots are often difficult to interpret because of constraints on the residuals imposed by the estimation of the unknown parameters. One area where this typically happens is in optimization applications using designed experiments. Logistic regression with binary data is another area where graphical residual analysis can be difficult. | ||
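The constraints mentioned above can be made concrete with a small sketch. It assumes an ordinary least squares fit with a made-up design matrix `X` (6 hypothetical runs, 5 estimated parameters); none of these numbers come from the handbook. Each estimated parameter imposes one linear constraint on the residuals, so only `n - p` degrees of freedom remain for them to vary.

```python
import numpy as np

# Hypothetical setup: 6 runs and 5 estimated parameters (illustrative only).
rng = np.random.default_rng(0)
n, p = 6, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = rng.normal(size=n)

# Ordinary least squares fit and its residuals.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_hat

# Least squares forces X^T e = 0: one linear constraint per parameter.
constraints = X.T @ e

# Degrees of freedom left for the residuals to vary.
residual_df = n - np.linalg.matrix_rank(X)

print("X^T e:", np.round(constraints, 10))
print("residual degrees of freedom:", residual_df)
```

With only one degree of freedom left, the six residuals are all multiples of a single fixed pattern, which is why a plot of them says little about the adequacy of the model.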
| Residuals |
The residuals from a fitted model are the differences between the responses
observed at each combination of explanatory variables and the corresponding
prediction of the response computed using the regression function.
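Mathematically, the definition of the residual for the ith
observation in the data set is written
$$ e_i = y_i - f(\vec{x}_i;\hat{\vec{\beta}}), $$
where $y_i$ denotes the ith observed response, $\vec{x}_i$ the explanatory variables set at the values found in the ith observation, $f$ the regression function, and $\hat{\vec{\beta}}$ the estimated parameters. |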
||
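As a minimal sketch of this definition, the snippet below computes residuals as observed response minus the prediction from the regression function. The helper names `straight_line` and `residuals`, the three data points, and the parameter estimates are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
import numpy as np

def straight_line(x, beta):
    """Hypothetical regression function: intercept plus slope times x."""
    b0, b1 = beta
    return b0 + b1 * x

def residuals(y, x, beta_hat, f=straight_line):
    """Observed responses minus the predictions from the fitted function."""
    return np.asarray(y, dtype=float) - f(np.asarray(x, dtype=float), beta_hat)

# Made-up observations and parameter estimates (not handbook data).
x = np.array([1.0, 2.0, 3.0])
y = np.array([2.1, 3.9, 6.2])
beta_hat = (0.0, 2.0)

e = residuals(y, x, beta_hat)
print(e)  # approximately [0.1, -0.1, 0.2]
```

A positive residual means the observation lies above the fitted function, and a negative residual means it lies below.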
| Example |
The data listed below are from the
Pressure/Temperature example introduced
in Section 4.1.1. Along with the run order, day, and ambient temperature,
the table gives the values of the explanatory variable, Temperature, and
the observed responses, Pressure. The Fitted Value column gives the
corresponding values from the fitted straight-line regression function.
The last column lists the residuals, the observed pressure minus the fitted value. |
||
| Data, Fitted Values & Residuals |
  Run           Ambient                          Fitted
Order  Day  Temperature  Temperature  Pressure    Value  Residual
    1    1       23.820       54.749   225.066  222.920     2.146
    2    1       24.120       23.323   100.331   99.411     0.920
    3    1       23.434       58.775   230.863  238.744    -7.881
    4    1       23.993       25.854   106.160  109.359    -3.199
    5    1       23.375       68.297   277.502  276.165     1.336
    6    1       23.233       37.481   148.314  155.056    -6.741
    7    1       24.162       49.542   197.562  202.456    -4.895
    8    1       23.667       34.101   138.537  141.770    -3.232
    9    1       24.056       33.901   137.969  140.983    -3.014
   10    1       22.786       29.242   117.410  122.674    -5.263
   11    2       23.785       39.506   164.442  163.013     1.429
   12    2       22.987       43.004   181.044  176.759     4.285
   13    2       23.799       53.226   222.179  216.933     5.246
   14    2       23.661       54.467   227.010  221.813     5.198
   15    2       23.852       57.549   232.496  233.925    -1.429
   16    2       23.379       61.204   253.557  248.288     5.269
   17    2       24.146       31.489   139.894  131.506     8.388
   18    2       24.187       68.476   273.931  276.871    -2.940
   19    2       24.159       51.144   207.969  208.753    -0.784
   20    2       23.803       68.774   280.205  278.040     2.165
   21    3       24.381       55.350   227.060  225.282     1.779
   22    3       24.027       44.692   180.605  183.396    -2.791
   23    3       24.342       50.995   206.229  208.167    -1.938
   24    3       23.670       21.602    91.464   92.649    -1.186
   25    3       24.246       54.673   223.869  222.622     1.247
   26    3       25.082       41.449   172.910  170.651     2.259
   27    3       24.575       35.451   152.073  147.075     4.998
   28    3       23.803       42.989   169.427  176.703    -7.276
   29    3       24.660       48.599   192.561  198.748    -6.188
   30    3       24.097       21.448    94.448   92.042     2.406
   31    4       22.816       56.982   222.794  231.697    -8.902
   32    4       24.167       47.901   199.003  196.008     2.996
   33    4       22.712       40.285   168.668  166.077     2.592
   34    4       23.611       25.609   109.387  108.397     0.990
   35    4       23.354       22.971    98.445   98.029     0.416
   36    4       23.669       25.838   110.987  109.295     1.692
   37    4       23.965       49.127   202.662  200.826     1.835
   38    4       22.917       54.936   224.773  223.653     1.120
   39    4       23.546       50.917   216.058  207.859     8.199
   40    4       24.450       41.976   171.469  172.720    -1.251 |
||
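The fitted values and residuals in the listing can be recomputed from its Temperature and Pressure columns. The sketch below assumes the straight line was fitted by ordinary least squares to all 40 runs and uses NumPy's `polyfit` as a stand-in for whatever software produced the listing; the variable names are illustrative.

```python
import numpy as np

# Temperature and Pressure columns copied from the listing (40 runs).
temperature = np.array([
    54.749, 23.323, 58.775, 25.854, 68.297, 37.481, 49.542, 34.101,
    33.901, 29.242, 39.506, 43.004, 53.226, 54.467, 57.549, 61.204,
    31.489, 68.476, 51.144, 68.774, 55.350, 44.692, 50.995, 21.602,
    54.673, 41.449, 35.451, 42.989, 48.599, 21.448, 56.982, 47.901,
    40.285, 25.609, 22.971, 25.838, 49.127, 54.936, 50.917, 41.976,
])
pressure = np.array([
    225.066, 100.331, 230.863, 106.160, 277.502, 148.314, 197.562, 138.537,
    137.969, 117.410, 164.442, 181.044, 222.179, 227.010, 232.496, 253.557,
    139.894, 273.931, 207.969, 280.205, 227.060, 180.605, 206.229, 91.464,
    223.869, 172.910, 152.073, 169.427, 192.561, 94.448, 222.794, 199.003,
    168.668, 109.387, 98.445, 110.987, 202.662, 224.773, 216.058, 171.469,
])

# Least squares straight line; polyfit returns the slope first for deg=1.
slope, intercept = np.polyfit(temperature, pressure, deg=1)

# Fitted values and residuals (observed minus fitted).
fitted = intercept + slope * temperature
residuals = pressure - fitted

# Print the first few rows for comparison with the listing.
for t, p, f, r in zip(temperature[:5], pressure[:5], fitted[:5], residuals[:5]):
    print(f"{t:8.3f} {p:9.3f} {f:9.3f} {r:8.3f}")
```

If the assumption about the fitting method holds, the printed values should agree with the Fitted Value and Residual columns up to rounding in the last digit.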
| Why Use Residuals? | Assuming the model fit to the data is correct, the residuals approximate the random errors that make the relationship between the explanatory variables and the response variable a statistical relationship. Therefore, if the residuals appear to behave randomly, it suggests that the model fits the data well. On the other hand, if non-random structure is evident in the residuals, it is a clear sign that the model fits the data poorly. The subsections listed below detail the types of plots to use to test different aspects of a model and give guidance on the correct interpretations of different results that could be observed for each type of plot. | ||
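The contrast between random and structured residuals can be illustrated with simulated data. The sketch below is not taken from the handbook: it generates responses from a hypothetical quadratic relationship, fits both a straight line and a quadratic, and measures how strongly each set of residuals still tracks the curvature term. The correlation is only a numerical stand-in for what a residual plot would show visually.

```python
import numpy as np

# Simulated data from an assumed quadratic relationship plus random error.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 50)
y = 1.0 + 2.0 * x + 0.5 * x**2 + rng.normal(scale=1.0, size=x.size)

def fit_residuals(deg):
    """Residuals from a least squares polynomial fit of the given degree."""
    coeffs = np.polyfit(x, y, deg)
    return y - np.polyval(coeffs, x)

# Curvature pattern that a straight-line fit cannot absorb.
curvature = (x - x.mean()) ** 2

corr_line = np.corrcoef(fit_residuals(1), curvature)[0, 1]
corr_quad = np.corrcoef(fit_residuals(2), curvature)[0, 1]

print(f"straight-line fit: correlation with curvature = {corr_line:.3f}")
print(f"quadratic fit:     correlation with curvature = {corr_quad:.3f}")
```

Residuals from the straight-line fit remain strongly tied to the curvature term, a sign of a misspecified function, while those from the quadratic fit do not.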
| Model Validation Specifics |
|
||