1.4.2.5. Beam Deflections

1.4.2.5.2. Test Underlying Assumptions

Goal
The goal of this analysis is threefold:
  1. Determine if the univariate model

         $$Y_i = C + E_i$$

    is appropriate and valid.

  2. Determine if the typical underlying assumptions for an "in control" measurement process are valid. These assumptions are:
    1. random drawings;
    2. from a fixed distribution;
    3. with the distribution having a fixed location; and
    4. the distribution having a fixed scale.
  3. Determine if the confidence interval

         $$\bar{Y} \pm 2s/\sqrt{N}$$

    is appropriate and valid, where s is the standard deviation of the original data and N is the number of observations.
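As a minimal sketch (Python standard library only; the function name `mean_confidence_interval` is hypothetical), the interval Ybar ± 2s/sqrt(N) from goal 3 can be computed from the raw data as follows. The arithmetic is trivial; the point of this section is that the result is only meaningful if the assumptions in goal 2 hold.

```python
import math
import statistics


def mean_confidence_interval(y, k=2.0):
    """Return (lower, upper) = Ybar -/+ k * s / sqrt(N).

    k = 2 matches the interval in goal 3; s is the sample standard
    deviation (N - 1 divisor). The interval is only valid if the data
    are random drawings from a fixed distribution.
    """
    n = len(y)
    ybar = statistics.fmean(y)
    s = statistics.stdev(y)
    half_width = k * s / math.sqrt(n)
    return ybar - half_width, ybar + half_width


# Toy usage; for the beam data, pass the 200 deflection values instead.
lower, upper = mean_confidence_interval([1, 2, 3, 4, 5])
```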

4-Plot of Data
Interpretation
The assumptions are addressed by the graphics shown above:
  1. The run sequence plot (upper left) indicates that the data do not have any significant shifts in location or scale over time.
  2. The lag plot (upper right) shows that the data are not random. The lag plot further indicates the presence of a few outliers.
  3. When the randomness assumption is thus seriously violated, the histogram (lower left) and normal probability plot (lower right) are ignored since determining the distribution of the data is only meaningful when the data are random.
From the above plots, we conclude that the underlying randomness assumption is not valid. Therefore, the model

    $$Y_i = C + E_i$$

is not appropriate.

We need to develop a better model. Non-random data can frequently be modeled using time series methodology. Specifically, the circular pattern in the lag plot indicates that a sinusoidal model might be appropriate. The sinusoidal model will be developed in the next section.

Individual Plots
The plots can be generated individually for more detail. In this case, only the run sequence plot and the lag plot are drawn, since the distributional plots are not meaningful.
Run Sequence Plot
Lag Plot

We have drawn some lines and boxes on the plot to better isolate the outliers. The following output helps identify the points that are generating the outliers on the lag plot.

  
       ****************************************************
       **  print y index xplot yplot subset yplot > 250  **
       ****************************************************
  
  
 VARIABLES--Y              INDEX          XPLOT          YPLOT   

         300.00         158.00        -506.00         300.00
  
       ****************************************************
       **  print y index xplot yplot subset xplot > 250  **
       ****************************************************
  
  
 VARIABLES--Y              INDEX          XPLOT          YPLOT   

         201.00         157.00         300.00         201.00
  
       ********************************************************
       **  print y index xplot yplot subset yplot -100 to 0
                                     subset xplot -100 to 0  **
       ********************************************************
  
  
 VARIABLES--Y              INDEX          XPLOT          YPLOT   

         -35.00           3.00         -15.00         -35.00
  
       *********************************************************
       **  print y index xplot yplot subset yplot 100 to 200
                                     subset xplot 100 to 200  **
       *********************************************************
  
  
 VARIABLES--Y              INDEX          XPLOT          YPLOT   

         141.00           5.00         115.00         141.00
  
That is, the third, fifth, and 158th points appear to be outliers.
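The `print ... subset ...` commands above simply list the lag-plot points (previous value, current value) that fall inside a chosen region. A minimal Python sketch of the same idea follows; `lag_pairs` and `subset` are hypothetical helper names, and the 1-based index reported here is the position of the current value, which may not match Dataplot's INDEX column exactly.

```python
def lag_pairs(y, lag=1):
    """Return (index, xplot, yplot) with xplot = y[i - lag] and yplot = y[i].

    index is the 1-based position of the yplot value in the series.
    """
    return [(i + 1, y[i - lag], y[i]) for i in range(lag, len(y))]


def subset(points, keep):
    """Keep the lag-plot points for which keep(xplot, yplot) is true."""
    return [p for p in points if keep(p[1], p[2])]


# Toy series with one large value, mimicking an isolated point such as 300.
series = [0.0, 10.0, 300.0, 5.0]
points = lag_pairs(series)
high_y = subset(points, lambda x, v: v > 250)  # like "subset yplot > 250"
high_x = subset(points, lambda x, v: x > 250)  # like "subset xplot > 250"
box = subset(points, lambda x, v: -100 <= x <= 0 and -100 <= v <= 0)
```

A single outlying value shows up twice on a lag plot (once as yplot, once as the next point's xplot), which is why the first two listings above both involve the value 300.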
Autocorrelation Plot
When the lag plot indicates significant non-randomness, it can be helpful to follow up with an autocorrelation plot.

This autocorrelation plot shows a distinct cyclic pattern. As with the lag plot, this suggests a sinusoidal model.
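An autocorrelation plot graphs the sample autocorrelation r_h against the lag h. The sketch below uses the common estimator r_h = c_h / c_0 with the biased autocovariance c_h; Dataplot's exact estimator and confidence bands may differ slightly, and the function names are hypothetical. For random data roughly 95% of the r_h (h ≥ 1) should fall within ±1.96/sqrt(N); a sinusoidal signal instead produces a slowly decaying cosine-like pattern.

```python
import math
import statistics


def autocorrelation(y, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag.

    r_h = c_h / c_0, where c_h = (1/N) * sum_t (y_t - ybar) * (y_{t+h} - ybar)
    is the usual (biased) autocovariance estimate.
    """
    n = len(y)
    ybar = statistics.fmean(y)
    dev = [v - ybar for v in y]
    c0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t + h] for t in range(n - h)) / n / c0
        for h in range(max_lag + 1)
    ]


def white_noise_band(n, z=1.96):
    """Approximate 95% band +/- z / sqrt(N) for autocorrelations of random data."""
    return z / math.sqrt(n)


# A perfectly alternating series has strongly negative lag-1 autocorrelation.
r = autocorrelation([1, -1, 1, -1, 1, -1], max_lag=2)
```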

Spectral Plot
Another useful plot for non-random data is the spectral plot.

This spectral plot shows a single dominant peak at a frequency of 0.3. This frequency of 0.3 will be used in fitting the sinusoidal model in the next section.
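To illustrate how a single dominant frequency is located, the sketch below computes a raw periodogram on a grid of frequencies in cycles per observation (0 to 0.5) and returns the peak. This is an assumption-laden stand-in: it is a simpler estimator than the smoothed spectrum Dataplot may plot, and `periodogram` and `dominant_frequency` are hypothetical names. A series dominated by one cycle nevertheless shows a single clear peak under either estimator.

```python
import math
import statistics


def periodogram(y, freqs):
    """Raw periodogram I(f) = |sum_t (y_t - ybar) * exp(-2*pi*i*f*t)|^2 / N."""
    n = len(y)
    ybar = statistics.fmean(y)
    dev = [v - ybar for v in y]
    out = []
    for f in freqs:
        re = sum(d * math.cos(2 * math.pi * f * t) for t, d in enumerate(dev))
        im = sum(d * math.sin(2 * math.pi * f * t) for t, d in enumerate(dev))
        out.append((re * re + im * im) / n)
    return out


def dominant_frequency(y, step=0.005):
    """Frequency in (0, 0.5] (cycles per observation) with the largest periodogram value."""
    freqs = [k * step for k in range(1, int(round(0.5 / step)) + 1)]
    power = periodogram(y, freqs)
    return freqs[power.index(max(power))]


# Synthetic check: a pure cycle at 0.3 cycles per observation, N = 200.
series = [math.cos(2 * math.pi * 0.3 * t) for t in range(200)]
peak = dominant_frequency(series)
```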

Quantitative Output
Although the lag plot, autocorrelation plot, and spectral plot clearly show the violation of the randomness assumption, we supplement the graphical output with some quantitative measures.
Summary Statistics
As a first step in the analysis, a table of summary statistics is computed from the data. The following table, generated by Dataplot, shows a typical set of statistics.
 
                                SUMMARY
 
                     NUMBER OF OBSERVATIONS =      200
 
 
***********************************************************************
*        LOCATION MEASURES         *       DISPERSION MEASURES        *
***********************************************************************
*  MIDRANGE     =  -0.1395000E+03  *  RANGE        =   0.8790000E+03  *
*  MEAN         =  -0.1774350E+03  *  STAND. DEV.  =   0.2773322E+03  *
*  MIDMEAN      =  -0.1797600E+03  *  AV. AB. DEV. =   0.2492250E+03  *
*  MEDIAN       =  -0.1620000E+03  *  MINIMUM      =  -0.5790000E+03  *
*               =                  *  LOWER QUART. =  -0.4510000E+03  *
*               =                  *  LOWER HINGE  =  -0.4530000E+03  *
*               =                  *  UPPER HINGE  =   0.9400000E+02  *
*               =                  *  UPPER QUART. =   0.9300000E+02  *
*               =                  *  MAXIMUM      =   0.3000000E+03  *
***********************************************************************
*       RANDOMNESS MEASURES        *     DISTRIBUTIONAL MEASURES      *
***********************************************************************
*  AUTOCO COEF  =  -0.3073048E+00  *  ST. 3RD MOM. =  -0.5010057E-01  *
*               =   0.0000000E+00  *  ST. 4TH MOM. =   0.1503684E+01  *
*               =   0.0000000E+00  *  ST. WILK-SHA =  -0.1883372E+02  *
*               =                  *  UNIFORM PPCC =   0.9925535E+00  *
*               =                  *  NORMAL  PPCC =   0.9540811E+00  *
*               =                  *  TUK -.5 PPCC =   0.7313794E+00  *
*               =                  *  CAUCHY  PPCC =   0.4408355E+00  *
***********************************************************************
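Several of the location and dispersion measures in the table are easy to reproduce. For example, the midrange is (min + max)/2 = (-579 + 300)/2 = -139.5 and the range is 300 - (-579) = 879, matching the table. The sketch below is a hypothetical `summary` helper; the midmean (mean of the middle half of the sorted data) and the average absolute deviation (taken here about the mean) follow simple textbook definitions that may differ in detail from Dataplot's.

```python
import statistics


def summary(y):
    """A few location and dispersion measures similar to those in the table."""
    ys = sorted(y)
    n = len(ys)
    q = n // 4
    middle = ys[q:n - q]  # drop the lowest and highest quarter
    mean = statistics.fmean(ys)
    return {
        "midrange": (ys[0] + ys[-1]) / 2,
        "mean": mean,
        "midmean": statistics.fmean(middle),
        "median": statistics.median(ys),
        "range": ys[-1] - ys[0],
        "stand_dev": statistics.stdev(ys),  # N - 1 divisor
        "av_ab_dev": statistics.fmean([abs(v - mean) for v in ys]),
    }


stats = summary([1, 2, 3, 4, 5, 6, 7, 8])
```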
  
Location
One way to quantify a change in location over time is to fit a straight line to the data set using the index variable X = 1, 2, ..., N, with N denoting the number of observations. If there is no significant drift in the location, the slope parameter should not differ significantly from zero. For this data set, Dataplot generates the following output:
 LEAST SQUARES MULTILINEAR FIT
       SAMPLE SIZE N       =      200
       NUMBER OF VARIABLES =        1
       NO REPLICATION CASE
  
  
               PARAMETER ESTIMATES           (APPROX. ST. DEV.)    T VALUE
        1  A0                  -178.175       ( 39.47    )       -4.514
        2  A1       X          0.736593E-02   (0.3405    )       0.2163E-01
  
       RESIDUAL    STANDARD DEVIATION =         278.0313
       RESIDUAL    DEGREES OF FREEDOM =         198
The slope parameter, A1, has a t value of 0.022, which is not statistically significant. This indicates that the slope can in fact be considered zero.
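The t value is the slope estimate divided by its standard error: 0.00736593 / 0.3405 ≈ 0.0216, the 0.2163E-01 in the output, far below the two-sided 5% critical value of about 1.97 for 198 degrees of freedom. A minimal ordinary-least-squares sketch of this drift check, assuming X = 1, ..., N as above (`slope_t_test` is a hypothetical name):

```python
import math
import statistics


def slope_t_test(y):
    """Fit Y = A0 + A1*X by least squares with X = 1..N.

    Returns (A0, A1, standard error of A1, t value of A1).
    """
    n = len(y)
    x = range(1, n + 1)
    xbar = (n + 1) / 2
    ybar = statistics.fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    a1 = sxy / sxx
    a0 = ybar - a1 * xbar
    resid = [yi - (a0 + a1 * xi) for xi, yi in zip(x, y)]
    # Residual standard deviation with N - 2 degrees of freedom.
    s_resid = math.sqrt(sum(r * r for r in resid) / (n - 2))
    se_a1 = s_resid / math.sqrt(sxx)
    return a0, a1, se_a1, a1 / se_a1


a0, a1, se_a1, t_a1 = slope_t_test([1, 3, 2, 5, 4])
```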
Variation
One simple way to detect a change in variation is with a Bartlett test after dividing the data set into several equal-sized intervals. However, the Bartlett test is sensitive to departures from normality, and the non-randomness of these data does not allow us to assume normality. We therefore use the alternative Levene test; in particular, we use the Levene test based on the median rather than the mean. The choice of the number of intervals is somewhat arbitrary, although values of 4 or 8 are reasonable. Dataplot generated the following output for the Levene test.
               LEVENE F-TEST FOR SHIFT IN VARIATION
                      (ASSUMPTION: NORMALITY)
  
 1. STATISTICS
       NUMBER OF OBSERVATIONS    =      200
       NUMBER OF GROUPS          =        4
       LEVENE F TEST STATISTIC   =   0.9378599E-01
  
  
    FOR LEVENE TEST STATISTIC
       0          % POINT    =   0.0000000E+00
       50         % POINT    =   0.7914120
       75         % POINT    =    1.380357
       90         % POINT    =    2.111936
       95         % POINT    =    2.650676
       99         % POINT    =    3.883083
       99.9       % POINT    =    5.638597
  
  
          3.659895       % Point:    0.9378599E-01
  
 3. CONCLUSION (AT THE 5% LEVEL):
       THERE IS NO SHIFT IN VARIATION.
       THUS: HOMOGENEOUS WITH RESPECT TO VARIATION.
In this case, the Levene test indicates that the standard deviations are not significantly different in the 4 intervals, since the test statistic of 0.094 is less than the 95% critical value of 2.65. Therefore we conclude that the scale is constant.
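The median-based Levene statistic (often called the Brown-Forsythe variant) is a one-way ANOVA F statistic computed on the absolute deviations of each value from its group median. A minimal sketch follows, assuming k consecutive, equal-sized groups (`levene_median` is a hypothetical name). The result is compared with the F distribution with k - 1 and N - k degrees of freedom, here 3 and 196, whose 95% point in the output above is 2.65.

```python
import statistics


def levene_median(y, k=4):
    """Levene F statistic using deviations from group medians.

    The series is split into k consecutive, equal-sized groups.
    """
    n = len(y)
    if n % k:
        raise ValueError("this sketch assumes len(y) is divisible by k")
    size = n // k
    groups = [y[i * size:(i + 1) * size] for i in range(k)]
    # z_ij = |y_ij - median(group i)|
    z = []
    for g in groups:
        med = statistics.median(g)
        z.append([abs(v - med) for v in g])
    group_means = [statistics.fmean(zi) for zi in z]
    grand_mean = statistics.fmean([v for zi in z for v in zi])
    between = sum(size * (m - grand_mean) ** 2 for m in group_means) / (k - 1)
    within = sum((v - m) ** 2 for zi, m in zip(z, group_means) for v in zi) / (n - k)
    return between / within


f_stat = levene_median([1, 2, 3, 1, 2, 3, 0, 2, 4, 0, 2, 4], k=4)
```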
Randomness
A runs test is used to check for randomness.
 
                      RUNS UP
 
           STATISTIC = NUMBER OF RUNS UP
               OF LENGTH EXACTLY I
 
   I         STAT     EXP(STAT)    SD(STAT)       Z
 
   1        63.0    104.2083     10.2792       -4.01
   2        34.0     45.7167      5.2996       -2.21
   3        17.0     13.1292      3.2297        1.20
   4         4.0      2.8563      1.6351        0.70
   5         1.0      0.5037      0.7045        0.70
   6         5.0      0.0749      0.2733       18.02
   7         1.0      0.0097      0.0982       10.08
   8         1.0      0.0011      0.0331       30.15
   9         0.0      0.0001      0.0106       -0.01
  10         1.0      0.0000      0.0032      311.40
 
 
           STATISTIC = NUMBER OF RUNS UP
               OF LENGTH I OR MORE
 
   I         STAT     EXP(STAT)    SD(STAT)       Z
 
   1       127.0    166.5000      6.6546       -5.94
   2        64.0     62.2917      4.4454        0.38
   3        30.0     16.5750      3.4338        3.91
   4        13.0      3.4458      1.7786        5.37
   5         9.0      0.5895      0.7609       11.05
   6         8.0      0.0858      0.2924       27.06
   7         3.0      0.0109      0.1042       28.67
   8         2.0      0.0012      0.0349       57.21
   9         1.0      0.0001      0.0111       90.14
  10         1.0      0.0000      0.0034      298.08
 
 
                     RUNS DOWN
 
           STATISTIC = NUMBER OF RUNS DOWN
               OF LENGTH EXACTLY I
 
   I         STAT     EXP(STAT)    SD(STAT)       Z
 
   1        69.0    104.2083     10.2792       -3.43
   2        32.0     45.7167      5.2996       -2.59
   3        11.0     13.1292      3.2297       -0.66
   4         6.0      2.8563      1.6351        1.92
   5         5.0      0.5037      0.7045        6.38
   6         2.0      0.0749      0.2733        7.04
   7         2.0      0.0097      0.0982       20.26
   8         0.0      0.0011      0.0331       -0.03
   9         0.0      0.0001      0.0106       -0.01
  10         0.0      0.0000      0.0032        0.00
 
 
           STATISTIC = NUMBER OF RUNS DOWN
               OF LENGTH I OR MORE
 
 
   I         STAT     EXP(STAT)    SD(STAT)       Z
 
   1       127.0    166.5000      6.6546       -5.94
   2        58.0     62.2917      4.4454       -0.97
   3        26.0     16.5750      3.4338        2.74
   4        15.0      3.4458      1.7786        6.50
   5         9.0      0.5895      0.7609       11.05
   6         4.0      0.0858      0.2924       13.38
   7         2.0      0.0109      0.1042       19.08
   8         0.0      0.0012      0.0349       -0.03
   9         0.0      0.0001      0.0111       -0.01
  10         0.0      0.0000      0.0034        0.00
 
 
           RUNS TOTAL = RUNS UP + RUNS DOWN
 
         STATISTIC = NUMBER OF RUNS TOTAL
              OF LENGTH EXACTLY I
 
   I         STAT     EXP(STAT)    SD(STAT)       Z
 
   1       132.0    208.4167     14.5370       -5.26
   2        66.0     91.4333      7.4947       -3.39
   3        28.0     26.2583      4.5674        0.38
   4        10.0      5.7127      2.3123        1.85
   5         6.0      1.0074      0.9963        5.01
   6         7.0      0.1498      0.3866       17.72
   7         3.0      0.0193      0.1389       21.46
   8         1.0      0.0022      0.0468       21.30
   9         0.0      0.0002      0.0150       -0.01
  10         1.0      0.0000      0.0045      220.19
 
 
         STATISTIC = NUMBER OF RUNS TOTAL
               OF LENGTH I OR MORE
 
   I         STAT     EXP(STAT)    SD(STAT)       Z
 
   1       254.0    333.0000      9.4110       -8.39
   2       122.0    124.5833      6.2868       -0.41
   3        56.0     33.1500      4.8561        4.71
   4        28.0      6.8917      2.5154        8.39
   5        18.0      1.1790      1.0761       15.63
   6        12.0      0.1716      0.4136       28.60
   7         5.0      0.0217      0.1474       33.77
   8         2.0      0.0024      0.0494       40.43
   9         1.0      0.0002      0.0157       63.73
  10         1.0      0.0000      0.0047      210.77
 
 
          LENGTH OF THE LONGEST RUN UP         =    10
          LENGTH OF THE LONGEST RUN DOWN       =     7
          LENGTH OF THE LONGEST RUN UP OR DOWN =    10
 
          NUMBER OF POSITIVE DIFFERENCES =   258
          NUMBER OF NEGATIVE DIFFERENCES =   241
          NUMBER OF ZERO     DIFFERENCES =     0
 
Values in the column labeled "Z" greater than 1.96 or less than -1.96 are statistically significant at the 5% level. Numerous values in this column are much larger than 1.96 in absolute value, so we conclude that the data are not random.
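Each Z is (STAT - EXP(STAT)) / SD(STAT). For the total number of runs up and down of length 1 or more in a series of N untied values, the expected value under randomness is (2N - 1)/3 with variance (16N - 29)/90; for example, the row I = 1 of the last table gives (254 - 333)/9.4110 ≈ -8.39. A minimal sketch of that one statistic follows (`runs_up_down` and `total_runs_z` are hypothetical names; ties simply end a run, which does not matter here since the output reports no zero differences):

```python
import math


def runs_up_down(y):
    """Lengths of maximal runs of positive and of negative first differences."""
    ups, downs = [], []
    sign_prev, length = 0, 0
    for a, b in zip(y, y[1:]):
        sign = (b > a) - (b < a)  # +1 up, -1 down, 0 tie
        if sign != 0 and sign == sign_prev:
            length += 1
            continue
        # Direction changed (or a tie occurred): close the current run.
        if sign_prev == 1:
            ups.append(length)
        elif sign_prev == -1:
            downs.append(length)
        sign_prev, length = sign, (1 if sign != 0 else 0)
    if sign_prev == 1:
        ups.append(length)
    elif sign_prev == -1:
        downs.append(length)
    return ups, downs


def total_runs_z(y):
    """Observed total runs, expected value, SD, and Z under randomness."""
    ups, downs = runs_up_down(y)
    n = len(y)
    observed = len(ups) + len(downs)
    expected = (2 * n - 1) / 3
    sd = math.sqrt((16 * n - 29) / 90)
    return observed, expected, sd, (observed - expected) / sd


ups, downs = runs_up_down([1, 2, 3, 2, 1, 2])
observed, expected, sd, z = total_runs_z([1, 2, 3, 2, 1, 2])
```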
Distributional Assumptions
Since the quantitative tests show that the randomness assumption is not met, the distributional measures will not be meaningful. Therefore these quantitative tests are omitted.