|
Purpose: Detection of Outliers |
Grubbs' test
(Grubbs 1969 and
Stefansky 1972)
is used to detect outliers in a univariate data
set. It assumes normality: before applying Grubbs' test,
you should verify that your data can be reasonably
approximated by a normal distribution.
Grubbs' test detects one outlier at a time. The detected outlier is removed from the data set and the test is repeated until no outliers are detected. However, repeated testing changes the probabilities of detection, and the test should not be used for sample sizes of six or fewer, since it frequently tags most of the points as outliers. Grubbs' test is also known as the maximum normed residual test. |
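The quantity behind the name "maximum normed residual" can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `grubbs_statistic` and the sample values are hypothetical, and the code computes only the statistic for the single most extreme point. Deciding whether that point is an outlier still requires a critical value for the given sample size, which is not computed here.

```python
import statistics

def grubbs_statistic(values):
    """Return (G, index): the largest absolute deviation from the mean,
    divided by the sample standard deviation, and the position of that point."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)  # sample standard deviation (N - 1 denominator)
    index = max(range(len(values)), key=lambda i: abs(values[i] - mean))
    return abs(values[index] - mean) / s, index

# Hypothetical data: eight readings, the last one suspiciously large.
sample = [9.0, 10.0, 11.0, 10.0, 9.0, 11.0, 10.0, 20.0]
g, suspect = grubbs_statistic(sample)
print(f"G = {g:.4f}, most extreme point: index {suspect} (value {sample[suspect]})")
```

In an iterated use of the test, a point flagged as an outlier would be removed and the statistic recomputed on the remaining data, keeping in mind the caveat above about repeated testing.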
||||||||||
| Definition |
Grubbs' test is defined for the hypothesis:
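In its usual two-sided form, the test is commonly stated as follows. The notation is assumed here for illustration: $Y_i$ are the observations, $\bar{Y}$ and $s$ the sample mean and standard deviation, $N$ the sample size, $\alpha$ the significance level, and $t_{\alpha/(2N),\,N-2}$ the upper critical value of the $t$ distribution with $N-2$ degrees of freedom at significance level $\alpha/(2N)$.

```latex
\begin{aligned}
H_0 &: \text{there are no outliers in the data set} \\
H_a &: \text{there is at least one outlier in the data set} \\[4pt]
G &= \frac{\max_i \lvert Y_i - \bar{Y} \rvert}{s} \\[4pt]
\text{Reject } H_0 \text{ if}\quad
G &> \frac{N-1}{\sqrt{N}}
     \sqrt{\frac{t^{2}_{\alpha/(2N),\,N-2}}{N - 2 + t^{2}_{\alpha/(2N),\,N-2}}}
\end{aligned}
```

One-sided variants, which test only the minimum or only the maximum, replace the numerator of $G$ with $\bar{Y} - Y_{\min}$ or $Y_{\max} - \bar{Y}$ and use $\alpha/N$ in place of $\alpha/(2N)$.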
|
||||||||||
|
Sample Output |
Dataplot generated the following output for
the ZARR13.DAT data set,
showing that Grubbs' test finds no outliers in the data set:
*********************
**  grubbs test y  **
*********************

          GRUBBS TEST FOR OUTLIERS
          (ASSUMPTION: NORMALITY)

1. STATISTICS:
      NUMBER OF OBSERVATIONS  =  195
      MINIMUM                 =  9.196848
      MEAN                    =  9.261460
      MAXIMUM                 =  9.327973
      STANDARD DEVIATION      =  0.2278881E-01
      GRUBBS TEST STATISTIC   =  2.918673

2. PERCENT POINTS OF THE REFERENCE DISTRIBUTION
   FOR GRUBBS TEST STATISTIC
       0        % POINT  =  0.
      50        % POINT  =  2.984294
      75        % POINT  =  3.181226
      90        % POINT  =  3.424672
      95        % POINT  =  3.597898
      99        % POINT  =  3.970215
      37.59665  % POINT:    2.918673

3. CONCLUSION (AT THE 5% LEVEL):
      THERE ARE NO OUTLIERS.
|
||||||||||
| Interpretation of Sample Output |
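The output is divided into three sections: (1) summary statistics for the data, including the Grubbs' test statistic; (2) percent points of the reference distribution for the test statistic, together with the percent point at which the observed statistic falls; and (3) the conclusion of the test at the 5% level. Because the test statistic (2.918673) is below the 95% point (3.597898), the test finds no outliers at the 5% level.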
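As a quick consistency check, the test statistic in the sample output can be recomputed from the reported summary statistics alone. The sketch below assumes the two-sided form of the statistic (largest absolute deviation from the mean divided by the standard deviation). The variable names are illustrative, and the numbers are copied from the Dataplot output rather than computed from ZARR13.DAT itself.

```python
# Summary statistics copied from the Dataplot sample output.
minimum = 9.196848
mean = 9.261460
maximum = 9.327973
std_dev = 0.2278881e-01
reported_g = 2.918673
point_95 = 3.597898  # 95 % point of the reference distribution

# The largest absolute deviation from the mean occurs at the minimum or the maximum.
largest_deviation = max(mean - minimum, maximum - mean)
g = largest_deviation / std_dev

print(f"recomputed G = {g:.4f} (reported {reported_g})")
print("exceeds the 95 % point:", g > point_95)
```

The recomputed value agrees with the reported statistic to within the rounding of the printed inputs, and it falls below the 95% point, which matches the stated conclusion.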
|
||||||||||
| Questions |
Grubbs' test can be used to answer the following questions:
|
||||||||||
| Importance |
Many statistical techniques are sensitive to the presence
of outliers. For example, simple calculations of the mean
and standard deviation may be distorted by a single grossly
inaccurate data point.
Checking for outliers should be a routine part of any data analysis. Potential outliers should be examined to see whether they are erroneous. If a data point is in error, it should be corrected if possible and deleted if correction is not possible. If there is no reason to believe that the outlying point is in error, it should not be deleted without careful consideration; in that case, the use of more robust techniques may be warranted. Robust techniques will often downweight the effect of outlying points without deleting them. |
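The sensitivity of the mean and standard deviation to a single bad point is easy to see numerically. The sketch below uses hypothetical readings (not data from this page) and compares the mean and standard deviation with the median, used here as one simple example of a statistic that resists outliers.

```python
import statistics

# Hypothetical readings; the final value of the second list is a gross error.
clean = [9.0, 10.0, 11.0, 10.0, 10.0]
contaminated = clean + [100.0]

for label, data in (("clean", clean), ("contaminated", contaminated)):
    print(
        f"{label:>12}: mean={statistics.mean(data):.2f} "
        f"stdev={statistics.stdev(data):.2f} "
        f"median={statistics.median(data):.2f}"
    )
```

The single erroneous value pulls the mean far from the bulk of the data and inflates the standard deviation, while the median is unchanged.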
||||||||||
| Related Techniques | Several graphical techniques can, and should, be used to detect outliers. A simple run sequence plot, a box plot, or a histogram should show any obviously outlying points. | ||||||||||
| Case Study | Heat flow meter data. | ||||||||||
| Software | Some general purpose statistical software programs, including Dataplot, support Grubbs' test. | ||||||||||