Chapter 14
Statistical Procedures
lead to incorrect remediation decisions.
Classical MLE-based and even robust outlier identification procedures are
vulnerable to masking and swamping effects in the presence of multiple outliers. Masking means
that the outliers are hidden, and the presence of some outliers may mask the existence of others.
Even the sequential use of outlier identification procedures cannot help unmask these multiple
outliers (e.g., see Example 1, Chapter 10). When the outliers arise in clusters, the OLS regression
model is attracted toward the outliers, resulting in deflated residuals and leading to masking of
outliers. Swamping, on the other hand, means that some of the inlying observations are identified
as outliers due to the presence of some other outliers. In the presence of multiple outliers, or for
a mixture sample from two or more populations, the generalized distances, including robustified Mds,
get distorted to such an extent that the cases with large Mds may not correspond to the outlying
observations. This data masking distorts the estimates of the population parameters (e.g., location and scale)
and the correct ordering of the Mds in an unpredictable manner and often leads to the
misidentification of outliers. The use of approximate distributions of the Mds, such as the chi-square
or normal, can also lead to the incorrect ordering of the Mds.
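
The masking effect described above can be illustrated with a small univariate sketch. In one dimension the Md of an observation reduces to |x - mean| / sd, so two clustered outliers inflate both the mean and the standard deviation and thereby deflate their own distances. The data values, the cutoff of 3, and the function names below are assumptions chosen for illustration only. The median and the scaled median absolute deviation (MAD) are used here as simple high-breakdown stand-ins; they are not the robust procedures implemented in Scout.

```python
import statistics


def classical_distances(values):
    """Univariate Md: |x - mean| / sd, using the classical estimates."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)
    return [abs(x - mean) / sd for x in values]


def robust_distances(values):
    """Same distance, with median and scaled MAD in place of mean and sd."""
    center = statistics.median(values)
    mad = statistics.median([abs(x - center) for x in values])
    scale = 1.4826 * mad  # makes the MAD consistent with sd under normality
    return [abs(x - center) / scale for x in values]


def flag(values, distances, cutoff=3.0):
    """Return the observations whose distance exceeds the (assumed) cutoff."""
    return [x for x, d in zip(values, distances) if d > cutoff]


# Hypothetical sample: eight inliers near 10 and two clustered outliers.
data = [8.0, 9.0, 9.5, 10.0, 10.0, 10.5, 11.0, 12.0, 30.0, 31.0]

classical_flagged = flag(data, classical_distances(data))
robust_flagged = flag(data, robust_distances(data))

print("classical:", classical_flagged)  # prints: classical: []
print("robust:", robust_flagged)        # prints: robust: [30.0, 31.0]
```

With the classical estimates, the largest distance in this sample is below 2, so neither outlier is flagged and a sequential one-at-a-time procedure would stop at its first step. The median and MAD are barely affected by the two extreme values, so the same cutoff separates them clearly.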
It is well known (Huber [1981], Devlin et al. [1981], Hampel et al. [1986], Rousseeuw and
Leroy [1987], Rousseeuw and van Zomeren [1990], and Barnett and Lewis [1994]) that for the
identification of multiple outliers, one should use robust and resistant procedures with a high
breakdown point. Most robust procedures for the identification of
outliers and the estimation of population parameters of location and scale are iterative, requiring