364
Influence Measures for Cluster Analysis
Onecia Gibson and Arnold Stromberg
Abstract:
Several useful diagnostic measures have been developed for detecting influential points in the regression model. Cook and Weisberg suggest that a reasonable measure of influence can be obtained by comparing the volumes of the confidence ellipsoids for the parameter vector under the full and reduced data sets. In Section 2, this volume measure is shown to be equivalent to Belsley, Kuh, and Welsch's COVRATIO, which measures the change in the estimated covariance matrix of the parameter estimates when the ith observation is removed. The purpose of this paper is to propose two generalizations of this volume measure for cluster analysis. Both generalizations start from a multivariate regression model in which each row of the parameter matrix identifies a cluster mean; this model is developed in Section 3. The first measure, presented in Section 4, is the square of a ratio. Its numerator is the product, over the rows of the parameter matrix, of the volumes of the rows' confidence ellipsoids with the ith observation removed. Its denominator is the corresponding product for the full data set. The second measure, presented in Section 5, is formed in the same way except that the volumes are summed rather than multiplied. For both measures, bounds are derived that are useful for labelling influential points.
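The equivalence claimed for Section 2 can be checked numerically in the ordinary least-squares case. The NumPy sketch below is an illustration under stated assumptions, not the authors' derivation. It assumes a linear model with an intercept and takes the confidence ellipsoid for the coefficients to be {b : (b - bhat)' X'X (b - bhat) <= p s^2 c}. It also assumes that the same critical constant c is used for the full and reduced fits. If F quantiles with different degrees of freedom are used instead, the two quantities differ only by a factor that does not depend on i. The helpers `fit`, `ellipsoid_volume`, `covratio`, and `squared_volume_ratio` are hypothetical names introduced here.

```python
import math

import numpy as np


def fit(X, y):
    """Least-squares fit; returns X'X and the residual variance s^2 = RSS / (n - p)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return X.T @ X, resid @ resid / (n - p)


def ellipsoid_volume(xtx, s2, crit):
    """Volume of {b : (b - bhat)' X'X (b - bhat) <= p * s2 * crit}.

    `crit` stands in for a critical value; using one fixed value for both
    the full and reduced fits is an assumption of this sketch.
    """
    p = xtx.shape[0]
    unit_ball = math.pi ** (p / 2) / math.gamma(p / 2 + 1)  # volume of the unit p-ball
    radius = math.sqrt(p * s2 * crit)
    return unit_ball * radius ** p / math.sqrt(np.linalg.det(xtx))


def covratio(X, y, i):
    """COVRATIO_i = det(s_(i)^2 (X_(i)'X_(i))^-1) / det(s^2 (X'X)^-1)."""
    xtx, s2 = fit(X, y)
    keep = np.arange(len(y)) != i
    xtx_i, s2_i = fit(X[keep], y[keep])
    return np.linalg.det(s2_i * np.linalg.inv(xtx_i)) / np.linalg.det(s2 * np.linalg.inv(xtx))


def squared_volume_ratio(X, y, i, crit=1.0):
    """(volume with observation i removed / full-data volume) squared."""
    xtx, s2 = fit(X, y)
    keep = np.arange(len(y)) != i
    xtx_i, s2_i = fit(X[keep], y[keep])
    return (ellipsoid_volume(xtx_i, s2_i, crit) / ellipsoid_volume(xtx, s2, crit)) ** 2


# Small synthetic example: simple linear regression with one planted outlier.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(20), rng.normal(size=20)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=20)
y[3] += 5.0

cr = [covratio(X, y, i) for i in range(len(y))]
vr = [squared_volume_ratio(X, y, i) for i in range(len(y))]
print(np.allclose(cr, vr))  # → True
```

The unit-ball constant and the critical constant cancel in the ratio. What remains is (s_(i)^2 / s^2)^p det(X'X) / det(X_(i)'X_(i)), which is exactly COVRATIO. This is why the squared volume ratio, rather than the ratio itself, is the natural quantity to compare.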