D'Agostino's K-squared test
In statistics, D’Agostino’s K^{2} test, named for Ralph D'Agostino, is a goodness-of-fit measure of departure from normality; that is, the test aims to establish whether or not a given sample comes from a normally distributed population. The test is based on transformations of the sample kurtosis and skewness, and has power only against alternatives in which the distribution is skewed and/or has non-normal kurtosis.
Skewness and kurtosis
In the following, { x_{i} } denotes a sample of n observations, g_{1} and g_{2} are the sample skewness and kurtosis, m_{j} is the j-th sample central moment, and x̄ is the sample mean. In the literature on normality testing, the skewness and kurtosis are frequently denoted √β_{1} and β_{2} respectively. Such notation can be inconvenient since, for example, √β_{1} can be a negative quantity.
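The sample skewness and kurtosis are defined as

$$ g_{1} = \frac{m_{3}}{m_{2}^{3/2}} = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})^{3}}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right)^{3/2}}, $$

$$ g_{2} = \frac{m_{4}}{m_{2}^{2}} - 3 = \frac{\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})^{4}}{\left(\frac{1}{n}\sum_{i=1}^{n}(x_{i}-\bar{x})^{2}\right)^{2}} - 3. $$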
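As a minimal sketch of these definitions, the following Python function computes g_{1} and g_{2} from the biased (divide-by-n) central moments, with g_{2} taken as the excess kurtosis m_{4}/m_{2}^{2} − 3. The function name `sample_skew_kurtosis` is illustrative and not taken from any library.

```python
def sample_skew_kurtosis(xs):
    """Return (g1, g2): sample skewness and excess kurtosis.

    Uses the biased central moments m_j = (1/n) * sum((x - mean)**j).
    """
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    if m2 == 0.0:
        raise ValueError("all observations are identical")
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0
    return g1, g2


if __name__ == "__main__":
    # A symmetric sample has zero skewness; a right-tailed one is positive.
    print(sample_skew_kurtosis([1, 2, 3, 4, 5]))
    print(sample_skew_kurtosis([0, 0, 0, 1]))
```

For the symmetric sample the skewness is exactly zero, while the second sample, with a single large value, gives a positive g_{1}.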
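These quantities consistently estimate the theoretical skewness and kurtosis of the distribution, respectively. Moreover, if the sample indeed comes from a normal population, then the exact finite-sample distributions of the skewness and kurtosis can themselves be analysed in terms of their means μ_{1}, variances μ_{2}, skewnesses γ_{1}, and kurtoses γ_{2}. This has been done by Pearson (1931), who derived the following expressions:

$$
\begin{aligned}
\mu_{1}(g_{1}) &= 0, \\
\mu_{2}(g_{1}) &= \frac{6(n-2)}{(n+1)(n+3)}, \\
\gamma_{1}(g_{1}) &= 0, \\
\gamma_{2}(g_{1}) &= \frac{36(n-7)(n^{2}+2n-5)}{(n-2)(n+5)(n+7)(n+9)},
\end{aligned}
$$

and

$$
\begin{aligned}
\mu_{1}(g_{2}) &= -\frac{6}{n+1}, \\
\mu_{2}(g_{2}) &= \frac{24n(n-2)(n-3)}{(n+1)^{2}(n+3)(n+5)}, \\
\gamma_{1}(g_{2}) &= \frac{6(n^{2}-5n+2)}{(n+7)(n+9)} \sqrt{\frac{6(n+3)(n+5)}{n(n-2)(n-3)}}, \\
\gamma_{2}(g_{2}) &= \frac{36(15n^{6}-36n^{5}-628n^{4}+982n^{3}+5777n^{2}-6402n+900)}{n(n-3)(n-2)(n+7)(n+9)(n+11)(n+13)}.
\end{aligned}
$$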
For example, for a sample of size n = 1000 drawn from a normally distributed population, the sample skewness has an expected value of 0 with a standard deviation (SD) of about 0.08, and the sample kurtosis has an expected value of about 0 with an SD of about 0.15.
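The n = 1000 figures can be checked numerically. The sketch below assumes the commonly quoted closed forms for the null variance of g_{1} and the null mean and variance of g_{2} attributed to Pearson (1931); the function names are illustrative only.

```python
import math


def skewness_null_variance(n):
    """Assumed closed form for Var(g1) under normality."""
    return 6.0 * (n - 2) / ((n + 1) * (n + 3))


def kurtosis_null_mean_variance(n):
    """Assumed closed forms for E(g2) and Var(g2) under normality."""
    mean = -6.0 / (n + 1)
    var = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    return mean, var


if __name__ == "__main__":
    n = 1000
    sd_g1 = math.sqrt(skewness_null_variance(n))
    mean_g2, var_g2 = kurtosis_null_mean_variance(n)
    print(f"SD of g1:   {sd_g1:.4f}")
    print(f"mean of g2: {mean_g2:.4f}")
    print(f"SD of g2:   {math.sqrt(var_g2):.4f}")
```

Under these formulas the two standard deviations round to 0.08 and 0.15, and the mean of g_{2} is slightly negative but close to zero.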
Transformed sample skewness and kurtosis
The sample skewness g_{1} and kurtosis g_{2} are both asymptotically normal. However, their convergence to the limiting distribution is slow, especially for g_{2}. For example, even with n = 5000 observations the sample kurtosis g_{2} has a skewness of about 0.2 and an excess kurtosis of about 0.1, neither of which is negligible. To remedy this, it has been suggested to transform g_{1} and g_{2} so that their distributions are as close to standard normal as possible.
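In particular, D’Agostino (1970) suggested the following transformation for sample skewness:

$$ Z_{1}(g_{1}) = \delta \, \operatorname{asinh}\left( \frac{g_{1}}{\alpha \sqrt{\mu_{2}}} \right), $$

where constants α and δ are computed as

$$ W^{2} = \sqrt{2\gamma_{2} + 4} - 1, \qquad \delta = \frac{1}{\sqrt{\ln W}}, \qquad \alpha^{2} = \frac{2}{W^{2} - 1}, $$

and where μ_{2} = μ_{2}(g_{1}) is the variance of g_{1}, and γ_{2} = γ_{2}(g_{1}) is its kurtosis, both given by the expressions in the previous section.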
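A hedged sketch of the skewness transformation follows. It assumes the form Z_{1} = δ·asinh(g_{1} / (α√μ_{2})) with W^{2} = √(2γ_{2} + 4) − 1, δ = 1/√(ln W) and α^{2} = 2/(W^{2} − 1), together with the usual closed forms for μ_{2}(g_{1}) and γ_{2}(g_{1}). The name `z_skewness` and the n ≥ 8 guard are choices made for this example.

```python
import math


def z_skewness(g1, n):
    """Transform sample skewness g1 (sample size n) to an approximate z-score."""
    if n < 8:
        # gamma2(g1) is zero at n = 7, which makes ln W zero; require n >= 8.
        raise ValueError("this sketch requires n >= 8")
    # Assumed null variance and excess kurtosis of g1 under normality.
    mu2 = 6.0 * (n - 2) / ((n + 1) * (n + 3))
    gamma2 = (36.0 * (n - 7) * (n * n + 2 * n - 5)
              / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    w2 = math.sqrt(2.0 * gamma2 + 4.0) - 1.0      # W squared
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))   # ln W = 0.5 * ln(W^2)
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    return delta * math.asinh(g1 / (alpha * math.sqrt(mu2)))


if __name__ == "__main__":
    for g1 in (-0.5, 0.0, 0.5):
        print(g1, z_skewness(g1, 50))
```

The transform is odd in g_{1}, so a symmetric sample maps to zero, and for large n it approaches the plain standardisation g_{1}/√μ_{2}.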
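Similarly, Anscombe & Glynn (1983) suggested a transformation for g_{2}, which works reasonably well for sample sizes of 20 or greater:

$$ Z_{2}(g_{2}) = \sqrt{\frac{9A}{2}} \left\{ 1 - \frac{2}{9A} - \left( \frac{1 - 2/A}{1 + \frac{g_{2} - \mu_{1}}{\sqrt{\mu_{2}}} \sqrt{2/(A-4)}} \right)^{1/3} \right\}, $$

where

$$ A = 6 + \frac{8}{\gamma_{1}} \left( \frac{2}{\gamma_{1}} + \sqrt{1 + 4/\gamma_{1}^{2}} \right), $$

and μ_{1} = μ_{1}(g_{2}), μ_{2} = μ_{2}(g_{2}), γ_{1} = γ_{1}(g_{2}) are the quantities computed by Pearson.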
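The kurtosis transformation can be sketched in the same way. The code assumes the usual Anscombe and Glynn form, Z_{2} = √(9A/2)·{1 − 2/(9A) − [(1 − 2/A) / (1 + x√(2/(A − 4)))]^{1/3}} with x = (g_{2} − μ_{1})/√μ_{2} and A = 6 + (8/γ_{1})(2/γ_{1} + √(1 + 4/γ_{1}^{2})), plus the standard closed forms for the null moments of g_{2}. The name `z_kurtosis`, the n ≥ 20 guard and the signed cube root are choices made for this example.

```python
import math


def z_kurtosis(g2, n):
    """Transform sample excess kurtosis g2 (sample size n) to an approximate z-score."""
    if n < 20:
        raise ValueError("this sketch follows the n >= 20 guidance")
    # Assumed null mean, variance and skewness of g2 under normality.
    mu1 = -6.0 / (n + 1)
    mu2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    gamma1 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
              * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + (8.0 / gamma1) * (2.0 / gamma1 + math.sqrt(1.0 + 4.0 / gamma1 ** 2))
    x = (g2 - mu1) / math.sqrt(mu2)               # standardised g2
    denom = 1.0 + x * math.sqrt(2.0 / (a - 4.0))
    if denom == 0.0:
        raise ValueError("transformation undefined for this g2")
    ratio = (1.0 - 2.0 / a) / denom
    # Signed real cube root, so strongly negative g2 does not yield a complex value.
    cube_root = math.copysign(abs(ratio) ** (1.0 / 3.0), ratio)
    return math.sqrt(4.5 * a) * (1.0 - 2.0 / (9.0 * a) - cube_root)


if __name__ == "__main__":
    for g2 in (-0.5, 0.0, 0.5):
        print(g2, z_kurtosis(g2, 100))
```

The result increases with g_{2}: heavier tails than normal give a positive score and lighter tails a negative one.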
Omnibus K^{2} statistic
Statistics Z_{1} and Z_{2} can be combined to produce an omnibus test, able to detect deviations from normality due to either skewness or kurtosis (D’Agostino, Belanger & D’Agostino 1990):

$$ K^{2} = Z_{1}(g_{1})^{2} + Z_{2}(g_{2})^{2}. $$
If the null hypothesis of normality is true, then K^{2} is approximately χ^{2}-distributed with 2 degrees of freedom.
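Because a χ^{2} variable with 2 degrees of freedom has survival function exp(−x/2), an approximate p-value follows directly from K^{2}. The sketch below takes Z_{1} and Z_{2} as already computed inputs and assumes K^{2} is their sum of squares; the function names are illustrative.

```python
import math


def k2_statistic(z1, z2):
    """Combine the two transformed statistics into the omnibus K^2."""
    return z1 * z1 + z2 * z2


def k2_p_value(k2):
    """Approximate p-value: upper tail of chi-square with 2 degrees of freedom."""
    if k2 < 0.0:
        raise ValueError("K^2 cannot be negative")
    return math.exp(-0.5 * k2)


# 5% critical value of chi-square(2), obtained by inverting exp(-x/2) = 0.05.
CRITICAL_5PCT = -2.0 * math.log(0.05)

if __name__ == "__main__":
    k2 = k2_statistic(1.2, -2.1)
    print(k2, k2_p_value(k2), k2 > CRITICAL_5PCT)
```

Normality is rejected at the 5% level when K^{2} exceeds the critical value, which is about 5.991.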
Note that the statistics g_{1} and g_{2} are not independent, only uncorrelated. Their transforms Z_{1} and Z_{2} are therefore also dependent (Shenton & Bowman 1977), which makes the validity of the χ^{2} approximation questionable. Simulations show that under the null hypothesis the K^{2} test statistic has the following characteristics:
Sample size | Expected value | Standard deviation | 95% quantile |
---|---|---|---|
n = 20 | 1.971 | 2.339 | 6.373 |
n = 50 | 2.017 | 2.308 | 6.339 |
n = 100 | 2.026 | 2.267 | 6.271 |
n = 250 | 2.012 | 2.174 | 6.129 |
n = 500 | 2.009 | 2.113 | 6.063 |
n = 1000 | 2.000 | 2.062 | 6.038 |
χ^{2}(2) distribution | 2.000 | 2.000 | 5.991 |
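
A small Monte Carlo experiment along these lines can be sketched as follows. This is not the code behind the table: it assumes the standard forms of the Z_{1} and Z_{2} transformations and of the null moments, uses Python's built-in generator, and runs far fewer replications than a careful study would, so its output only roughly approximates the tabulated values. All function names are illustrative.

```python
import math
import random


def k2_from_sample(xs):
    """Omnibus K^2 for one sample (sketch; intended for n >= 20)."""
    n = len(xs)
    mean = sum(xs) / n
    dev = [x - mean for x in xs]
    m2 = sum(d ** 2 for d in dev) / n
    m3 = sum(d ** 3 for d in dev) / n
    m4 = sum(d ** 4 for d in dev) / n
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3.0

    # Skewness transform Z1 (assumed D'Agostino 1970 form).
    var_g1 = 6.0 * (n - 2) / ((n + 1) * (n + 3))
    kurt_g1 = (36.0 * (n - 7) * (n * n + 2 * n - 5)
               / ((n - 2) * (n + 5) * (n + 7) * (n + 9)))
    w2 = math.sqrt(2.0 * kurt_g1 + 4.0) - 1.0
    delta = 1.0 / math.sqrt(0.5 * math.log(w2))
    alpha = math.sqrt(2.0 / (w2 - 1.0))
    z1 = delta * math.asinh(g1 / (alpha * math.sqrt(var_g1)))

    # Kurtosis transform Z2 (assumed Anscombe & Glynn 1983 form).
    mean_g2 = -6.0 / (n + 1)
    var_g2 = 24.0 * n * (n - 2) * (n - 3) / ((n + 1) ** 2 * (n + 3) * (n + 5))
    skew_g2 = (6.0 * (n * n - 5 * n + 2) / ((n + 7) * (n + 9))
               * math.sqrt(6.0 * (n + 3) * (n + 5) / (n * (n - 2) * (n - 3))))
    a = 6.0 + (8.0 / skew_g2) * (2.0 / skew_g2 + math.sqrt(1.0 + 4.0 / skew_g2 ** 2))
    x = (g2 - mean_g2) / math.sqrt(var_g2)
    ratio = (1.0 - 2.0 / a) / (1.0 + x * math.sqrt(2.0 / (a - 4.0)))
    cube_root = math.copysign(abs(ratio) ** (1.0 / 3.0), ratio)
    z2 = math.sqrt(4.5 * a) * (1.0 - 2.0 / (9.0 * a) - cube_root)
    return z1 * z1 + z2 * z2


def simulate_null(n, reps, seed=0):
    """Return (mean, SD, empirical 95% quantile) of K^2 over normal samples."""
    rng = random.Random(seed)
    values = sorted(
        k2_from_sample([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    mean = sum(values) / reps
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (reps - 1))
    q95 = values[int(0.95 * reps)]  # simple order-statistic estimate
    return mean, sd, q95


if __name__ == "__main__":
    print(simulate_null(n=50, reps=2000, seed=12345))
```

With only a few thousand replications the estimates carry noticeable Monte Carlo error, particularly the 95% quantile, so they should be compared with the table only loosely.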
References
- Anscombe, F.J.; Glynn, William J. (1983). "Distribution of the kurtosis statistic b_{2} for normal statistics". Biometrika. 70 (1): 227–234. doi:10.1093/biomet/70.1.227. JSTOR 2335960.
- D’Agostino, Ralph B. (1970). "Transformation to normality of the null distribution of g_{1}". Biometrika. 57 (3): 679–681. doi:10.1093/biomet/57.3.679. JSTOR 2334794.
- D’Agostino, Ralph B.; Belanger, Albert; D’Agostino, Ralph B. Jr. (1990). "A suggestion for using powerful and informative tests of normality". The American Statistician. 44 (4): 316–321. doi:10.2307/2684359. JSTOR 2684359.
- Pearson, Egon S. (1931). "Note on tests for normality". Biometrika. 22 (3/4): 423–424. doi:10.1093/biomet/22.3-4.423. JSTOR 2332104.
- Shenton, L.R.; Bowman, K.O. (1977). "A bivariate model for the distribution of √b_{1} and b_{2}". Journal of the American Statistical Association. 72 (357): 206–211. doi:10.1080/01621459.1977.10479940. JSTOR 2286939.