Talk:Distance correlation
WikiProject Statistics (Rated Start-class, Mid-importance)
WikiProject Mathematics (Rated Start-class, Low-importance)
Problems with the article
I've gone back to the two cited articles by the original authors, and I have some problems relating some things here to there:
1. In "properties": "(ii) dcov_{n} = 0 if and only if every observation is the same." First, the same as what? dcov refers to the dcov of two different variables. And this quote seems unlikely to be true, since it would preclude two nonconstant variables from having a sample dcov of zero even if the variables are independent.
 (1) It is correct but unclear.
The 2007 paper (page 1244, before Remark 2) says that dvar_{n}(X) = 0 iff every sample observation is identical. Am I correct that "dcov_{n}" in the present quote should be changed to "dvar_{n}"?
 It would be correct that way. It is awkward to state (ii), better dvar_{n}.
2. In the section "Definitions#Distance covariance", it says "distance covariance is not the same as the covariance of distances, cov(|X − Y|, |Y − Y’|)". Should this say "cov(|X − X’|, |Y − Y’|)"? As it stands, it's not symmetric.
 (2) You are correct; it should be cov(|X − X’|, |Y − Y’|).
3. Still in the section "Definitions#Distance covariance", it says
 "The population value of distance covariance [1][2] is
 dcov(X,Y) := E|X − X’||Y − Y’| + E|X − X’| E|Y − Y’| − E|X − X’||Y − Y”| − E|X − X”||Y − Y’|
 where E denotes expected value, X’ is an independent and identically distributed copy of X, Y’ is an independent and identically distributed copy of Y, finally X” (Y”) has the same distribution as X (Y) and independent not only of X (Y) but also of Y and Y’ (X and X’)."
I have a couple problems with this:
 (a) Should it say that " X” (Y”) is independent not only of X and X’ (Y and Y’) but also ...."?
 (b) I can't see how this definition relates to the one in the original papers (2007 and 2009). E.g. the closest thing I can find in the 2007 paper is in regard to the sample dcov, which is given (p. 2776, top and eq. 2.18) as
dcov_{n}^{2} = (1/n^{2}) (summation over k,l = 1 to n) |X_{k} − X_{l}| |Y_{k} − Y_{l}| + (1/n^{2}) (summation over k,l = 1 to n) |X_{k} − X_{l}| × (1/n^{2}) (summation over k,l = 1 to n) |Y_{k} − Y_{l}| − 2[(1/n^{3}) (summation over k = 1 to n) (summation over l,m = 1 to n) |X_{k} − X_{l}| |Y_{k} − Y_{m}|]. This appears to me to translate into an expression for the population dcov^{2} (not dcov) = E|X_{k} − X_{l}| |Y_{k} − Y_{l}| + E|X_{k} − X_{l}| × E|Y_{k} − Y_{l}| − 2E[|X_{k} − X_{l}| |Y_{k} − Y_{m}|]. (I assume we can translate notation as X_{k} and X_{l} becoming X and X’, and Y_{k}, Y_{l} and Y_{m} becoming Y, Y’, and Y”.)
So I don't even see any mention of X” in the original paper. Duoduoduo (talk) 22:23, 21 December 2010 (UTC)
 You want to check the later paper on Brownian Distance Covariance; this result is proved in the second part. You are correct that the equality is stated for population distance covariance. Looks like this section requires clarification.
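 The sample formula quoted above (eq. 2.18) can be checked numerically. What follows is only a minimal Python sketch, not code from either paper. It assumes one-dimensional samples, so that |·| is the ordinary absolute value, and the function name dcov2_sums is made up for illustration. It evaluates the three sums term by term and also illustrates point 1: dvar_{n}^{2}(X) = dcov_{n}^{2}(X, X) comes out as zero when every observation is identical.

```python
def dcov2_sums(x, y):
    """Squared sample distance covariance via the three-sum form.

    Hypothetical helper for illustration; assumes x and y are equal-length
    sequences of real numbers (one-dimensional observations).
    """
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")

    # Pairwise distances a[k][l] = |x_k - x_l| and b[k][l] = |y_k - y_l|.
    a = [[abs(xk - xl) for xl in x] for xk in x]
    b = [[abs(yk - yl) for yl in y] for yk in y]

    # First term: (1/n^2) * sum over k,l of a_kl * b_kl.
    s1 = sum(a[k][l] * b[k][l] for k in range(n) for l in range(n)) / n**2
    # Second term: product of the two mean pairwise distances.
    s2 = (sum(map(sum, a)) / n**2) * (sum(map(sum, b)) / n**2)
    # Third term: (1/n^3) * sum over k of (sum_l a_kl) * (sum_m b_km).
    s3 = sum(sum(a[k]) * sum(b[k]) for k in range(n)) / n**3

    return s1 + s2 - 2 * s3


# dvar_n^2 of a non-constant sample is positive (here 40/81, about 0.4938).
print(dcov2_sums([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
# dvar_n^2 of a constant sample is exactly zero.
print(dcov2_sums([5.0, 5.0, 5.0], [5.0, 5.0, 5.0]))
```

 On the population side, the two cross terms in the article's version, E|X − X’||Y − Y”| and E|X − X”||Y − Y’|, are equal when (X, Y), (X’, Y’) and (X”, Y”) are i.i.d. (swap the roles of the primed and double-primed pairs), which is why the version above can write them as a single term with a factor of 2.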
Notational confusion
@Mathstat: Thanks for trying to clean up this article's notation. Maybe I'm just confused, but I think the difficulty arises in that the original 2007 and 2009 papers use two different meanings for dCov. The 2007 paper says on p. 2772: "The distance covariance (dCov) between random vectors X and Y with finite first moments is the nonnegative number V(X, Y) defined by V^{2}(X, Y) = ...." Likewise, the 2009 paper says on pp. 1236–7: "the distance covariance (dCov) statistic, derived in the next section, is the square root of V^{2} ...."
But then for a while in the 2009 paper they use a different definition of dCov: on p. 1238 it says "This new notion Cov_{U}(X, Y ) contains as distinct special cases distance covariance V^{2}(X, Y )...." Six lines later it says "A surprising result develops: the Brownian covariance is equal to the distance covariance" and later in that paragraph it says "we arrive at Cov_{W}(X, Y ) = V^{2}(X, Y )." But then on p. 1241 it says "The distance covariance (dCov) between random vectors X and Y with finite first moments is the nonnegative number V(X, Y ) defined by V^{2}(X, Y ) = ...", which appears to have been cut and pasted from the above quote in the 2007 paper. On p. 1249 it says "the Brownian covariance of X and Y is defined by W^{2}(X, Y ) = ...", but it appears to mean that it is defined as the square root of this. Then on p. 1250 it says "The surprising coincidence: W = V" implying that both dCov and Brownian covariance are the positive square roots of V^{2} and W^{2}.
So I'm confused. I hope you're able to sort all this out so as to use a consistent notation in the Wikipedia article. Duoduoduo (talk) 18:27, 4 February 2011 (UTC)
 Yes, as you noticed, the notation in this Wikipedia article was not quite consistent with the notation in the 2007 and 2009 papers, and these recent changes are mainly meant to make the notation consistent. Concerning other notational matters in the Brownian covariance part: in SR2009, pp. 1248–1249, the Brownian covariance is defined in (3.4) and (3.6). In (3.4) it is stated that Brownian covariance is defined by its square W^{2}(X, Y), which parallels the definition of distance covariance in both papers. In (3.6), "Brownian covariance is defined by ..." (equation 3.6 with W^{2}(X, Y)). When reading the two pages together it makes sense, but on p. 1249 it would be clearer if it said "Brownian covariance W is defined by ..." or "is defined as the square root of ...", as you wrote here. Your reading of "The surprising coincidence: W = V", namely that both dCov and Brownian covariance are the positive square roots of V^{2} and W^{2}, summarizes it well. Mathstat (talk) 19:28, 4 February 2011 (UTC)
Edits to Definitions, and miscellaneous
 Sorry I made major edits without posting here! I'll do so in the future. This article is great, and I'm just thinking of ways to improve it!
 I think Definitions needs to be edited for a more lay audience (i.e., non-theoretical statisticians). Presumably, most readers are familiar with statistics and want to know (1) the intuition behind distance covariance and (2) how to compute it. The current article has rather obscure notation (granted, taken from Szekely and Rizzo, 2009), but perhaps using "D" to denote the distance matrix and "R" to denote the recentered distance matrices would be more reader-friendly. Also, the definition of dCov^2 given by the equation below "One can show that this is equivalent to the following definition:" is stated without any intuition. It should be moved to a later section for readers who want more details about dcov (i.e., that this equation is derived by starting from a norm difference between distributions). My edits (drbabinski) try to clean up the notation and make things more straightforward for a lay reader (although much can still be improved), without removing the previous definitions.
 The picture with the different data sets and a dcorr value is misleading. It is unclear how to interpret dcorr values, and a statement that one relationship has a larger dcorr than another should be interpreted carefully, taking into account the number of samples and variables. This differs from Pearson correlation, whose value is interpretable.
 Can the "Problems with the article" section below (in Talk) be archived? Are those problems resolved?
Drbabinski (talk) 18:37, 24 January 2018 (UTC)
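 For readers who mainly want to know how to compute the statistic, the recipe discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the samples are one-dimensional, the D / R naming follows the suggestion in this thread, and the function names are invented here rather than taken from the article or from any library. Build the pairwise distance matrix D, recenter it to R by subtracting row and column means and adding back the grand mean, then average the entrywise product of the two recentered matrices.

```python
import math


def distance_matrix(x):
    """D[k][l] = |x_k - x_l| for a one-dimensional sample (illustrative helper)."""
    return [[abs(xk - xl) for xl in x] for xk in x]


def recenter(d):
    """R[k][l] = D[k][l] - row_mean[k] - col_mean[l] + grand_mean."""
    n = len(d)
    row_means = [sum(row) / n for row in d]
    # D is symmetric, so the column means equal the row means.
    grand_mean = sum(row_means) / n
    return [[d[k][l] - row_means[k] - row_means[l] + grand_mean
             for l in range(n)] for k in range(n)]


def dcov2(x, y):
    """Squared sample distance covariance: mean of R_x[k][l] * R_y[k][l]."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    rx = recenter(distance_matrix(x))
    ry = recenter(distance_matrix(y))
    return sum(rx[k][l] * ry[k][l] for k in range(n) for l in range(n)) / n**2


def dcor(x, y):
    """Sample distance correlation; defined as 0 when either dvar is 0."""
    denom = math.sqrt(dcov2(x, x) * dcov2(y, y))
    if denom == 0.0:
        return 0.0
    # Clamp at 0 to guard against tiny negative values from rounding.
    return math.sqrt(max(0.0, dcov2(x, y)) / denom)


# An exact linear relationship gives a distance correlation of (about) 1.
print(dcor([1.0, 2.0, 3.0], [3.0, 5.0, 7.0]))
```

 The sketch builds full n-by-n matrices, so it needs on the order of n^{2} time and memory, and it says nothing about how to interpret intermediate values, which is the concern raised above.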