# Correlation – Everything you ever wanted to know but were afraid to ask

Where

$n$ is the sample size

$x_i$ is the measurement for the ith return observation of asset x

$\bar{x}$ is the mean of the return observations of asset x

$s_x$ is the standard deviation of the return observations of asset x

$y_i$ is the measurement for the ith return observation of asset y

$\bar{y}$ is the mean of the return observations of asset y

$s_y$ is the standard deviation of the return observations of asset y
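Writing $x_i$, $\bar{x}$ and $s_x$ (and likewise $y_i$, $\bar{y}$, $s_y$) for the observations, means and standard deviations defined above, these quantities are the ingredients of the standard sample Pearson correlation coefficient. A sketch of that formula, assuming the standard deviations are sample standard deviations computed with an $n-1$ divisor:

$$
r = \frac{1}{n-1}\sum_{i=1}^{n}\left(\frac{x_i-\bar{x}}{s_x}\right)\left(\frac{y_i-\bar{y}}{s_y}\right),
\qquad
s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i-\bar{x}\right)^2},
\qquad
s_y = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(y_i-\bar{y}\right)^2}
$$

If the standard deviations are instead computed with a divisor of $n$, the leading factor becomes $1/n$ and the value of r is unchanged.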

The correlation coefficient, r, is a measure of the strength and direction of a linear relationship between two variables. It ranges from -1 to +1 inclusive. The strength is gauged from the absolute magnitude of r: the greater the absolute value of r, the stronger the linear relationship between the two variables. The direction tells us how one variable moves in relation to the other. A positive correlation means that as one variable increases the other also tends to increase. A negative correlation means that as one variable increases the other tends to decrease. An r of -1 or +1 signifies perfect negative or perfect positive linear correlation respectively. A correlation of zero indicates that there is no linear relationship between the two variables; they may still be related in a non-linear way.
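As a rough illustration of the range and sign of r, the following sketch computes the sample Pearson correlation with the Python standard library only. The function name `pearson_r` and the return series are hypothetical, chosen purely for illustration.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of squared deviations and cross-products; any 1/(n-1) factors cancel.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical return series, for illustration only.
returns_x = [0.01, 0.02, -0.01, 0.03, 0.00]
returns_up = [2 * v + 0.005 for v in returns_x]    # increasing linear function of x
returns_down = [-3 * v + 0.01 for v in returns_x]  # decreasing linear function of x

r_up = pearson_r(returns_x, returns_up)      # approximately +1
r_down = pearson_r(returns_x, returns_down)  # approximately -1
```

Because both derived series are exact linear functions of `returns_x`, the results sit at the two ends of the range, up to floating-point rounding.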

The correlation coefficient assumes that the underlying variables have a linear relationship with each other. When the relationship is non-linear, the correlation coefficient can give false or misleading results. Correlation can also mislead when there are outliers in the dataset, when data groups are combined inappropriately, or when the data is too homogeneous.
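Two of these pitfalls, a non-linear relationship and a single outlier, can be seen in a small sketch. The helper `pearson_r` and all data points are hypothetical and serve only to illustrate the effect.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)

# Non-linear case: y is fully determined by x, yet r is zero by symmetry.
x = [-3, -2, -1, 0, 1, 2, 3]
y_quadratic = [v ** 2 for v in x]
r_quadratic = pearson_r(x, y_quadratic)

# Outlier case: five points with no linear relationship...
x_base = [1, 2, 3, 4, 5]
y_base = [2, 1, 3, 1, 2]
r_base = pearson_r(x_base, y_base)

# ...and the same points plus one extreme observation.
r_outlier = pearson_r(x_base + [20], y_base + [20])

print(r_quadratic, r_base, r_outlier)
```

In this made-up data the quadratic relationship yields an r of zero even though y depends entirely on x, while the single point (20, 20) lifts r from roughly zero to above 0.9.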

Another important point is that a correlation between two variables does not imply causation, i.e. it is not necessarily the case that one variable causes a response in the other. Other interpretations of the observed relationship must be kept in mind when analyzing results. For example, both variables could be affected by a third variable, with no direct causal link between the two variables being analyzed.

It is possible to evaluate the magnitude of the correlation numbers using five “Rules of Thumb” as follows:

 Range Interpretation 0

In order to test whether the correlation is in fact significant rather than a chance occurrence we have used hypothesis testing. Specifically, we are testing the mutually exclusive hypotheses:

Null Hypothesis: r = 0

Alternative Hypothesis: r ≠ 0

Using a significance level of 5%, a two-tailed test and n-2 degrees of freedom (df), where n is the number of return observations, a critical value is determined from the table below. If the exact degrees of freedom are not available in the table, the critical value at the next lower degrees of freedom is used. This is the conservative choice, because the critical value falls as the degrees of freedom rise. For example, with 328 observations the degrees of freedom work out to 326. This value is not present in the table, so we use the critical value at the next lower degrees of freedom, i.e. the critical value at 300 degrees of freedom.

Critical Values: Level of Significance for a Two-Tailed Test

| Degrees of Freedom (n-2) | 10% | 5% | 2% | 1% |
|---|---|---|---|---|
| 1 | 0.988 | 0.997 | 0.9995 | 0.9999 |
| 2 | 0.9 | 0.95 | 0.98 | 0.99 |
| 3 | 0.805 | 0.878 | 0.934 | 0.959 |
| 4 | 0.729 | 0.811 | 0.882 | 0.917 |
| 5 | 0.669 | 0.754 | 0.833 | 0.874 |
| 6 | 0.622 | 0.707 | 0.789 | 0.834 |
| 7 | 0.582 | 0.666 | 0.75 | 0.798 |
| 8 | 0.549 | 0.632 | 0.716 | 0.765 |
| 9 | 0.521 | 0.602 | 0.685 | 0.735 |
| 10 | 0.497 | 0.576 | 0.658 | 0.708 |
| 11 | 0.476 | 0.553 | 0.634 | 0.684 |
| 12 | 0.458 | 0.532 | 0.612 | 0.661 |
| 13 | 0.441 | 0.514 | 0.592 | 0.641 |
| 14 | 0.426 | 0.497 | 0.574 | 0.623 |
| 15 | 0.412 | 0.482 | 0.558 | 0.606 |
| 16 | 0.4 | 0.468 | 0.542 | 0.59 |
| 17 | 0.389 | 0.456 | 0.528 | 0.575 |
| 18 | 0.378 | 0.444 | 0.516 | 0.561 |
| 19 | 0.369 | 0.433 | 0.503 | 0.549 |
| 20 | 0.36 | 0.423 | 0.492 | 0.537 |
| 21 | 0.352 | 0.413 | 0.482 | 0.526 |
| 22 | 0.344 | 0.404 | 0.472 | 0.515 |
| 23 | 0.337 | 0.396 | 0.462 | 0.505 |
| 24 | 0.33 | 0.388 | 0.453 | 0.496 |
| 25 | 0.323 | 0.381 | 0.445 | 0.487 |
| 26 | 0.317 | 0.374 | 0.437 | 0.479 |
| 27 | 0.311 | 0.367 | 0.43 | 0.471 |
| 28 | 0.306 | 0.361 | 0.423 | 0.463 |
| 29 | 0.301 | 0.355 | 0.416 | 0.456 |
| 30 | 0.296 | 0.349 | 0.409 | 0.449 |
| 35 | 0.275 | 0.325 | 0.381 | 0.418 |
| 40 | 0.257 | 0.304 | 0.358 | 0.393 |
| 45 | 0.243 | 0.288 | 0.338 | 0.372 |
| 50 | 0.231 | 0.273 | 0.322 | 0.354 |
| 60 | 0.211 | 0.25 | 0.295 | 0.325 |
| 70 | 0.195 | 0.232 | 0.274 | 0.303 |
| 80 | 0.183 | 0.217 | 0.256 | 0.283 |
| 90 | 0.173 | 0.205 | 0.242 | 0.267 |
| 100 | 0.164 | 0.195 | 0.23 | 0.254 |
| 125 | | 0.174 | | |
| 150 | | 0.159 | | |
| 200 | | 0.138 | | |
| 300 | | 0.113 | | |
| 400 | | 0.098 | | |
| 500 | | 0.088 | | |
| 1000 | | 0.062 | | |
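The "next lower degrees of freedom" lookup can be sketched as follows. The dictionary holds the 5% two-tailed values from the table above; the rows for 125 degrees of freedom and higher list only one value each, and this sketch assumes those belong to the 5% column. The names `CRITICAL_5PCT` and `critical_value` are hypothetical.

```python
from bisect import bisect_right

# 5% two-tailed critical values keyed by degrees of freedom (n - 2).
# Assumption: the single values listed for df >= 125 are 5% values.
CRITICAL_5PCT = {
    1: 0.997, 2: 0.95, 3: 0.878, 4: 0.811, 5: 0.754, 6: 0.707, 7: 0.666,
    8: 0.632, 9: 0.602, 10: 0.576, 11: 0.553, 12: 0.532, 13: 0.514,
    14: 0.497, 15: 0.482, 16: 0.468, 17: 0.456, 18: 0.444, 19: 0.433,
    20: 0.423, 21: 0.413, 22: 0.404, 23: 0.396, 24: 0.388, 25: 0.381,
    26: 0.374, 27: 0.367, 28: 0.361, 29: 0.355, 30: 0.349, 35: 0.325,
    40: 0.304, 45: 0.288, 50: 0.273, 60: 0.25, 70: 0.232, 80: 0.217,
    90: 0.205, 100: 0.195, 125: 0.174, 150: 0.159, 200: 0.138,
    300: 0.113, 400: 0.098, 500: 0.088, 1000: 0.062,
}
_TABLE_DFS = sorted(CRITICAL_5PCT)

def critical_value(n_observations):
    """Return (table df used, critical value) for n return observations."""
    df = n_observations - 2
    if df < 1:
        raise ValueError("at least 3 observations are needed")
    # Largest tabulated df that does not exceed the actual df.
    idx = bisect_right(_TABLE_DFS, df) - 1
    table_df = _TABLE_DFS[idx]
    return table_df, CRITICAL_5PCT[table_df]

table_df, crit = critical_value(328)  # df = 326, falls back to the df = 300 row
```

For 328 observations this returns the row for 300 degrees of freedom, matching the worked example in the text. Any df above 1000 falls back to the 1000 row, the last one in the table.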

If the calculated correlation is greater than the critical value or less than -1×critical value, it can be concluded that the calculated correlation is not a chance finding but is statistically significant. As a result we reject the null hypothesis and accept the alternative. On the other hand, if the calculated correlation lies between -1×critical value and the critical value, we conclude that there is insufficient evidence of correlation given the dataset and parameters used.
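The decision rule amounts to comparing the absolute value of the calculated correlation with the critical value. A minimal sketch, in which the function name `is_significant` and the sample correlations are hypothetical, while 0.113 is the table value at 300 degrees of freedom:

```python
def is_significant(r, critical_value):
    """Two-tailed test: True when r lies outside [-critical_value, +critical_value]."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation must lie between -1 and +1")
    return abs(r) > critical_value

CRITICAL_AT_DF_300 = 0.113  # 5% two-tailed critical value at 300 degrees of freedom

# Hypothetical calculated correlations, for illustration only.
for r in (0.15, -0.15, 0.05):
    verdict = "reject the null" if is_significant(r, CRITICAL_AT_DF_300) else "fail to reject the null"
    print(f"r = {r:+.2f}: {verdict}")
```

Here 0.15 and -0.15 both fall outside the band and are treated as statistically significant, while 0.05 falls inside it.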