Where
$n$ is the sample size
$x_i$ is the measurement for the $i$th return observation of asset x
$\bar{x}$ is the mean of the return observations of asset x
$s_x$ is the standard deviation of the return observations of asset x
$y_i$ is the measurement for the $i$th return observation of asset y
$\bar{y}$ is the mean of the return observations of asset y
$s_y$ is the standard deviation of the return observations of asset y
The correlation coefficient is a measure of the strength and direction of a linear relationship between two variables. It can range from -1 to +1 inclusive. The strength is gauged from the absolute magnitude of r: the greater the absolute value of r, the stronger the relationship between the two variables. The direction tells us how one variable moves in relation to the other. A positive correlation means that as one variable increases, the other is also likely to increase. A negative correlation indicates that as one variable increases, the other is likely to decrease. An r of -1 or +1 signifies a perfectly negative or positive linear correlation, respectively. A correlation of zero indicates that the two variables are not linearly related.
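As an illustration, the following Python sketch computes r for two short, hypothetical return series, assuming the usual sample form of the coefficient, $r = \sum_i (x_i - \bar{x})(y_i - \bar{y}) / ((n-1)\, s_x s_y)$; the asset values shown are placeholders rather than data from this document.

```python
import numpy as np

def correlation(x, y):
    """Sample (Pearson) correlation between two return series."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    n = len(x)
    x_bar, y_bar = x.mean(), y.mean()
    s_x, s_y = x.std(ddof=1), y.std(ddof=1)   # sample standard deviations
    # Sum of cross-products of deviations, scaled by (n - 1) * s_x * s_y
    return np.sum((x - x_bar) * (y - y_bar)) / ((n - 1) * s_x * s_y)

# Hypothetical monthly returns for two assets (placeholder values)
asset_x = [0.021, -0.013, 0.007, 0.015, -0.004, 0.011]
asset_y = [0.018, -0.010, 0.004, 0.012, -0.006, 0.009]

r = correlation(asset_x, asset_y)
print(f"r = {r:.4f}")   # agrees with np.corrcoef(asset_x, asset_y)[0, 1]
```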
The correlation coefficient assumes that the underlying variables have a linear relationship with each other. When the relationship is non-linear, the correlation coefficient can produce misleading results. Correlation can also be misleading when there are outliers in the dataset, when data groups are combined inappropriately, or when the data is too homogeneous.
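As a small illustration of the outlier problem, the sketch below uses simulated, entirely hypothetical data to show how a single shared extreme observation can make two otherwise independent return series appear strongly correlated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two independent, and therefore (in expectation) uncorrelated, return series
x = rng.normal(0.0, 0.01, size=50)
y = rng.normal(0.0, 0.01, size=50)
print(f"without outlier: r = {np.corrcoef(x, y)[0, 1]:+.3f}")          # close to zero

# Append one shared extreme observation (e.g. a common crash day)
x_out = np.append(x, -0.20)
y_out = np.append(y, -0.20)
print(f"with outlier:    r = {np.corrcoef(x_out, y_out)[0, 1]:+.3f}")  # much larger
```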
Another important point to note is that a correlation between two variables does not imply causation; that is, it is not necessarily the case that one variable is causing a response in the other. Other possible explanations for an observed relationship must be kept in mind when analyzing results: for example, both variables could be driven by some third variable, with no direct causal link between the two variables being analyzed.
The magnitude of the correlation can be evaluated using five “Rules of Thumb” that grade the strength of the relationship according to the absolute value of r.
In order to test whether the correlation is in fact significant, rather than a chance occurrence, we have used hypothesis testing. Specifically, we are testing the mutually exclusive hypotheses:

$H_0: \rho = 0$ (there is no linear correlation between the returns of the two assets)
$H_1: \rho \neq 0$ (there is a linear correlation between the returns of the two assets)
Using a significance level of 5%, a two-tailed test, and n-2 degrees of freedom (df), where n is the number of return observations, a critical value is determined from the table below. If the exact degrees of freedom are not available in the table, the critical value at the next lower degrees of freedom is used. For example, if there are 328 observations, the degrees of freedom work out to 326. This value is not present in the table, so we use the critical value at the next lower degrees of freedom available, i.e. the critical value at 300 degrees of freedom.
Critical Values (Level of Significance for a Two-Tailed Test)

| Degrees of Freedom (n-2) | 10% | 5% | 2% | 1% |
| --- | --- | --- | --- | --- |
| 1 | 0.988 | 0.997 | 0.9995 | 0.9999 |
| 2 | 0.900 | 0.950 | 0.980 | 0.990 |
| 3 | 0.805 | 0.878 | 0.934 | 0.959 |
| 4 | 0.729 | 0.811 | 0.882 | 0.917 |
| 5 | 0.669 | 0.754 | 0.833 | 0.874 |
| 6 | 0.622 | 0.707 | 0.789 | 0.834 |
| 7 | 0.582 | 0.666 | 0.750 | 0.798 |
| 8 | 0.549 | 0.632 | 0.716 | 0.765 |
| 9 | 0.521 | 0.602 | 0.685 | 0.735 |
| 10 | 0.497 | 0.576 | 0.658 | 0.708 |
| 11 | 0.476 | 0.553 | 0.634 | 0.684 |
| 12 | 0.458 | 0.532 | 0.612 | 0.661 |
| 13 | 0.441 | 0.514 | 0.592 | 0.641 |
| 14 | 0.426 | 0.497 | 0.574 | 0.623 |
| 15 | 0.412 | 0.482 | 0.558 | 0.606 |
| 16 | 0.400 | 0.468 | 0.542 | 0.590 |
| 17 | 0.389 | 0.456 | 0.528 | 0.575 |
| 18 | 0.378 | 0.444 | 0.516 | 0.561 |
| 19 | 0.369 | 0.433 | 0.503 | 0.549 |
| 20 | 0.360 | 0.423 | 0.492 | 0.537 |
| 21 | 0.352 | 0.413 | 0.482 | 0.526 |
| 22 | 0.344 | 0.404 | 0.472 | 0.515 |
| 23 | 0.337 | 0.396 | 0.462 | 0.505 |
| 24 | 0.330 | 0.388 | 0.453 | 0.496 |
| 25 | 0.323 | 0.381 | 0.445 | 0.487 |
| 26 | 0.317 | 0.374 | 0.437 | 0.479 |
| 27 | 0.311 | 0.367 | 0.430 | 0.471 |
| 28 | 0.306 | 0.361 | 0.423 | 0.463 |
| 29 | 0.301 | 0.355 | 0.416 | 0.456 |
| 30 | 0.296 | 0.349 | 0.409 | 0.449 |
| 35 | 0.275 | 0.325 | 0.381 | 0.418 |
| 40 | 0.257 | 0.304 | 0.358 | 0.393 |
| 45 | 0.243 | 0.288 | 0.338 | 0.372 |
| 50 | 0.231 | 0.273 | 0.322 | 0.354 |
| 60 | 0.211 | 0.250 | 0.295 | 0.325 |
| 70 | 0.195 | 0.232 | 0.274 | 0.303 |
| 80 | 0.183 | 0.217 | 0.256 | 0.283 |
| 90 | 0.173 | 0.205 | 0.242 | 0.267 |
| 100 | 0.164 | 0.195 | 0.230 | 0.254 |
| 125 | | 0.174 | | |
| 150 | | 0.159 | | |
| 200 | | 0.138 | | |
| 300 | | 0.113 | | |
| 400 | | 0.098 | | |
| 500 | | 0.088 | | |
| 1000 | | 0.062 | | |
If the calculated correlation is greater than the critical value, or less than -1 × the critical value (i.e. its absolute value exceeds the critical value), we conclude that the calculated correlation is not a chance finding but is statistically significant. As a result, we reject the null hypothesis and accept the alternative. If, on the other hand, the calculated correlation lies between -1 × the critical value and the critical value, we conclude that there is no evidence of a significant correlation given the dataset and parameters used.
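The decision rule can also be checked programmatically. Two-tailed critical values of r at n-2 degrees of freedom, such as those tabulated above, are conventionally derived from the Student's t distribution via r_crit = t_crit / sqrt(t_crit^2 + df); the sketch below uses that relationship, with a hypothetical calculated correlation, to compute the critical value directly and apply the test.

```python
import numpy as np
from scipy import stats

def critical_r(n, alpha=0.05):
    """Two-tailed critical value of the correlation coefficient at n - 2
    degrees of freedom, derived from the Student's t distribution."""
    df = n - 2
    t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-tailed critical t
    return t_crit / np.sqrt(t_crit**2 + df)

def is_significant(r, n, alpha=0.05):
    """Reject H0 (rho = 0) when |r| exceeds the critical value."""
    return abs(r) > critical_r(n, alpha)

# Example from the text: 328 observations give df = 326
n, r = 328, 0.15                                      # r = 0.15 is a hypothetical result
print(f"exact critical value: {critical_r(n):.3f}")   # ~0.108, vs the tabulated 0.113 at df = 300
print("significant" if is_significant(r, n) else "not significant")
```

Note that the exact value computed this way is slightly less conservative than the table-lookup convention described above, which falls back to the next lower tabulated degrees of freedom.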