Correlation – Everything you ever wanted to know but were afraid to ask
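As a starting point, the correlation coefficient r can be sketched in terms of the symbols defined next. This is a reconstruction of the usual sample (Pearson) form, and it assumes that the standard deviations are sample standard deviations computed with an n - 1 divisor; with population standard deviations the leading factor would be 1/n instead.

```latex
r = \frac{1}{n-1} \sum_{i=1}^{n}
    \left( \frac{x_i - \bar{x}}{s_x} \right)
    \left( \frac{y_i - \bar{y}}{s_y} \right)
```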


Where:

n is the sample size
x_i is the measurement for the ith return observation of asset x
x̄ (x-bar) is the mean of the return observations of asset x
s_x is the standard deviation of the return observations of asset x
y_i is the measurement for the ith return observation of asset y
ȳ (y-bar) is the mean of the return observations of asset y
s_y is the standard deviation of the return observations of asset y

The correlation coefficient, r, measures the strength and direction of a linear relationship between two variables. It ranges from -1 to +1 inclusive. Strength is gauged from the absolute magnitude of r: the greater the absolute value of r, the stronger the linear relationship between the two variables. Direction tells us how one variable moves in relation to the other. A positive correlation means that as one variable increases the other tends to increase as well. A negative correlation means that as one variable increases the other tends to decrease. An r of -1 or +1 signifies perfectly negative or perfectly positive linear correlation respectively. A correlation of zero indicates that there is no linear relationship between the two variables, although they may still be related in a non-linear way.
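To make the definition concrete, here is a minimal Python sketch that computes r from two return series. The function name `pearson_r` and the return figures are hypothetical and chosen only for illustration; the sketch assumes sample standard deviations (n - 1 divisor).

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Sample correlation coefficient of two equal-length return series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 observations")
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)  # sample standard deviations (n - 1 divisor)
    if s_x == 0 or s_y == 0:
        raise ValueError("correlation is undefined when a series is constant")
    # Average product of the standardized deviations
    total = sum(((xi - x_bar) / s_x) * ((yi - y_bar) / s_y) for xi, yi in zip(x, y))
    return total / (n - 1)


# Hypothetical monthly returns, for illustration only
asset_x = [0.010, 0.020, -0.010, 0.030, 0.000]
asset_y = [0.020, 0.030, -0.020, 0.050, 0.010]
r = pearson_r(asset_x, asset_y)
print(f"r = {r:.3f}")
```

A perfectly linear pair such as [1, 2, 3] and [2, 4, 6] gives r = 1, and reversing one series gives r = -1, which matches the limits described above.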

The correlation coefficient assumes that the underlying variables have a linear relationship with each other. When the relationship is non-linear, the correlation coefficient can give false or misleading results. It can also mislead when there are outliers in the dataset, when data groups are combined inappropriately, or when the data is too homogeneous.
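Two of these pitfalls are easy to reproduce. The sketch below uses made-up numbers (not return data) and a hypothetical helper `pearson_r` to show a perfect but non-linear relationship that produces a correlation of essentially zero, and a single outlier that turns a weak negative correlation into a strongly positive one.

```python
from statistics import mean, stdev


def pearson_r(x, y):
    """Sample correlation coefficient of two equal-length series."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / ((n - 1) * s_x * s_y)


# Non-linear case: y is fully determined by x, yet the linear correlation vanishes
x = [-3, -2, -1, 0, 1, 2, 3]
y = [xi ** 2 for xi in x]
r_parabola = pearson_r(x, y)

# Outlier case: the same data with and without one extreme point
x_base = [1, 2, 3, 4, 5, 6]
y_base = [2, 1, 2, 1, 2, 1]
r_without_outlier = pearson_r(x_base, y_base)
r_with_outlier = pearson_r(x_base + [30], y_base + [30])

print(f"parabola:        r = {r_parabola:.3f}")
print(f"without outlier: r = {r_without_outlier:.3f}")
print(f"with outlier:    r = {r_with_outlier:.3f}")
```

Plotting the data before computing r is the simplest guard against both problems.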

Another important point is that a correlation between two variables does not imply causation, i.e. it is not necessarily the case that one variable is causing a response in the other. Other interpretations of the observed relationship must be kept in mind when analyzing results. For example, both variables could be driven by a third variable, with no direct causal link between the two variables being analyzed.
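The common-driver case can be illustrated with a small simulation. Everything here is hypothetical: the series are randomly generated, `common` stands in for some shared influence such as a broad market factor, and the noise levels are arbitrary. The two series never affect each other, yet they come out clearly correlated.

```python
import random
from statistics import mean, stdev


def pearson_r(x, y):
    """Sample correlation coefficient of two equal-length series."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    s_x, s_y = stdev(x), stdev(y)
    return sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / ((n - 1) * s_x * s_y)


rng = random.Random(42)  # fixed seed so the run is repeatable

# Hypothetical common driver behind both series
common = [rng.gauss(0, 1) for _ in range(500)]
x = [c + rng.gauss(0, 0.5) for c in common]
y = [c + rng.gauss(0, 0.5) for c in common]

# x and y share a driver but have no direct link
r_xy = pearson_r(x, y)

# Strip the common driver out of each series and the correlation largely disappears
x_resid = [a - c for a, c in zip(x, common)]
y_resid = [b - c for b, c in zip(y, common)]
r_resid = pearson_r(x_resid, y_resid)

print(f"raw series:            r = {r_xy:.3f}")
print(f"common driver removed: r = {r_resid:.3f}")
```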

The magnitude of a correlation can be evaluated using five "Rules of Thumb", applied to the absolute value of r:

Range               Interpretation
0 ≤ |r| < 0.2       no or negligible correlation
0.2 ≤ |r| < 0.4     low degree of correlation
0.4 ≤ |r| < 0.6     moderate degree of correlation
0.6 ≤ |r| < 0.8     marked degree of correlation
0.8 ≤ |r| ≤ 1       high correlation
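The rules of thumb translate directly into a lookup. In this sketch the function name `describe_strength` is hypothetical, the bands are applied to the absolute value of r, and a value that falls exactly on a boundary (for example 0.2) is assumed to belong to the higher band.

```python
def describe_strength(r):
    """Map a correlation coefficient to its rule-of-thumb description."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    magnitude = abs(r)  # strength ignores the sign
    if magnitude < 0.2:
        return "no or negligible correlation"
    if magnitude < 0.4:
        return "low degree of correlation"
    if magnitude < 0.6:
        return "moderate degree of correlation"
    if magnitude < 0.8:
        return "marked degree of correlation"
    return "high correlation"


print(describe_strength(-0.75))  # marked degree of correlation
print(describe_strength(0.10))   # no or negligible correlation
```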

To test whether the correlation is statistically significant rather than a chance occurrence, we use hypothesis testing. Specifically, we test two mutually exclusive hypotheses about the true correlation between the two return series:

Null Hypothesis: r = 0
Alternative Hypothesis: r ≠ 0

Using a significance level of 5%, a two-tailed test and n-2 degrees of freedom (df), where n is the number of return observations, a critical value is read from the table below. If the exact degrees of freedom are not listed in the table, the critical value at the next lower listed degrees of freedom is used. This is the conservative choice, because the critical value falls as the degrees of freedom rise. For example, with 328 observations the degrees of freedom work out to 326. This value is not in the table, so we use the critical value at the next lower degrees of freedom, i.e. at 300.
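The lookup rule can be sketched in Python as follows. The dictionary holds the 5% two-tailed column of the critical values table; the names `CRITICAL_5PCT` and `critical_value` are hypothetical. Sample sizes with more than 1000 degrees of freedom are assumed to fall back to the last listed row, in keeping with the "next lower" rule.

```python
from bisect import bisect_right

# 5% two-tailed critical values, keyed by degrees of freedom (n - 2)
CRITICAL_5PCT = {
    1: 0.997, 2: 0.950, 3: 0.878, 4: 0.811, 5: 0.754, 6: 0.707,
    7: 0.666, 8: 0.632, 9: 0.602, 10: 0.576, 11: 0.553, 12: 0.532,
    13: 0.514, 14: 0.497, 15: 0.482, 16: 0.468, 17: 0.456, 18: 0.444,
    19: 0.433, 20: 0.423, 21: 0.413, 22: 0.404, 23: 0.396, 24: 0.388,
    25: 0.381, 26: 0.374, 27: 0.367, 28: 0.361, 29: 0.355, 30: 0.349,
    35: 0.325, 40: 0.304, 45: 0.288, 50: 0.273, 60: 0.250, 70: 0.232,
    80: 0.217, 90: 0.205, 100: 0.195, 125: 0.174, 150: 0.159,
    200: 0.138, 300: 0.113, 400: 0.098, 500: 0.088, 1000: 0.062,
}
DF_LEVELS = sorted(CRITICAL_5PCT)


def critical_value(n_observations):
    """Return (table df used, critical value) for a given number of observations."""
    df = n_observations - 2
    if df < DF_LEVELS[0]:
        raise ValueError("at least 3 observations are required")
    # Index of the largest listed df that does not exceed the actual df
    idx = bisect_right(DF_LEVELS, df) - 1
    table_df = DF_LEVELS[idx]
    return table_df, CRITICAL_5PCT[table_df]


print(critical_value(328))  # (300, 0.113)
```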

Critical Values

Degrees of Freedom    Level of Significance for a Two-Tailed Test
(n-2)                 10%       5%        2%        1%
1                     0.988     0.997     0.9995    0.9999
2                     0.900     0.950     0.980     0.990
3                     0.805     0.878     0.934     0.959
4                     0.729     0.811     0.882     0.917
5                     0.669     0.754     0.833     0.874
6                     0.622     0.707     0.789     0.834
7                     0.582     0.666     0.750     0.798
8                     0.549     0.632     0.716     0.765
9                     0.521     0.602     0.685     0.735
10                    0.497     0.576     0.658     0.708
11                    0.476     0.553     0.634     0.684
12                    0.458     0.532     0.612     0.661
13                    0.441     0.514     0.592     0.641
14                    0.426     0.497     0.574     0.623
15                    0.412     0.482     0.558     0.606
16                    0.400     0.468     0.542     0.590
17                    0.389     0.456     0.528     0.575
18                    0.378     0.444     0.516     0.561
19                    0.369     0.433     0.503     0.549
20                    0.360     0.423     0.492     0.537
21                    0.352     0.413     0.482     0.526
22                    0.344     0.404     0.472     0.515
23                    0.337     0.396     0.462     0.505
24                    0.330     0.388     0.453     0.496
25                    0.323     0.381     0.445     0.487
26                    0.317     0.374     0.437     0.479
27                    0.311     0.367     0.430     0.471
28                    0.306     0.361     0.423     0.463
29                    0.301     0.355     0.416     0.456
30                    0.296     0.349     0.409     0.449
35                    0.275     0.325     0.381     0.418
40                    0.257     0.304     0.358     0.393
45                    0.243     0.288     0.338     0.372
50                    0.231     0.273     0.322     0.354
60                    0.211     0.250     0.295     0.325
70                    0.195     0.232     0.274     0.303
80                    0.183     0.217     0.256     0.283
90                    0.173     0.205     0.242     0.267
100                   0.164     0.195     0.230     0.254
125                             0.174
150                             0.159
200                             0.138
300                             0.113
400                             0.098
500                             0.088
1000                            0.062

If the calculated correlation is greater than the critical value or less than -1×critical value, we conclude that the calculated correlation is statistically significant rather than a chance finding. We therefore reject the null hypothesis in favor of the alternative. If, on the other hand, the calculated correlation lies between -1×critical value and the critical value, we conclude that the dataset and parameters used do not provide sufficient evidence of correlation.
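The decision rule reduces to comparing the absolute value of the calculated correlation with the critical value. A minimal sketch, with a hypothetical function name and illustrative inputs (0.113 is the 5% critical value at 300 degrees of freedom):

```python
def is_significant(r, critical_value):
    """True when r lies outside the band from -critical_value to +critical_value."""
    return abs(r) > critical_value


# Illustrative correlations tested against the critical value for df = 300
print(is_significant(0.15, 0.113))   # True: reject the null hypothesis
print(is_significant(-0.20, 0.113))  # True: significant negative correlation
print(is_significant(-0.05, 0.113))  # False: insufficient evidence of correlation
```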