Introduction

Correlations are used when you are interested in the relationship between two (usually numerical) variables. The ‘relationship’ is defined as how the variables vary together, i.e. the degree of covariance.
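To make the link with covariance concrete, the following minimal sketch shows that Pearson’s coefficient is the covariance rescaled by the two standard deviations. The vectors x and y are invented for illustration and are not part of the endoscopy data used on this page.

```r
# Illustrative vectors only; these values are invented for the sketch
x <- c(2, 4, 6, 8, 10)
y <- c(1, 3, 2, 5, 4)

# Covariance: how x and y vary together, in the units of x times y
covXY <- cov(x, y)

# Pearson's r: covariance divided by the product of the standard deviations
rByHand <- covXY / (sd(x) * sd(y))

# cor() defaults to method = "pearson", so the two values should agree
rBuiltIn <- cor(x, y)
all.equal(rByHand, rBuiltIn)
```

Dividing by the standard deviations removes the units, which is why a correlation always lies between -1 and 1 while a covariance does not.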

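The three types of correlation you will use in basic statistics are:

1: Pearson’s - parametric; for the linear relationship between normally distributed numeric data.

2: Spearman’s - non-parametric and rank-based; for ordinal data, or numeric data that are not normally distributed.

3: Kendall’s - non-parametric and rank-based; for numeric or ordinal data, often preferred for small samples or data with many ties.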
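As a rough illustration of how the three coefficients differ, the sketch below computes all of them for the same pair of vectors. The vectors dose and duration are hypothetical, chosen so that the relationship is monotonic but not linear.

```r
# Hypothetical data: duration rises with dose, but not in a straight line
dose <- 1:10
duration <- dose^3

# Compute each coefficient with cor(); only the method argument changes
methods <- c("pearson", "spearman", "kendall")
coefs <- sapply(methods, function(m) cor(dose, duration, method = m))
coefs
```

Because the rank-based methods only look at ordering, Spearman’s and Kendall’s coefficients are exactly 1 here, whereas Pearson’s is below 1 because the relationship is curved.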

For this page the input is randomly simulated data on the amount of sedation given, the patient’s age and the time spent doing the endoscopy:

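# Simulate 100 endoscopies; sample() draws at random with replacement
Midazolam <- sample(1:10, 100, replace = TRUE)
Fentanyl <- sample(1:100, 100, replace = TRUE)
Age <- sample(1:100, 100, replace = TRUE)
TimeSpentDoingEndoscopy <- sample(1:50, 100, replace = TRUE)
# Combine the vectors into one data frame
EndoCr <- data.frame(Midazolam, Fentanyl, Age, TimeSpentDoingEndoscopy, stringsAsFactors = FALSE)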
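
Because sample() draws at random, the figures on this page will not be reproduced exactly when the code is rerun. If reproducible output is wanted, one option is to call set.seed() before sampling. The seed value 123 below is an arbitrary choice, not the one used to generate the output shown on this page.

```r
# Arbitrary seed; any fixed integer gives repeatable draws
set.seed(123)
firstDraw <- sample(1:10, 100, replace = TRUE)

# Resetting the same seed reproduces the same sequence of draws
set.seed(123)
secondDraw <- sample(1:10, 100, replace = TRUE)

identical(firstDraw, secondDraw)
```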






The correlation functions

The two basic functions are cor, which returns the correlation coefficient, and cor.test, which also tests whether the coefficient differs from zero:

cor(x, y, method = c("pearson", "kendall", "spearman"))
cor.test(x, y, method=c("pearson", "kendall", "spearman"))

An example using Kendall’s tau is as follows. cor.test drops incomplete pairs itself, so the use argument (which belongs to cor) is not needed:

cor.test(EndoCr$Midazolam, EndoCr$Fentanyl, method = "kendall")
## 
##  Kendall's rank correlation tau
## 
## data:  EndoCr$Midazolam and EndoCr$Fentanyl
## z = -0.62682, p-value = 0.5308
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
##         tau 
## -0.04470526

For Spearman’s rank correlation, the samples have to be ranked first. cor.test does this for you:

mySpearman <- cor.test(x = EndoCr$Midazolam, y = EndoCr$Fentanyl, method = "spearman")
mySpearman
## 
##  Spearman's rank correlation rho
## 
## data:  EndoCr$Midazolam and EndoCr$Fentanyl
## S = 176500, p-value = 0.5591
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
##         rho 
## -0.05911604
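
The ranking step can be made explicit. As a minimal sketch, using invented vectors a and b rather than the endoscopy data, Spearman’s rho is the Pearson correlation of the ranks:

```r
# Invented example vectors (with ties, to show that ties are handled)
a <- c(3, 1, 4, 1, 5, 9, 2, 6)
b <- c(2, 7, 1, 8, 2, 8, 1, 8)

# rank() assigns average ranks to tied values by default
rhoFromRanks <- cor(rank(a), rank(b), method = "pearson")

# cor() with method = "spearman" performs the same ranking internally
rhoDirect <- cor(a, b, method = "spearman")
all.equal(rhoFromRanks, rhoDirect)
```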

cor.test returns a list (of class "htest"), so its components can be accessed with $ as you would the columns of a data frame. Therefore to get the p-value do:

mySpearman$p.value
## [1] 0.5590593
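
The other components of the result can be retrieved in the same way. The sketch below uses two invented vectors (u and v) so that it runs on its own; names() lists what is available.

```r
# Invented vectors so the example is self-contained
u <- c(1.2, 2.5, 3.1, 4.8, 5.0, 6.3, 7.7, 8.1)
v <- c(2.0, 1.9, 3.5, 3.9, 5.2, 5.8, 8.0, 7.4)

res <- cor.test(u, v, method = "pearson")

names(res)      # components stored in the "htest" object
res$estimate    # the correlation coefficient
res$statistic   # the test statistic (t for Pearson)
res$p.value     # the p-value
res$conf.int    # confidence interval (reported for Pearson)
```

Storing these values in variables is usually more reliable than copying numbers from the printed output, particularly when many tests are run in a loop.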






Visualisation of correlations

There are many ways to visualise correlations. One useful way is a correlogram using the package ‘corrplot’. corrplot needs a correlation matrix as input, which can be created from the data frame with the function cor:

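library(corrplot)
# Correlation matrix of every column against every other column
EndoMatrix <- cor(EndoCr)
# Show each correlation as a coloured number
corrplot(EndoMatrix, method = "number")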

Another method is a scatterplot matrix, which plots every numeric variable against every other numeric variable:

pairs(EndoCr)

A final nice way is to use the PerformanceAnalytics package, which gives scatterplots, histograms and the correlation coefficients in a single chart:

library("PerformanceAnalytics")
chart.Correlation(EndoCr, histogram=TRUE, pch=19)

Then of course we can start to do some interesting things. If we are able to generate correlations of all the numeric columns with each other, we can create a hierarchy of correlations as follows:

cc <- cor(EndoCr,
          use = "pairwise.complete.obs",
          method = "pearson")

cc
##                           Midazolam    Fentanyl         Age
## Midazolam                1.00000000 -0.05853632  0.06461162
## Fentanyl                -0.05853632  1.00000000  0.01992456
## Age                      0.06461162  0.01992456  1.00000000
## TimeSpentDoingEndoscopy -0.05574710  0.04810241 -0.04347728
##                         TimeSpentDoingEndoscopy
## Midazolam                           -0.05574710
## Fentanyl                             0.04810241
## Age                                 -0.04347728
## TimeSpentDoingEndoscopy              1.00000000

And from this create a dendrogram (albeit not a very complicated one):

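# Cluster the variables on the Euclidean distance between rows of cc
hc <- hclust(dist(cc), method = "average")
dn <- as.dendrogram(hc)
# Draw the dendrogram horizontally
plot(dn, horiz = TRUE)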
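
Note that dist(cc) measures the Euclidean distance between rows of the correlation matrix. A common alternative, shown here only as a sketch and not as the method used above, is to convert the correlations directly into a dissimilarity such as 1 - |r|, so that strongly correlated variables are clustered together. The data frame demo and its column names are simulated for illustration.

```r
# Simulated data: b is built from a, so the two should cluster together
set.seed(1)
a <- rnorm(50)
demo <- data.frame(a = a,
                   b = a + rnorm(50, sd = 0.1),
                   noise1 = rnorm(50),
                   noise2 = rnorm(50))

ccDemo <- cor(demo, use = "pairwise.complete.obs", method = "pearson")

# 1 - |r| is 0 for perfectly correlated variables and 1 for uncorrelated ones
dissim <- as.dist(1 - abs(ccDemo))

hcDemo <- hclust(dissim, method = "average")
plot(as.dendrogram(hcDemo), horiz = TRUE)
```

With this dissimilarity the height at which two variables join can be read directly as 1 - |r|, which may be easier to interpret than a Euclidean distance between rows.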

Further information can be found here: http://www.sthda.com/english/wiki/visualize-correlation-matrix-using-correlogram#visualization-methods