Friday, November 14, 2014

Scatter Plot Matrices in R

One of our graduate students asked me how he could check for correlated variables in his dataset. In R, this can be done in three ways. First, he can use the cor function of the stats package to calculate the correlation coefficients between variables. Second, he can use a function such as pairs (graphics package) to check visually for possibly correlated variables. Third, he can combine the first two approaches, either by following the example of vinux on Stack Overflow or by using the ggpairs function of the GGally package.

First Approach

data(iris)
cor(iris[,1:4])
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
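
With more than a handful of variables the full matrix gets hard to read, so it can help to list only the strongly correlated pairs. The following is a minimal sketch of one way to do that; the 0.8 cutoff and the names r, idx and strong are my own choices for illustration, not part of the original example.

```r
data(iris)
r <- cor(iris[, 1:4])

# Look only at the upper triangle so each pair is reported once,
# and keep the pairs whose absolute correlation exceeds the cutoff.
cutoff <- 0.8  # assumed threshold, adjust to taste
idx <- which(upper.tri(r) & abs(r) > cutoff, arr.ind = TRUE)

strong <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = r[idx]
)
strong
```

For iris this leaves the three pairs that involve the sepal length and the two petal measurements.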

Second Approach

pairs(iris[,1:4])
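
If the data contain a grouping variable, colouring the points by group can show whether a correlation holds within each group or only across groups. A small sketch, assuming we use the Species column of iris for the colours (species_col is a name I made up for this example):

```r
data(iris)

# One colour per species: as.integer() maps the factor levels to 1, 2, 3,
# which pairs() passes on to the points as colour indices.
species_col <- as.integer(iris$Species)
pairs(iris[, 1:4], col = species_col, pch = 19)
```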

Third Approach

# Panel function: print the correlation coefficient and its p-value
panel.cor <- function(x, y, digits = 2, cex.cor, ...)
{
  # switch to unit coordinates; restore the previous ones on exit
  op <- par(usr = c(0, 1, 0, 1)); on.exit(par(op))
  # correlation coefficient
  r <- cor(x, y)
  txt <- format(c(r, 0.123456789), digits = digits)[1]
  txt <- paste("r= ", txt, sep = "")
  text(0.5, 0.6, txt)
  # p-value calculation
  p <- cor.test(x, y)$p.value
  txt2 <- format(c(p, 0.123456789), digits = digits)[1]
  txt2 <- paste("p= ", txt2, sep = "")
  if (p < 0.01) txt2 <- paste("p= ", "<0.01", sep = "")
  text(0.5, 0.4, txt2)
}
# scatter plots in the lower panels, r and p in the upper panels
pairs(iris, upper.panel = panel.cor)
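
To see what the panel function computes for a single cell, the same cor.test call can be run on one pair of variables. This is only a quick check on one pair I picked (petal length against petal width); ct is a name chosen for the example.

```r
data(iris)

# Pearson correlation test for one pair of variables
ct <- cor.test(iris$Petal.Length, iris$Petal.Width)

ct$estimate  # the correlation coefficient r shown in the panel
ct$p.value   # the p-value; the panel prints "<0.01" when it is below 0.01
```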

Alternatively, the ggpairs function of the GGally package gives a similar combined display:

library(GGally)
ggpairs(iris[,1:4])