Friday, November 14, 2014

Scatter Plot Matrices in R

One of our graduate students asked me how he could check for correlated variables in his dataset. Using R, this can be done in three (3) ways. First, he can use the cor function of the stats package to calculate the correlation coefficients between variables. Second, he can use functions such as pairs (graphics) to visually check for possibly correlated variables. Third, he can combine the first two approaches, following the example of vinux on Stack Overflow or using the ggpairs function of the GGally package.

First Approach

             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
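
The call that produced this matrix is not shown. The values are consistent with running cor on the four numeric columns of the built-in iris data, so a minimal sketch under that assumption (the variable names iris_num and cor_mat are only illustrative) would be:

```r
# Assumption: the matrix above comes from the numeric columns of the
# built-in iris data set; the original call is not shown in the post.
iris_num <- iris[, 1:4]      # drop the Species factor, keep the measurements
cor_mat  <- cor(iris_num)    # Pearson correlation is the default method
print(cor_mat)
```

Passing method = "spearman" or method = "kendall" to cor gives rank-based coefficients instead.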

Second Approach
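
The figure for this approach is missing here. As a rough sketch of what a pairs call for a visual check might look like, assuming the same iris columns as in the first approach (the colors and title are illustrative choices, not taken from the post):

```r
# Assumption: the missing figure was a pairs() plot of the numeric iris columns.
iris_num <- iris[, 1:4]

# One color per species, so clusters are easier to see in each panel.
species_col <- c("red", "green3", "blue")[as.integer(iris$Species)]

pairs(iris_num, col = species_col, pch = 19,
      main = "Scatter plot matrix of iris measurements")
```

Panels where the points fall close to a line, such as Petal.Length against Petal.Width, suggest strongly correlated variables.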

Third Approach
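
The figure and code for this approach are also missing. One way to combine the two ideas in base R is to give pairs a custom panel that prints the correlation coefficient, in the spirit of the examples on the pairs help page. The function below is a sketch written for illustration, not the code from the Stack Overflow answer:

```r
# Illustrative panel function: writes the correlation coefficient of x and y
# in the middle of the panel instead of drawing points.
panel.cor <- function(x, y, digits = 2, ...) {
  op <- par(usr = c(0, 1, 0, 1))   # temporarily use unit coordinates
  on.exit(par(op))                 # restore the panel's coordinates on exit
  r <- cor(x, y)
  text(0.5, 0.5, format(r, digits = digits), cex = 1.5)
}

# Scatter plots with a smoother below the diagonal, coefficients above it.
pairs(iris[, 1:4], lower.panel = panel.smooth, upper.panel = panel.cor)
```

With the GGally package installed, ggpairs(iris, columns = 1:4) produces a comparable display with scatter plots, densities, and correlation coefficients.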



  1. This comment has been removed by the author.

  2. Another way to look at correlation is with correlograms. An overview is here:

    library(corrgram)  # provides corrgram() and the panel.* functions used here
    corrgram(iris, upper.panel=panel.pts, lower.panel=panel.ellipse, diag.panel=panel.density)

  3. This comment has been removed by the author.

  4. Hello, you show us three great approaches for correlations, thanks! I wonder about two optional things.

    1) In the third approach, is there a setup that marks all significant correlations with * / ** / ***, depending on the given significance level?

    2) (General question) Does it make sense to add a regression line to each correlation diagram, and if yes (specific question), how can this best be done (e.g. in solution 3)?

  5. Is there any way to convert the scatter plot into an interactive plotly graph?

    I am talking about the pairs() function approach for the scatter plot.
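
For the two questions in comment 4 (significance stars and regression lines), a minimal base-R sketch could look like the following. The helper names sig_stars, panel.cor.stars, and panel.lm are hypothetical, and the 0.05 / 0.01 / 0.001 cutoffs are the usual convention rather than anything stated in the post:

```r
# Hypothetical helper: map a p-value to the conventional star notation.
sig_stars <- function(p) {
  if (p < 0.001) "***" else if (p < 0.01) "**" else if (p < 0.05) "*" else ""
}

# Upper panel: correlation coefficient followed by its significance stars.
panel.cor.stars <- function(x, y, digits = 2, ...) {
  op <- par(usr = c(0, 1, 0, 1))
  on.exit(par(op))
  ct <- cor.test(x, y)   # Pearson test of zero correlation
  label <- paste0(format(unname(ct$estimate), digits = digits),
                  sig_stars(ct$p.value))
  text(0.5, 0.5, label, cex = 1.3)
}

# Lower panel: scatter plot with a least-squares regression line.
panel.lm <- function(x, y, ...) {
  points(x, y, ...)
  abline(lm(y ~ x), col = "red")
}

pairs(iris[, 1:4], lower.panel = panel.lm, upper.panel = panel.cor.stars)
```

Whether a straight regression line is appropriate depends on the data; for clearly non-linear relationships, panel.smooth from the graphics package may be a better fit.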