I like to use R when I need to create a correlation matrix and a scatterplot matrix for a large number of variables. For example, this is what I want to create for a data set of insurance variables:
Figure 1. Correlation matrix of insurance variables
Figure 2. Scatterplot matrix of insurance variables
Here are the steps I use to create the output shown above:
1. First I need to read in my text file, which contains a header row and 8 columns:
cir<-read.table("CIR.txt",header=TRUE)
2. Then I make a couple of changes to the data before I run the correlation matrix and create the scatterplot matrix. I keep columns two through five as they are (skipping the first column), rename the seventh variable to “involact”, and divide income (column eight) by 1,000 so that it is expressed in thousands:
cir<-data.frame(cir[,2:5],involact=cir[,7],income=cir[,8]/1000)
3. Next I create the correlation matrix and the scatterplot matrix. I round the correlations to three decimal places to make the output more readable. For the scatterplot matrix, I use a formula with the target variable involact on the left and the names of the other variables I want to show on the right:
round(cor(cir),3)
pairs(involact~race+fire+theft+age+volact+income, data=cir)
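Since CIR.txt is not included here, the three steps can be tried end to end with a small made-up data set. The sketch below is only an illustration: the column names, the column order, and all values in `raw` are assumptions, not the real insurance data. In this invented layout column six is not kept by step 2, so the plotting formula leaves it out, and a quick check confirms that every variable named in the formula actually exists in the data frame before `pairs()` is called.

```r
# Synthetic stand-in for CIR.txt: a header row and 8 columns.
# Column names, order, and values are invented for illustration only.
set.seed(1)
n <- 40
raw <- data.frame(
  zip         = 1:n,
  race        = runif(n, 0, 100),
  fire        = runif(n, 0, 40),
  theft       = runif(n, 0, 150),
  age         = runif(n, 0, 100),
  volact      = runif(n, 0, 15),
  involuntary = runif(n, 0, 2),
  income      = runif(n, 5000, 22000)
)

# Step 1: write the data to a temporary text file and read it back,
# so that read.table() is used the same way as in the post
path <- tempfile(fileext = ".txt")
write.table(raw, path, row.names = FALSE, quote = FALSE)
cir <- read.table(path, header = TRUE)

# Step 2: keep columns 2-5, rename column 7, express column 8 in thousands
cir <- data.frame(cir[, 2:5], involact = cir[, 7], income = cir[, 8] / 1000)

# Confirm that every variable in the plotting formula is in the data frame
f <- involact ~ race + fire + theft + age + income
missing_vars <- setdiff(all.vars(f), names(cir))
stopifnot(length(missing_vars) == 0)

# Step 3: rounded correlation matrix and scatterplot matrix
cm <- round(cor(cir), 3)
print(cm)
pairs(f, data = cir)
```

If a variable from the formula is missing, `missing_vars` names it, which is usually easier to read than the error `pairs()` would raise on its own.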