For a simple example, I prepared 23 SNPs mapped on gene SSU72, SSU72.txt

bash-3.2$  head SSU72.txt
rs4970458
rs2031709
rs3128342
rs3766169
rs2296715
rs2296716
rs3118509
rs3766177
rs3766178
rs3766180

I also prepared small subset of 1000 human genome project data which contains 33 SNPs for 379 subjects. Assume this is our reference panel data. SubsetCh1.txt

We can take sub data for SSU72 using awk command below

bash-3.2$ awk 'NR==FNR {h[$1] = $0; next} h[$1] !="" {print h[$1]}' SubsetCh1.txt SSU72.txt > SubsetSSU72.txt

Among 23 SNPs in SSU72, 22 SNPs were in 1000 human genome panel and sub data saved in SubsetSSU72.txt.

The reason I do this in bash command is that it is simple and fast. reading 1000 human genome project data in R and mapping it in R takes a lot of time. On the other hand, it can be done in few seconds using bash.

Now, we can use R to estimate correlation among SNPs.

locdat <- read.table(file="SubsetSSU72.txt")
datn <- t(matrix(as.numeric(as.matrix(locdat[,3:381])), dim(locdat)[1],379))
colnames(datn) <- locdat[,1]
dim(datn)
## [1] 379  22

We also need to save minor alleles used in 0,1,2 coding because we have to alter 0 and 2 if minor alleles used are different.

usedallele <- as.character(locdat[,2])
usedallele
## [1] "G" "C" "C" "A" "C" "C" "T" "T" "T" "T" "C" "T" "C" "T" "C" "C" "G" "A" "G"
## [20] "G" "A" "A"

Check whether there are SNPs with all 0s.

max(apply( datn == 0, 2, sum )) == 379
## [1] TRUE
 if( max(apply( datn == 0, 2, sum )) == 379 ) {
             er_list = 0
             er_list <- c(er_list, which( apply( datn == 0, 2, sum ) == 379 ) )

             datn <- datn[, -er_list]
             usedallele <- usedallele[-er_list]
}
dim(datn)
## [1] 379  21

One SNP errased.

Correlation among SNPs can be earned by

corSNP <- cor(as.matrix(datn), use="pairwise.complete.obs")

You maybe want to further delete highly correlated SNPs, greater than .95 and less than -.95

while( sum( corSNP > 0.95 ) > ncol(corSNP) ) {
    toerase <- which( apply( corSNP > 0.95, 1, sum) == max( apply( corSNP > 0.95, 1, sum) ) )
    corSNP <- corSNP[-toerase[1], -toerase[1]]
    if( is.matrix(corSNP) == FALSE) break
}

while( sum( corSNP < -0.95 ) > 0 ) {
    toerase <- which( apply( corSNP < -.95, 1, sum) == max( apply( corSNP < -.95, 1, sum) ) )
    corSNP <- corSNP[-toerase[1], -toerase[1]]
    if( is.matrix(corhap) == FALSE) break
}

Which gives you 12 by 12 correlation matrix among SNPs mapped on SSU72

> corSNP
##              rs4970458   rs2031709   rs3128342   rs2296716   rs3118509
## rs4970458   1.00000000 -0.04084146 -0.11782642 -0.03750708 -0.05942565
## rs2031709  -0.04084146  1.00000000  0.14012201 -0.06018247  0.03567443
## rs3128342  -0.11782642  0.14012201  1.00000000  0.46098632  0.14086027
## rs2296716  -0.03750708 -0.06018247  0.46098632  1.00000000  0.08782511
## rs3118509  -0.05942565  0.03567443  0.14086027  0.08782511  1.00000000
## rs3766180  -0.08970164  0.23626920  0.74324439  0.55738899  0.08025057
## rs7519837   0.17811089  0.21915775  0.57968667  0.41502048  0.06321917
## rs7531530   0.21823251  0.02542996  0.52629416  0.60316415  0.09660614
## rs11260611 -0.02390300  0.58526295  0.04830153 -0.06766329  0.02087892
## rs1887284  -0.03825834  0.93675229  0.12558069 -0.04945293  0.03341810
## rs7540231   0.04403638  0.04046505  0.35820590  0.19971246  0.28462121
## rs9439468   0.18605204  0.20408257  0.63447443  0.52033462  0.07004870
##              rs3766180  rs7519837  rs7531530  rs11260611   rs1887284  rs7540231
## rs4970458  -0.08970164 0.17811089 0.21823251 -0.02390300 -0.03825834 0.04403638
## rs2031709   0.23626920 0.21915775 0.02542996  0.58526295  0.93675229 0.04046505
## rs3128342   0.74324439 0.57968667 0.52629416  0.04830153  0.12558069 0.35820590
## rs2296716   0.55738899 0.41502048 0.60316415 -0.06766329 -0.04945293 0.19971246
## rs3118509   0.08025057 0.06321917 0.09660614  0.02087892  0.03341810 0.28462121
## rs3766180   1.00000000 0.83387857 0.74297399  0.07120456  0.19508262 0.25553017
## rs7519837   0.83387857 1.00000000 0.78678281  0.06272010  0.17965214 0.27158556
## rs7531530   0.74297399 0.78678281 1.00000000  0.08539244  0.03009128 0.28301160
## rs11260611  0.07120456 0.06272010 0.08539244  1.00000000  0.62477878 0.02626193
## rs1887284   0.19508262 0.17965214 0.03009128  0.62477878  1.00000000 0.02799795
## rs7540231   0.25553017 0.27158556 0.28301160  0.02626193  0.02799795 1.00000000
## rs9439468   0.90145417 0.93769555 0.86371585  0.05521614  0.16604648 0.27761963
##             rs9439468
## rs4970458  0.18605204
## rs2031709  0.20408257
## rs3128342  0.63447443
## rs2296716  0.52033462
## rs3118509  0.07004870
## rs3766180  0.90145417
## rs7519837  0.93769555
## rs7531530  0.86371585
## rs11260611 0.05521614
## rs1887284  0.16604648
## rs7540231  0.27761963
## rs9439468  1.00000000