For a simple example, I prepared 34 rs-ids in myrsids.txt.
bash-3.2$ head myrsids.txt
rs3121561
rs2799064
rs3128126
rs3766186
rs11721
rs2887286
rs3813200
rs7515488
rs3813199
rs6603781
We can use SnpTracker software (hg19 used) to get the genomic coordinates of SNPs.
bash-3.2$ java -Xmx4g -jar ./snptracker.jar --in myrsids.txt --rsid 1 --ref hg19 --out myrsids.out
@----------------------------------------------------------@
| snpTracker! | v1.0 | 11/Dec/2014 |
|----------------------------------------------------------|
| (C) 2013 Jiaen Deng, dengje@yahoo.com |
| (C) 2013 Miaoxin Li, limx54@yahoo.com |
|----------------------------------------------------------|
| For documentation, citation & bug-report instructions: |
| http://statgenpro.psychiatry.hku.hk/snpTracker |
@----------------------------------------------------------@
Effective settings :
--in myrsids.txt
--out myrsids.out
--rsid 1
--ref hg19
--merge-file b142_GRCh19.RsMergeArch.bcp.gz(Default)
--coor-file b142_SNPChrPosOnRef_GRCh19p105.bcp.gz(Default)
--hist-file b142_GRCh19.SNPHistory.bcp.gz(Default)
To disable web checking, use --no-web
No file(s) need to been updated!
34 SNPs are read from myrsids.txt
Reading SNP history....
Reading merging history of SNPs....
Reading genomic coordinates of SNPs....
Trackering SNPs coordinates....
34 SNPs successfully tracked and mapped are saved in myrsids.out.result.txt
0 SNPs were failed to map in dbSNP, which are saved in myrsids.out.error.txt
The time used is 765 seconds.
Part of output looks like this myrsids.out.result.txt
bash-3.2$ head myrsids.out.result.txt
1 rs3121561 990380 rs3121561
1 rs2799064 957898 rs2799064
1 rs3128126 962210 rs3128126
1 rs3766186 1162435 rs3766186
1 rs11721 1152631 rs11721
1 rs2887286 1156131 rs2887286
1 rs3813200 1151300 rs3813200
1 rs7515488 1163804 rs7515488
1 rs3813199 1158277 rs3813199
1 rs6603781 1158631 rs6603781
We can reorder selected columns using awk command below. myrsids2.txt
bash-3.2$ awk '{printf("%s\t%s\t%s\n", $2, $1, $3); }' myrsids.out.result.txt > myrsids2.txt
bash-3.2$ head myrsids2.txt
rs3121561 1 990380
rs2799064 1 957898
rs3128126 1 962210
rs3766186 1 1162435
rs11721 1 1152631
rs2887286 1 1156131
rs3813200 1 1151300
rs7515488 1 1163804
rs3813199 1 1158277
rs6603781 1 1158631
Now we are ready to map our SNPs to Genes using MAGMA software. For a simple example, we considered the gene database below which is subset of full data, NCBI37.gene.loc, downloadable from MAGMA website
Part of data looks like below. exNCBI37.gene.loc
bash-3.2$ head exNCBI37.gene.loc
375790 1 955503 991492 + AGRN
254173 1 1109286 1133313 + TTLL10
51150 1 1152288 1167447 - SDF4
388581 1 1177833 1182102 - FAM132A
116983 1 1227764 1243269 - ACAP3
126789 1 1243994 1247057 + PUSL1
54973 1 1246965 1260046 - CPSF3L
80772 1 1260143 1264276 + CPTP
83756 1 1266726 1269844 + TAS1R3
1855 1 1270658 1284492 - DVL1
We want to map our SNPs to gene symbol and we want to use 2kb upstream and 2kb downstream. We can do this using awk command below. exNCBI37.2kb.sym
bash-3.2$ awk '{printf("%s\t%s\t%d\t%d\t%s\t%s\n", $1, $2, $3-2000, $4+2000, $5, $6); }' NCBI37.3.gene.loc > NCBI37.2kb.sym
bash-3.2$ cat exNCBI37.2kb.sym
AGRN 1 953503 993492 + 375790
TTLL10 1 1107286 1135313 + 254173
SDF4 1 1150288 1169447 - 51150
FAM132A 1 1175833 1184102 - 388581
ACAP3 1 1225764 1245269 - 116983
PUSL1 1 1241994 1249057 + 126789
CPSF3L 1 1244965 1262046 - 54973
CPTP 1 1258143 1266276 + 80772
TAS1R3 1 1264726 1271844 + 83756
DVL1 1 1268658 1286492 - 1855
MXRA8 1 1286071 1295915 - 54587
AURKAIP1 1 1307110 1312818 - 54998
CCNL2 1 1319091 1336718 - 81669
MRPL20 1 1335276 1344693 - 55052
ANKRD65 1 1351800 1358650 - 441869
ATAD3C 1 1383069 1407538 + 219293
ATAD3B 1 1405164 1433582 + 83858
ATAD3A 1 1445523 1472067 + 55210
TMEM240 1 1468158 1477740 - 339453
SSU72 1 1475053 1512262 - 29101
Run magma software as below. Two inputs are SNP location, myrsids2.txt and gene database, exNCBI37.2kb.sym.
bash-3.2$ /usr/local/bin/magma --annotate --snp-loc myrsids2.txt --gene-loc exNCBI37.2kb.sym --out myrsidout
Welcome to MAGMA v1.02
Using flags:
--annotate
--snp-loc myrsids2.txt
--gene-loc exNCBI37.2kb.sym
--out myrsidout
Start time is 10:29:08, Monday 02 May 2016
Starting annotation...
Reading gene locations from file exNCBI37.2kb.sym...
20 gene locations read from file
chromosome 1: 20 genes
Reading SNP locations from file myrsids2.txt...
34 SNP locations read from file
of those, 34 (100%) mapped to at least one gene
Writing annotation to file myrsidout.genes.annot
for chromosome 1, 16 genes are empty (out of 20)
at least one SNP mapped to each of a total of 4 genes (out of 20)
End time is 10:29:08, Monday 02 May 2016 (elapsed: 00:00:00)
Here’s the result. 3 SNPs are mapped on AGRN, 8 SNPs are mapped on SDF4, 1 SNP is mapped on TMEM240 and 23 SNPs are mapped on SSU72.
bash-3.2$ cat myrsidout.genes.annot
# window_up = 0
# window_down = 0
AGRN 1:953503:993492 rs3121561 rs2799064 rs3128126
SDF4 1:1150288:1169447 rs3766186 rs11721 rs2887286 rs3813200 rs7515488 rs3813199 rs6603781 rs11260562
TMEM240 1:1468158:1477740 rs11807706
SSU72 1:1475053:1512262 rs4970458 rs2031709 rs3128342 rs3766169 rs2296715 rs2296716 rs3118509 rs3766177 rs3766178 rs3766180 rs6603793 rs6656541 rs7366884 rs7519837 rs7520996 rs7531530 rs11260611 rs1887284 rs6603791 rs7517401 rs7540231 rs9439468 rs11807706