Mapping Snp to Gene

For a simple example, I prepared 34 rs-ids in myrsids.txt.

bash-3.2$ head myrsids.txt
rs3121561
rs2799064
rs3128126
rs3766186
rs11721
rs2887286
rs3813200
rs7515488
rs3813199
rs6603781

We can use SnpTracker software (hg19 used) to get the genomic coordinates of SNPs.

bash-3.2$ java -Xmx4g -jar ./snptracker.jar --in myrsids.txt --rsid 1 --ref hg19 --out myrsids.out
@----------------------------------------------------------@
|     snpTracker!     |     v1.0      |   11/Dec/2014    |
|----------------------------------------------------------|
|  (C) 2013 Jiaen Deng,  dengje@yahoo.com                  |
|  (C) 2013 Miaoxin Li,  limx54@yahoo.com                  |
|----------------------------------------------------------|
|  For documentation, citation & bug-report instructions:  |
|      http://statgenpro.psychiatry.hku.hk/snpTracker      |
@----------------------------------------------------------@
Effective settings :
--in myrsids.txt
--out myrsids.out
--rsid 1
--ref hg19
--merge-file b142_GRCh19.RsMergeArch.bcp.gz(Default)
--coor-file b142_SNPChrPosOnRef_GRCh19p105.bcp.gz(Default)
--hist-file b142_GRCh19.SNPHistory.bcp.gz(Default)

To disable web checking, use --no-web
No file(s) need to been updated!
34 SNPs are read from myrsids.txt
Reading SNP history....
Reading merging history of SNPs....
Reading genomic coordinates of SNPs....
Trackering SNPs coordinates....
34 SNPs successfully tracked and mapped are saved in myrsids.out.result.txt
0 SNPs were failed to map in dbSNP, which are saved in myrsids.out.error.txt

The time used is 765 seconds.

Part of output looks like this myrsids.out.result.txt

bash-3.2$ head myrsids.out.result.txt
1   rs3121561   990380  rs3121561
1   rs2799064   957898  rs2799064
1   rs3128126   962210  rs3128126
1   rs3766186   1162435 rs3766186
1   rs11721 1152631 rs11721
1   rs2887286   1156131 rs2887286
1   rs3813200   1151300 rs3813200
1   rs7515488   1163804 rs7515488
1   rs3813199   1158277 rs3813199
1   rs6603781   1158631 rs6603781

We can reorder selected columns using awk command below. myrsids2.txt

bash-3.2$ awk '{printf("%s\t%s\t%s\n", $2, $1, $3); }' myrsids.out.result.txt > myrsids2.txt
bash-3.2$ head myrsids2.txt
rs3121561   1   990380
rs2799064   1   957898
rs3128126   1   962210
rs3766186   1   1162435
rs11721 1   1152631
rs2887286   1   1156131
rs3813200   1   1151300
rs7515488   1   1163804
rs3813199   1   1158277
rs6603781   1   1158631

Now we are ready to map our SNPs to Genes using MAGMA software. For a simple example, we considered the gene database below which is subset of full data, NCBI37.gene.loc, downloadable from MAGMA website

Part of data looks like below. exNCBI37.gene.loc

bash-3.2$ head exNCBI37.gene.loc
375790  1       955503  991492  +       AGRN
254173  1       1109286 1133313 +       TTLL10
51150   1       1152288 1167447 -       SDF4
388581  1       1177833 1182102 -       FAM132A
116983  1       1227764 1243269 -       ACAP3
126789  1       1243994 1247057 +       PUSL1
54973   1       1246965 1260046 -       CPSF3L
80772   1       1260143 1264276 +       CPTP
83756   1       1266726 1269844 +       TAS1R3
1855    1       1270658 1284492 -       DVL1

We want to map our SNPs to gene symbol and we want to use 2kb upstream and 2kb downstream. We can do this using awk command below. exNCBI37.2kb.sym

bash-3.2$ awk '{printf("%s\t%s\t%d\t%d\t%s\t%s\n", $1, $2, $3-2000, $4+2000, $5, $6); }' NCBI37.3.gene.loc > NCBI37.2kb.sym
bash-3.2$ cat exNCBI37.2kb.sym
AGRN    1       953503  993492  +       375790
TTLL10  1       1107286 1135313 +       254173
SDF4    1       1150288 1169447 -       51150
FAM132A 1       1175833 1184102 -       388581
ACAP3   1       1225764 1245269 -       116983
PUSL1   1       1241994 1249057 +       126789
CPSF3L  1       1244965 1262046 -       54973
CPTP    1       1258143 1266276 +       80772
TAS1R3  1       1264726 1271844 +       83756
DVL1    1       1268658 1286492 -       1855
MXRA8   1       1286071 1295915 -       54587
AURKAIP1        1       1307110 1312818 -       54998
CCNL2   1       1319091 1336718 -       81669
MRPL20  1       1335276 1344693 -       55052
ANKRD65 1       1351800 1358650 -       441869
ATAD3C  1       1383069 1407538 +       219293
ATAD3B  1       1405164 1433582 +       83858
ATAD3A  1       1445523 1472067 +       55210
TMEM240 1       1468158 1477740 -       339453
SSU72   1       1475053 1512262 -       29101

Run magma software as below. Two inputs are SNP location, myrsids2.txt and gene database, exNCBI37.2kb.sym.

bash-3.2$ /usr/local/bin/magma --annotate --snp-loc myrsids2.txt --gene-loc exNCBI37.2kb.sym --out myrsidout
Welcome to MAGMA v1.02
Using flags:
        --annotate
        --snp-loc myrsids2.txt
        --gene-loc exNCBI37.2kb.sym
        --out myrsidout

Start time is 10:29:08, Monday 02 May 2016

Starting annotation...
Reading gene locations from file exNCBI37.2kb.sym...
        20 gene locations read from file
        chromosome  1: 20 genes
Reading SNP locations from file myrsids2.txt...
        34 SNP locations read from file
        of those, 34 (100%) mapped to at least one gene
Writing annotation to file myrsidout.genes.annot
        for chromosome  1, 16 genes are empty (out of 20)
        at least one SNP mapped to each of a total of 4 genes (out of 20)


End time is 10:29:08, Monday 02 May 2016 (elapsed: 00:00:00)

Here’s the result. 3 SNPs are mapped on AGRN, 8 SNPs are mapped on SDF4, 1 SNP is mapped on TMEM240 and 23 SNPs are mapped on SSU72.

bash-3.2$ cat myrsidout.genes.annot
# window_up = 0
# window_down = 0
AGRN    1:953503:993492 rs3121561       rs2799064       rs3128126
SDF4    1:1150288:1169447       rs3766186       rs11721 rs2887286       rs3813200       rs7515488       rs3813199       rs6603781       rs11260562
TMEM240 1:1468158:1477740       rs11807706
SSU72   1:1475053:1512262       rs4970458       rs2031709       rs3128342       rs3766169       rs2296715       rs2296716       rs3118509       rs3766177       rs3766178       rs3766180       rs6603793       rs6656541       rs7366884       rs7519837       rs7520996       rs7531530       rs11260611      rs1887284       rs6603791       rs7517401       rs7540231       rs9439468       rs11807706

Mapping Snp to Gene

Il-Youp Kwak

2016-5-2