File Examples
Dr. Jim Watson's 220K indels across the genome [NCBI 36] [NCBI 37]
(Space-based coordinates) [Ref]
Filters down to ~400 coding
chr7 SNPs from the Asian genome [NCBI 36] [NCBI 37]
(Residue-based coordinates) [Ref]
Filters down to ~1K coding
Format Example 1: RESIDUE BASED COORDINATE SYSTEM (comma separated, NCBI 36 shown below)
3,81780820,-1,T/C
2,43881517,1,A/T,#User Comment
2,43857514,1,T/C
6,88375602,1,G/A,#User Comment
22,29307353,-1,T/A
10,115912482,-1,C/T
10,115900918,-1,G/T
16,69875502,-1,G/T
16,69876078,-1,T/C
16,69877147,-1,G/A
22,49000825,-1,T/A
22,49000551,-1,T/C
22,49006739,-1,A/G
11,17476318,-1,C/G
4,124033758,-1,C/T
3,185041096,-1,C/G
17,8101874,1,C/T
Format Example 2: SPACE BASED COORDINATE SYSTEM (comma separated)
3,81780819,81780820,-1,T/C
2,43881516,43881517,1,A/T,#User Comment
2,43857513,43857514,1,T/C
6,88375601,88375602,1,G/A,#User Comment
22,29307352,29307353,-1,T/A
10,115912481,115912482,-1,C/T
10,115900917,115900918,-1,G/T
16,69875501,69875502,-1,G/T
16,69876077,69876078,-1,T/C
16,69877146,69877147,-1,G/A
22,49000824,49000825,-1,T/A
22,49000550,49000551,-1,T/C
22,49006738,49006739,-1,A/G
11,17476317,17476318,-1,C/G
4,124033757,124033758,-1,C/T
3,185041095,185041096,-1,C/G
17,8101873,8101874,1,C/T
Format Description
[comma separated: chromosome,coordinate,orientation,alleles,user comment(optional)]
Please do not use spaces except in the user comments field
Coordinate System:
SIFT accepts both residue-based and space-based coordinates for single nucleotide variants.
If there is only one column of coordinates, as shown in Example 1 above, SIFT assumes the coordinate
system is residue-based; if there are two columns, as shown in Example 2 above, SIFT assumes the
coordinate system is space-based.
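The column-count rule above can be sketched in a few lines of Python. This is only an illustration of the input format, not SIFT's own parser; the function name parse_variant_line and the dictionary keys are hypothetical, and the sketch assumes the optional comment is introduced by ",#" as in the examples.

```python
def parse_variant_line(line):
    """Split one comma-separated variant line into named fields.

    Hypothetical helper: infers the coordinate system from the number
    of coordinate columns (one = residue-based, two = space-based).
    """
    # Separate the optional user comment, which starts with '#'.
    body, sep, comment = line.strip().partition(",#")
    fields = body.split(",")

    if len(fields) == 4:
        system = "residue"
        chromosome, position, orientation, alleles = fields
        coordinates = (int(position),)
    elif len(fields) == 5:
        system = "space"
        chromosome, start, end, orientation, alleles = fields
        coordinates = (int(start), int(end))
    else:
        raise ValueError(f"unexpected number of fields: {line!r}")

    return {
        "chromosome": chromosome,
        "system": system,
        "coordinates": coordinates,
        "orientation": int(orientation),
        "alleles": alleles,
        "comment": comment if sep else None,
    }


residue_record = parse_variant_line("2,43881517,1,A/T,#User Comment")
space_record = parse_variant_line("3,81780819,81780820,-1,T/C")
```

Here residue_record is parsed from a line of Example 1 and space_record from a line of Example 2.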
The space-based coordinate system counts the spaces before and after bases rather than the bases themselves.
Zero always refers to the space before the first base.
The sequence 'ACGT' has coordinates (0,4) and its subsequence 'CG' has coordinates (1,3) as shown in Example 3 below.
The difference between the start and end coordinates gives the sequence length. Misinterpretation of these
coordinates can easily lead to 'off-by-one' errors. Space-based coordinates become necessary when describing
insertions/deletions and genomic rearrangements.
Example 3:
0 A 1 C 2 G 3 T 4
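Python slice indices also count the positions between elements, so Example 3 can be checked directly. The snippet below is a small sketch for illustration; the helper names space_slice and space_length are hypothetical.

```python
def space_slice(sequence, start, end):
    """Return the bases lying between two space-based coordinates."""
    # Python slices already index the gaps between characters,
    # so space-based coordinates map onto them without adjustment.
    return sequence[start:end]


def space_length(start, end):
    """Length of a space-based interval is simply end minus start."""
    return end - start


sequence = "ACGT"
whole = space_slice(sequence, 0, 4)   # the full sequence 'ACGT'
middle = space_slice(sequence, 1, 3)  # the subsequence 'CG'
```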
In a residue-based system, as described in Example 4 below, each base is assigned a coordinate based on its
absolute position, starting from 1. The sequence 'ACGT' has coordinates (1,4) and its subsequence 'CG' has
coordinates (2,3).
Example 4:
A C G T
1 2 3 4
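Comparing the two systems, a residue-based range (first, last) corresponds to the space-based interval (first - 1, last). This matches the sample data: the residue-based position 81780820 in Example 1 appears as 81780819,81780820 in Example 2. A minimal sketch of the conversion, with hypothetical function names:

```python
def residue_to_space(first, last=None):
    """Convert a residue-based position or range to space-based (start, end)."""
    if last is None:
        last = first  # a single base starts and ends on the same residue
    return (first - 1, last)


def space_to_residue(start, end):
    """Convert a space-based interval of at least one base to residue-based."""
    if end <= start:
        raise ValueError("empty interval has no residue-based equivalent")
    return (start + 1, end)


snv_interval = residue_to_space(81780820)  # single nucleotide variant
acgt_interval = residue_to_space(1, 4)     # whole sequence 'ACGT'
cg_residues = space_to_residue(1, 3)       # subsequence 'CG'
```

Zero-length space-based intervals, such as an insertion point between two bases, have no residue-based equivalent, which is why the second function rejects them.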
Orientation:
Use 1 for the positive strand and -1 for the negative strand. If the orientation is not known, use 1 as the default.
Alleles:
Use 'base1/base2', where either base1 or base2 may be the reference allele. SIFT will predict for the non-reference
allele only. If you need a prediction for the reference allele, use base1/base1, where base1 is the reference allele.