All Classes and Interfaces
Class
Description
And expression
Annotate using a VCF "database"
Annotate using a VCF "database"
Note: Reads and loads the whole VCF file into memory
Annotate using a VCF "database"
Note: Assumes that the VCF database file is sorted.
Annotate using a tabix indexed VCF "database"
A "memory efficient" boolean array
It is implemented as an array of bytes, where each bit is a boolean value.
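The bit-per-boolean layout described above can be sketched as follows. This is a minimal illustration, not the library's own code: the class name `BitPackedBooleans` and its methods are hypothetical, chosen only to show how eight flags fit in one byte.

```java
// Hypothetical sketch of a bit-packed boolean array (names are illustrative).
final class BitPackedBooleans {
    private final byte[] bits; // each byte stores 8 boolean values
    private final int size;

    BitPackedBooleans(int size) {
        this.size = size;
        this.bits = new byte[(size + 7) / 8]; // round up to whole bytes
    }

    boolean get(int index) {
        checkIndex(index);
        // index >>> 3 selects the byte, index & 7 selects the bit within it
        return (bits[index >>> 3] & (1 << (index & 7))) != 0;
    }

    void set(int index, boolean value) {
        checkIndex(index);
        if (value) {
            bits[index >>> 3] |= (byte) (1 << (index & 7));  // turn the bit on
        } else {
            bits[index >>> 3] &= (byte) ~(1 << (index & 7)); // turn the bit off
        }
    }

    int size() {
        return size;
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("Index: " + index + ", size: " + size);
        }
    }
}
```

Compared with a `boolean[]`, which typically uses one byte per element, this layout needs roughly one eighth of the memory at the cost of a shift and a mask on every access.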
Count number of heterozygous samples
Count number of homozygous samples
Count number of reference samples
Count number of ALT samples
A set of DataColumns, indexed by position
This is used to store data for a chromosome
A wrapper for a data column of primitive type T
A Data Column is a column of a specific data type (String, Float, Long, etc.) that is stored using primitive types for memory efficiency.
A column of boolean values, that can also be null
The boolean values are stored in a BoolArray
A DataFrame 'row'.
A set of DataColumns specific SNP 'alt' (e.g.
Use a file as a 'marker' database.
DbNSFP database:
Reference https://sites.google.com/site/jpopgen/dbNSFP
DbNSFP database entry:
Reference https://sites.google.com/site/jpopgen/dbNSFP
Added lazy parsing of key/value pairs
Use a VCF file as a database for annotations
A VCF database consists of a VCF file and an index.
Loads a VCF file into memory.
Use an uncompressed sorted VCF file as a database for annotations
Note: Assumes that the VCF database file is sorted and uncompressed.
Use a bgzip-compressed, tabix indexed VCF file as a database for annotations
And expression
Implement a memory efficient array of enums
It only stores bytes (i.e.
Equal
Exists operator (true if a field exists)
A generic expression
Expressions have values (VcfInfoType)
Binary condition
An expression that can be negated
Get a region from a fasta file
A field:
E.g.: 'DP', 'CHROM'
A 'constant' field: e.g.
A 'constant' field: e.g.
An 'EFF' field from SnpEff:
E.g.: 'EFF[2].GENE'
A field:
E.g.: 'GEN[2].GT'
A field:
E.g.: 'GEN[2].PL[3]'
Iterates on fields / sub-fields
It's a singleton
A LOF field from SnpEff:
E.g.: 'LOF[2].GENE'
An NMD field from SnpEff:
E.g.: 'NMD[2].GENE'
A field that has sub fields (e.g.
A function that returns an expression (i.e.
A function that returns a bool type (i.e.
Greater equal
Greater equal
GWAS catalog table.
Entry from a GWAS-catalog
References:
http://www.genome.gov/gwastudies/#download
http://www.genome.gov/Pages/About/OD/OPG/GWAS%20Catalog/Tab_delimited_column_descriptions_09_27.pdf
Iterate on each line of a GWAS catalog (TXT, tab separated format)
Equal
Is an expression in a set?
An individual in the pedigree
Individuals are like TfamEntries but have drawing info (coordinates, color, etc.)
Is 'genotypeNum' heterozygous?
Is 'genotypeNum' homozygous?
Is 'genotypeNum' reference?
Is 'genotypeNum' reference?
Creates objects from an AST
Less than or equal
Greater equal
Represents a marker in a file (located at 'fileIdx' bytes since the beginning of the file)
Match a regular expression (string)
And expression
And expression
Exists operator (true if a field exists)
Not equal
Not expression
Match a regular expression (string)
Or expression
Draws a pedigree using SVG
Conservation score
And expression
An index by position
A database query and a result.
Generic SnpSift tool caller
This class provides an empty implementation of SnpSiftListener, which can be extended to create a listener which only needs to handle a subset of the available methods.
This class provides an empty implementation of SnpSiftVisitor, which can be extended to create a visitor which only needs to handle a subset of the available methods.
Convert VCF file to allele matrix
Note: Only use SNPs
Note: Only variants with two possible alleles.
Annotate a VCF file with ID from another VCF file (database)
Annotate a VCF file from another VCF file (database)
The database file is loaded into memory.
Count number of cases and controls
Summarize a VCF annotated file
Calculate genotyping concordance between two VCF files.
Convert allele 'matrix' file into Covariance matrix
Note: Only variants with two possible alleles.
Annotate a VCF file with dbNSFP.
Extract fields from VCF file to a TXT (tab separated) format
Generic SnpSift filter
Filter out data based on VCF attributes:
- Chromosome, Position, etc.
Filter using CHROM:POS only
Generic SnpSift genotype filter
Removes genotypes matching the filter:
e.g.
Annotate a VCF file using Gene sets (MSigDb) or gene ontology (GO)
Add genotype information to INFO fields
Annotate a VCF file using GWAS catalog database
Loads GWAS catalog in memory, thus it makes no assumption about order.
Calculate Hardy-Weinberg equilibrium and goodness of fit for each entry in a VCF file
Intersect intervals
Filter variants that hit intervals
Filter variants that hit intervals
Use an indexed VCF file.
Annotate a VCF file with ID from another VCF file (database)
Draws a pedigree using SVG according to a VCF file
Annotate using PhastCons score files
Annotate if a variant is 'private'.
Removes reference genotypes.
Removes INFO fields
Sort VCF file/s by chromosome & position
Split a large VCF file by chromosome or by number of lines
Calculate Ts/Tv ratios per sample (transitions vs transversions)
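As a rough illustration of the transition/transversion distinction, the sketch below classifies single-base substitutions and computes one ratio. It assumes the standard definitions (a transition stays within purines A/G or within pyrimidines C/T; a transversion crosses between them). The class `TsTvSketch` and its methods are hypothetical and do not reproduce the tool's per-sample bookkeeping.

```java
// Hypothetical sketch: classify SNPs and compute a Ts/Tv ratio (names are illustrative).
final class TsTvSketch {
    private static boolean isPurine(char base) {
        return base == 'A' || base == 'G';
    }

    private static boolean isPyrimidine(char base) {
        return base == 'C' || base == 'T';
    }

    /** Transition: purine to purine or pyrimidine to pyrimidine (A<->G, C<->T). */
    static boolean isTransition(char ref, char alt) {
        return ref != alt
                && ((isPurine(ref) && isPurine(alt)) || (isPyrimidine(ref) && isPyrimidine(alt)));
    }

    /** Transversion: purine to pyrimidine or the reverse. */
    static boolean isTransversion(char ref, char alt) {
        return (isPurine(ref) && isPyrimidine(alt)) || (isPyrimidine(ref) && isPurine(alt));
    }

    /** Each SNP is a two-character string: reference base followed by alternative base. */
    static double tsTvRatio(String[] snps) {
        int transitions = 0;
        int transversions = 0;
        for (String snp : snps) {
            char ref = snp.charAt(0);
            char alt = snp.charAt(1);
            if (isTransition(ref, alt)) {
                transitions++;
            } else if (isTransversion(ref, alt)) {
                transversions++;
            }
        }
        // The ratio is undefined when there are no transversions
        return transversions == 0 ? Double.NaN : (double) transitions / transversions;
    }
}
```

A per-sample version would keep one pair of counters per sample and increment them only for samples that carry the variant.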
Annotate a VCF file with variant type
Transform a VCF to a TPED file
Check VCF files (run some simple checks)
Annotate a field based on an operation (max, min, etc.) of other VCF fields
This interface defines a complete listener for a parse tree produced by SnpSiftParser.
This interface defines a complete generic visitor for a parse tree produced by SnpSiftParser.
This is a class that reads a VCF file and returns the variants in sorted order.
Implement a memory efficient array of strings
It only stores bytes (i.e.
Implement a memory efficient array of strings
It only stores bytes (i.e.
Summarize a VCF annotated file
And expression
A counter for genotypes
A database of variant's data used to annotate a VCF file (i.e.
A DataFrame of variant's data that is indexed by "variant type AND chromosome position".
Count different types of variants
These statistics are used to create data sets
Variant type counters for each chromosome
Calculate Hardy-Weinberg equilibrium and goodness of fit.
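One common way to measure goodness of fit against Hardy-Weinberg proportions is a chi-square statistic over the three genotype counts of a biallelic variant. The sketch below assumes that approach; the class may use a different test (for example an exact test), and the name `HardyWeinbergSketch` and its method are hypothetical.

```java
// Hypothetical sketch: chi-square goodness of fit to Hardy-Weinberg proportions.
final class HardyWeinbergSketch {
    /**
     * @param nAA count of homozygous reference genotypes
     * @param nAa count of heterozygous genotypes
     * @param naa count of homozygous alternative genotypes
     * @return chi-square statistic comparing observed and expected counts
     */
    static double chiSquare(int nAA, int nAa, int naa) {
        int n = nAA + nAa + naa;
        if (n == 0) {
            throw new IllegalArgumentException("No genotypes");
        }
        // Allele frequencies estimated from the observed genotypes
        double p = (2.0 * nAA + nAa) / (2.0 * n);
        double q = 1.0 - p;

        // Expected counts under Hardy-Weinberg: p^2, 2pq, q^2
        double[] expected = {p * p * n, 2 * p * q * n, q * q * n};
        int[] observed = {nAA, nAa, naa};

        double chi2 = 0.0;
        for (int i = 0; i < observed.length; i++) {
            if (expected[i] > 0) { // skip classes that cannot occur (p or q is zero)
                double diff = observed[i] - expected[i];
                chi2 += diff * diff / expected[i];
            }
        }
        return chi2;
    }
}
```

Counts of 25, 50 and 25 match the expected proportions exactly, so the statistic is zero; a sample with no heterozygotes at all deviates strongly and yields a large value.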
An index for a VCF file
Represents a set of VCF entries stored in an (uncompressed) file
All entries belong to the same chromosome
Interval tree structure for an 'VcfIndexChromo'
The whole tree is stored in a single class as a set of arrays.
Calculate Linkage Disequilibrium
Reference: "Principles of population genetics (4th edition)" Hartl & Clark, pages 73 to 81
Note: I try to follow the same notation as the book.
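For orientation, the sketch below computes two widely used linkage disequilibrium measures from haplotype and allele frequencies: the coefficient D = pAB - pA * pB and the squared correlation r^2 = D^2 / (pA (1 - pA) pB (1 - pB)). These are the standard textbook definitions; the class itself may use other measures or notation, and the name `LinkageDisequilibriumSketch` is hypothetical.

```java
// Hypothetical sketch: standard LD measures from frequencies (names are illustrative).
final class LinkageDisequilibriumSketch {
    /**
     * @param pAB frequency of the haplotype carrying allele A at locus 1 and allele B at locus 2
     * @param pA  frequency of allele A at locus 1
     * @param pB  frequency of allele B at locus 2
     * @return D, the deviation of the haplotype frequency from independence
     */
    static double coefficientD(double pAB, double pA, double pB) {
        return pAB - pA * pB;
    }

    /** Squared correlation between the two loci; NaN if either locus is monomorphic. */
    static double rSquared(double pAB, double pA, double pB) {
        double denominator = pA * (1 - pA) * pB * (1 - pB);
        if (denominator == 0) {
            return Double.NaN;
        }
        double d = coefficientD(pAB, pA, pB);
        return d * d / denominator;
    }
}
```

When the two alleles always travel together (pAB = pA = pB = 0.5), D is 0.25 and r^2 is 1; when the loci are independent (pAB = pA * pB), both are 0.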
Or expression
Test: This class loads a "database" VCF file and then annotates another VCF file.