All Classes and Interfaces

Class
Description
Abstract class that coordinates the general task of taking in a set of alignment information, possibly in SAM format, possibly in other formats, and merging that with the set of all reads for which alignment was attempted, stored in an unmapped SAM file.
 
Illumina position files share nearly the same form: pos files consist of text-based, tab-separated x-y coordinate float pairs; locs files are binary x-y float pairs; clocs files are compressed binary x-y float pairs.
Class for parsing text files where each line consists of fields separated by whitespace.
Abstract class that holds parameters and methods common to classes that perform duplicate detection and/or marking within SAM/BAM/CRAM files.
Little class used to package up a header and an iterable/iterator.
Abstract class that holds parameters and methods common to classes that perform optical duplicate detection.
AbstractWgsMetricsCollector<T extends htsjdk.samtools.util.AbstractRecordAndOffset>
Class for collecting data on reference coverage, base qualities and excluded bases from one AbstractLocusInfo object for CollectWgsMetrics.
Combines multiple Picard QualityYieldMetrics files into a single file.
Combines multiple Variant Calling Metrics files into a single file.
Store one or more AdapterPairs to use to mark adapter sequence of SAMRecords.
 
A utility class for matching reads to adapters.
A tool to add comments to a BAM file header.
 
Assigns all the reads in a file to a single new read-group.
High level metrics about the alignment of reads within a SAM file, produced by the CollectAlignmentSummaryMetrics program and usually stored in a file with the extension ".alignment_summary_metrics".
 
 
Filters out a record if the allele balance for heterozygotes is out of a defined range across all samples.
Utilities class containing methods for restricting VariantContext and GenotypesContext objects to a reduced set of alleles, as well as for choosing the best set of alleles to keep and for cleaning up annotations and genotypes after subsetting.
Exception thrown when loading gene annotations.
A simple class to store names and counts for the Control Information fields that are stored in an Illumina GTC file.
Wrapper around a CloseableIterator that reads in a separate thread, for cases in which that might be efficient.
Describes
 
Designs baits for hybrid selection!
Set of possible design strategies for bait design.
Command line program to print statistics from a BAM index (.bai) file. Statistics include the count of aligned and unaligned reads for each reference sequence and a count of all records with no start coordinate.
Deprecated.
A class for finding the distance between multiple (matched) barcodes and multiple barcode reads.
BarcodeExtractor is used to match barcodes and collect barcode match metrics.
Utility class to hang onto data about the best match for a given barcode
Created by jcarey on 3/13/14.
Reads a single barcode file line by line and returns the barcode if there was a match or NULL otherwise.
Metrics produced by the ExtractIlluminaBarcodes program that is used to parse data in the basecalls directory and determine to which barcode each read should be assigned.
 
An interface that can take a collection of bases (provided as SamLocusIterator.RecordAndOffset and SamLocusAndReferenceIterator.SAMLocusAndReference) and generates an ErrorMetric from them.
Tools that process sequencing machine data, e.g.
BasecallsConverter utilizes an underlying IlluminaDataProvider to convert parsed and decoded sequencing data from standard Illumina formats to specific output records (FASTA records/SAM records).
Interface that defines a converter that takes ClusterData and returns OUTPUT_RECORD type objects.
Interface that defines a writer that will write out OUTPUT_RECORD type objects.
BasecallsConverterBuilder creates and configures BasecallsConverter objects.
 
An interface and implementations for classes that apply a RecordAndOffsetStratifier to put bases into various "bins" and then compute an ErrorMetric on these bases using a BaseErrorCalculator.
 
An error metric for the errors in bases.
Parse various formats and versions of Illumina Basecall files, and use them to populate ClusterData objects.
TextFileParser which reads a single text file.
Created by jcarey on 3/14/14.
A class that implements the IlluminaData interfaces provided by this parser. One BclData object is returned to IlluminaDataProvider per cluster, and each first-level array in bases and qualities represents a single read in that cluster.
 
 
Annoyingly, there are two different files with extension .bci in NextSeq output.
Describes a mechanism for revising and evaluating qualities read from a BCL file.
BCL Files are base call and quality score binary files containing a (base,quality) pair for successive clusters.
 
For an aligner that aligns each end independently, select the alignment for each end with the best MAPQ, and make that the primary.
This strategy was designed for TopHat output, but could be of general utility.
A simple program to convert an Illumina bpm (bead pool manifest) file into a normalization manifest (bpm.csv) file. The normalization manifest (bpm.csv) is a simple text file generated by Illumina tools; it has a specific format and is used by ZCall.
A class to represent an 'Extended' Illumina Manifest file.
A class to represent a record (line) from an Extended Illumina Manifest [Assay] entry
 
 
Command line program to generate a BAM index (.bai) file from a BAM (.bam) file
Takes a VCFFileReader and an IntervalList and provides a single iterator over all variants in all the intervals.
Calculates various metrics on a sample fingerprint, indicating whether the fingerprint satisfies the assumptions we have.
 
Collects variants and generates metrics about them.
 
 
A read name encoder conforming to the standard described by Illumina Casava 1.8.
This class provides the data structure for CBCLs.
CBCL Header. Bytes 0-1: version number (current version is 1), unsigned 16-bit little-endian integer; Bytes 2-5: header size, unsigned 32-bit little-endian integer; Byte 6: number of bits per basecall, unsigned; Byte 7: number of bits per q-score, unsigned.
 
 
Checks the sample identity of the sequence/genotype data in the provided file (SAM/BAM or VCF) against a set of known genotypes in the supplied genotype file (in VCF format).
Program to check a lane of an Illumina output directory.
Simple class to check the terminator block of a SAM file.
 
Implementation of a circular byte buffer that uses a large byte[] internally and supports basic read/write operations from/to other byte[]s passed as arguments.
 
Utilities to clip the adapter sequence from a SAMRecord read
 
The clocs file format is one of three Illumina formats (pos, locs, and clocs) that store position data exclusively.
Summary
Store the information from Illumina files for a single cluster with one or more reads.
Converts ClusterData provided by an IlluminaDataProvider into one or two SAMRecords, as appropriate, optionally marking adapter sequence.
A metric class to hold the result of ClusterCrosscheckMetrics fingerprints.
A command line tool to read a BAM file and produce standard alignment metrics that would be applicable to any alignment.
 
Collects summary and per-sample metrics about variant calls in a VCF file.
 
 
 
 
Collect DuplicateMark'ing metrics from an input file that was already Duplicate-Marked.
Tool to collect information about GC bias in the reads in a given BAM file.
Collect metrics regarding the reason for reads (sequenced by HiSeqX) not passing the Illumina PF Filter.
A metric class for describing PF-failing reads from an Illumina HiSeqX lane.
Metrics produced by the GetHiSeqXPFFailMetrics program.
 
 
This tool takes a SAM/BAM file input and collects metrics that are specific for sequence datasets generated through hybrid-selection.
A command line tool to collect Illumina Basecalling metrics for a sequencing run. Requires a Lane and an input file of Barcodes to expect.
Utility for collating Tile records from the Illumina TileMetrics file into lane-level and phasing-level metrics.
A CLP that, given a BAM and a VCF with genotypes of the same sample, estimates the rate of independent replication of reads within the bam.
Command line program to read non-duplicate insert sizes, create a Histogram and report distribution statistics.
Command-line program to compute metrics about outward-facing pairs, inward-facing pairs, and chimeras in a jumping library.
Class that is designed to instantiate and execute multiple metrics programs that extend SinglePassSamProgram while making only a single pass through the SAM file and supplying each program with the records as it goes.
 
 
Class for trying to quantify the CpCG->CpCA error rate.
Metrics class for outputs.
Command line program to calculate quality yield metrics
A set of metrics used to describe the general quality of a BAM file
 
 
Computes a number of metrics that are useful for evaluating coverage and performance of whole genome sequencing experiments, same implementation as CollectWgsMetrics, with different defaults: lacks baseQ and mappingQ filters and has much higher coverage cap.
 
 
Calculates and reports QC metrics for RRBS data based on the methylation status at individual C/G bases as well as CpG sites across all reads in the input BAM/SAM file.
 
Program to collect error metrics on bases stratified in various ways.
Quantify substitution errors caused by mismatched base pairings during various stages of sample / library prep.
Both CollectTargetedPCRMetrics and CollectHsSelection share virtually identical program structures except for the name of their targeting mechanisms (e.g.
This tool calculates a set of PCR-related metrics from an aligned SAM or BAM file containing targeted sequencing data.
Collects summary and per-sample metrics about variant calls in a VCF file.
A collection of metrics relating to snps and indels within a variant-calling file (VCF) for a given sample.
A collection of metrics relating to snps and indels within a variant-calling file (VCF).
Computes a number of metrics that are useful for evaluating coverage and performance of whole genome sequencing experiments.
 
 
 
Metrics for evaluating the performance of whole genome sequencing experiments.
 
A simple program to combine multiple genotyping array VCFs into one VCF. The input VCFs must have the same sequence dictionary and the same list of variant loci.
Embodies defaults for global values that affect how the Picard Command Line operates.
Abstract class to facilitate writing command-line programs.
Class for handling translation of Picard-style command line argument syntax to POSIX-style argument syntax; used for running tests written with Picard style syntax against the Barclay command line parser.
A simple tool to compare two Illumina GTC files.
Compare two metrics files.
 
Rudimentary SAM comparer.
 
 
Class for managing a list of Counters of integer, provides methods to access data from Counters with respect to an offset.
Counting filter that discards reads that are unaligned or aligned with MQ==0 and whose 5' ends look like adapter sequence.
Counting filter that discards reads that have been marked as duplicates.
A SamRecordFilter that counts the number of bases in the reads which it filters out.
Counting filter that discards reads below a configurable mapping quality threshold.
Counting filter that discards reads that are unpaired in sequencing and paired reads whose mates are not mapped.
A simple program to create a standard picard metrics file from the output of bafRegress
Create an Extended Illumina Manifest by performing a liftover to Build 37.
Create a SAM/BAM file from a fasta containing reference sequence.
 
A simple program to create a standard picard metrics file from the output of VerifyIDIntensity
Checks that all data in the set of input files appear to come from the same individual.
A class to hold the result of crosschecking fingerprints.
The data type.
 
Deprecated.
6/6/2017 Use CrosscheckFingerprints instead.
 
Utility class to use with DbSnp files to determine if a locus is a dbSnp site.
Little tuple class to contain one bitset for SNPs and another for Indels.
Iterate through a delimited text file in which columns are found by looking at a header line rather than by position.
Filters out a record if all variant samples have depth lower than the given value.
Tools that collect sequencing quality-related and comparative metrics
A genotype produced by one of the concrete implementations of AbstractAlleleCaller.
Simple enum to represent the three possible combinations of major/major, major/minor and minor/minor haplotypes for a diploid individual.
Disk-based implementation of ReadEndsForMarkDuplicatesMap.
 
Summary
Metrics that are calculated during the process of marking duplicates within a stream of SAMRecords.
Factory class that creates either regular or flow-based duplication metrics.
When it is necessary to pick a primary alignment from a group of alignments for a read, pick the one that maps the earliest base in the read.
 
 
 
Created by farjoun on 6/26/18.
Summary metrics produced by CollectSequencingArtifactMetrics as a roll up of the context-specific error rates, to provide global error rates per type of base substitution.
Attempts to estimate library complexity from sequence alone.
 
Program to create a fingerprint for the contaminating sample when the level of contamination is both known and uniform in the genome.
Determine the barcode for each read in an Illumina lane.
Extracts barcodes and accumulates metrics for an entire tile.
Simple command line program that allows sub-sequences represented by an interval list to be extracted from a reference sequence file.
Converts a FASTQ file to an unaligned BAM or SAM file.
Class representing a fast algorithm for collecting data from AbstractLocusInfo with a list of aligned EdgingRecordAndOffset objects.
Summary
 
Iterator that dynamically applies filter strings to VariantContext records supplied by an underlying iterator.
Created by jcarey on 3/13/14.
Illumina uses an algorithm described in "Theory of RTA" that determines whether or not a cluster passes filter ("PF").
Summary
 
Applies a set of hard filters to Variants and to Genotypes within a VCF.
Summary
Class to represent a genetic fingerprint as a set of HaplotypeProbabilities objects that give the relative probabilities of each of the possible haplotypes at a locus.
Major class that coordinates the activities involved in comparing genetic fingerprint data whether the source is from a genotyping platform or derived from sequence data.
Class to hold the details of an element of the fingerprinting PU tag.
Detailed metrics about an individual SNP/Haplotype comparison within a fingerprint comparison.
Summary fingerprinting metrics and statistics about the comparison of the sequence data from a single read group (lane or index within a lane) vs.
Class for holding metrics on a single fingerprint.
Class that is used to represent the results of comparing a read group within a SAM file, or a sample within a VCF, against one or more sets of fingerprint genotypes.
A set of utilities used in the fingerprinting environment
A class that holds VariantContexts sorted by genomic position
Filters records based on the phred scaled p-value from the Fisher Strand test stored in the FS attribute.
Summary
Tool for replacing or fixing up a VCF header.
 
The scheme is defined in the constructor.
The default scheme is derived from the GA4GH Benchmarking Work Group's proposed evaluation scheme.
Efficiently concatenates BAM files that resulted from a scattered parallel analysis.
Simple little class that combines multiple VCFs that have exactly the same set of samples and nonoverlapping sets of loci.
Copied from BucketUtils.java in GATK. To be replaced once the original GATK BucketUtils.java is ported to htsjdk.
 
 
Class that holds detailed metrics about reads that fall within windows of a certain GC bin on the reference genome.
 
Calculates GC Bias Metrics on multiple levels. Created by kbergin on 3/23/15.
High level metrics that capture how biased the coverage in a certain lane is.
Utilities to calculate GC Bias. Created by kbergin on 9/23/15.
Holds annotation of a gene for storage in an OverlapDetector.
Load gene annotations into an OverlapDetector of Gene objects.
Summary
A simple structure to return the results of getAlleles.
Class that holds metrics about the Genotype Concordance contingency tables.
A class to store the counts for various truth and call state classifications relative to a reference.
Class that holds detail metrics about Genotype Concordance
This defines, for each valid TruthState and CallState tuple, the set of contingency table entries to which the tuple should contribute.
Created by kbergin on 6/19/15.
Created by kbergin on 7/30/15.
A class to store the various classifications for: 1.
These states represent the relationship between the call genotype and the truth genotype relative to a reference sequence.
A specific state for a 2x2 contingency table.
A minute class to store the truth and call state respectively.
These states represent the relationship between a truth genotype and the reference sequence.
Class that holds summary metrics about Genotype Concordance
An interface for classes that perform Genotype filtration.
Genotype filter that filters out genotypes below a given quality threshold.
Miscellaneous tools, e.g.
Created by farjoun on 11/2/16.
 
Class to convert an Illumina GTC file into a VCF file.
An accumulator for collecting metrics about a single-sample GVCF.
Represents information about a group of SNPs that form a haplotype in perfect LD with one another.
A collection of metadata about Haplotype Blocks, including multiple in-memory "indices" of the data to make it easy to query the correct HaplotypeBlock or Snp by SNP names, positions, etc.
Abstract class for storing and calculating various likelihoods and probabilities for haplotype alleles given evidence.
Log10(P(evidence| haplotype)) for the 3 different possible haplotypes {aa, ab, bb}
Represents the probability of the underlying haplotype of the contaminating sample given the data.
Represents a set of HaplotypeProbabilities that were derived from a single SNP genotype at a point in time.
Represents the likelihood of the HaplotypeBlock given the GenotypeLikelihoods (GL field from a VCF, which is actually a log10-likelihood) for each of the SNPs in that block.
Represents the probability of the underlying haplotype given the data.
A wrapper class for any HaplotypeProbabilities instance that will assume that the given evidence is that of a tumor sample and provide an hp for the normal sample that tumor came from.
 
Calculates HS metrics for a given SAM or BAM file.
Metrics generated by CollectHsMetrics for the analysis of target-capture sequencing experiments.
Program to create a fingerprint for the contaminating sample when the level of contamination is both known and uniform in the genome.
A class to encompass writing an Illumina adpc.bin file.
 
Metric for Illumina Basecalling that stores means and standard deviations on a per-barcode per-lane basis.
 
Simple switch to control the read name format to emit.
IlluminaBasecallsToSam transforms a lane of Illumina data file formats (bcl, locs, clocs, qseqs, etc.) into SAM, BAM or CRAM file format.
A class to parse the contents of an Illumina Bead Pool Manifest (BPM) file. A BPM file contains metadata (including the alleles, mapping and normalization information) on an Illumina Genotyping Array. Each type of genotyping array has a specific BPM.
A simple class to represent a locus entry in an Illumina Bead Pool Manifest (BPM) file
IlluminaDataProviderFactory accepts options for parsing Illumina data files for a lane and creates an IlluminaDataProvider, an iterator over the ClusterData for that lane, which utilizes these options.
List of data types of interest when parsing Illumina data.
General utils for dealing with IlluminaFiles as well as utils for specific, supported formats.
 
 
Embodies characteristics that describe a lane.
A class to represent an Illumina Manifest file.
A class to represent a record (line) from an Illumina Manifest [Assay] entry
 
Illumina's TileMetricsOut.bin file codes various metrics, both concrete (all density id's are code 100) or as a base code (e.g.
Metrics for Illumina Basecalling that stores median phasing and prephasing percentages on a per-template-read, per-lane basis.
A read name encoder following the encoding initially produced by picard fastq writers.
Misc utilities for working with Illumina specific files and data
Describes adapters used on each pair of strands
A calculator that estimates the error rate of the bases it observes for indels only.
Metric to be used for InDel errors
A class to store information relevant for biological rate estimation
A class to provide methods for accessing Illumina Infinium Data Files.
A class to parse the contents of an Illumina Infinium cluster (EGT) file. A cluster file contains the clustering information used in mapping red / green intensity information to genotype calls.
A class to encapsulate the table of contents for an Illumina Infinium Data File.
A class to parse the contents of an Illumina Infinium genotype (GTC) file. A GTC file is the output of Illumina's genotype calling software (either Autocall or Autoconvert) and contains genotype calls, confidence scores, basecalls and raw intensities for all calls made on the chip.
 
A class to parse the contents of an Illumina Infinium Normalization Manifest file. An Illumina Infinium Normalization Manifest file contains a subset of the information contained in the Illumina Manifest file, in addition to the normalization ID, which is needed for normalizing intensities in GtcToVcf.
 
A class to store fields that are specific to a VCF generated from an Illumina GTC file.
 
Metrics about the insert size distribution of a paired-end library, created by the CollectInsertSizeMetrics program and usually written to a file with the extension ".insert_size_metrics".
Collects InsertSizeMetrics on the specified accumulationLevels using
The channels in a FourChannelIntensityData object, and the channels produced by a ClusterIntensityFileReader, for cases in which it is desirable to handle these abstractly rather than having the specific names in the source code.
Base interface for an interval argument collection.
 
An interface for a class that scatters IntervalLists.
A base class for scatterers that scatter by uniqued base count.
Scatters IntervalList by interval count so that resulting IntervalList's have the same number of intervals in them.
Scatters IntervalList into `interval count` shards so that the resulting IntervalLists have approximately the same number of intervals in them.
A BaseCount Scatterer that avoids breaking up intervals.
Like IntervalListScattererWithoutSubdivision but will overflow current list if the projected size of the remaining lists is bigger than the "ideal".
An IntervalListScatterer that attempts to place the same number of (uniquified) bases in each output interval list.
An enum to control the creation of the various IntervalListScatter objects
Trivially simple command line program to convert an IntervalList file to a BED file.
Performs various IntervalList manipulations.
 
Tools that process genomic intervals in various formats.
 
High level metrics about the presence of outward- and inward-facing pairs within a SAM file generated with a jumping library, produced by the CollectJumpingLibraryMetrics program and usually stored in a file with the extension ".jump_metrics".
Helper class used to transform tile data for a lane into a collection of IlluminaPhasingMetrics
A class to generate library Ids and keep duplication metrics by library IDs.
Liftover SNPs in HaplotypeMaps from one reference to another
This tool adjusts the coordinates in an interval list on one reference to its homologous interval list on another reference, based on a chain file that describes the correspondence between the two references.
 
Summary
Created by jcarey on 3/13/14.
The locs file format is one of three Illumina formats (pos, locs, and clocs) that store position data exclusively.
Describes the behavior of a locus relative to a gene.
Creates a VCF that contains all the site-level information for all records in the input VCF but no genotype information.
Creates a TSV from sample name to VCF/GVCF path, with one line per input.
A better duplication marking algorithm that handles all cases including clipped and gapped alignments.
Enum used to control how duplicates are flagged in the DT optional tag on each read.
Enum for the possible values that a duplicate read can be tagged with in the DT attribute.
 
MarkDuplicates calculation helper class for flow-based mode. The class extends the behavior of MarkDuplicates, which contains the complete code for the non-flow-based mode.
 
An even better duplication marking algorithm that handles all cases including clipped and gapped alignments.
This will iterate through a coordinate sorted SAM file (iterator) and either mark or remove duplicates as appropriate.
Command line program to mark the location of adapter sequences.
This is the mark queue.
Represents the results of a fingerprint comparison between one dataset and a specific fingerprint file.
General math utilities
A collection of common math operations that work with log values.
Program to generate a data table and chart of mean quality by cycle from a BAM file.
Map from String to ReadEnds object.
Describes the type and number of mendelian violations found within a Trio.
Created by farjoun on 6/25/16.
An extension of MetricBase that knows how to merge-by-adding fields that are appropriately annotated (MergeByAdding).
Metrics whose values can be merged by adding.
Metrics whose values should be equal when merging.
Metrics that are merged manually in MergeableMetricBase.merge(MergeableMetricBase).
Metrics that are not merged, but are subsequently derived from other metrics, for example by MergeableMetricBase.calculateDerivedFields().
Metrics that are not merged.
Summary
Class to take genotype calls from a ped file output from zCall and merge them into a vcf from autocall.
This tool is used for combining SAM and/or BAM files from different runs or read groups into a single file, similar to the "merge" function of Samtools (http://www.htslib.org/doc/samtools.html).
Combines multiple variant files into a single variant file.
For use with Picard metrics programs that may output metrics for multiple levels of aggregation with an analysis.
MMapBackedIteratorFactory is a file reader that takes a header size and a binary file, maps the file to a read-only byte buffer, and provides methods to retrieve the header as its own byte buffer and to create iterators of different data types over the values of the file (starting after the end of the header).
For a paired-end aligner that aligns each end independently, select the pair of alignments that result in the largest insert size.
MultiLevelCollector<METRIC_TYPE extends htsjdk.samtools.metrics.MetricBase,Histogram_KEY extends Comparable,ARGTYPE>
MultiLevelCollector handles accumulating Metrics at different MetricAccumulationLevels (ALL_READS, SAMPLE, LIBRARY, READ_GROUP).
 
Created by jcarey on 3/13/14.
NextSeq-style bcl's have all tiles for a cycle in a single file.
Parse .bcl.bgzf files that contain multiple tiles in a single file.
MultiTileFileUtil<OUTPUT_RECORD extends picard.illumina.parser.IlluminaData>
For file types for which there is one file per lane, with fixed record size, and all the tiles in it, so the s_.bci file can be used to figure out where each tile starts and ends.
Read filter file that contains multiple tiles in a single file.
Created by jcarey on 3/13/14.
Read locs file that contains multiple tiles in a single file.
MultiTileParser<OUTPUT_RECORD extends picard.illumina.parser.IlluminaData>
Abstract class for files with fixed-length records for multiple tiles, e.g.
A tool to count the number of non-N bases in a fasta file
Little program to "normalize" a fasta file to ensure that all lines of sequence are the same length, and are a reasonable length!
Contains methods for finding optical/co-localized/sequencing duplicates.
Picard default argument collection for an optional reference.
Miscellaneous tools, e.g.
Base interface for an output argument collection.
In multiple locations we need to know what cycles are output, as of now we output all non-skip cycles, but rather than sprinkle this knowledge throughout the parser code, instead OutputMapping provides all the data a client might want about the cycles to be output including what ReadType they are.
An error metric for the errors involving bases in the overlapping region of a read-pair.
A calculator that estimates the error rate of the bases it observes, assuming that the reference is truth.
Pair<X extends Comparable<X>,Y extends Comparable<Y>>
Simple Pair class.
An iterator that takes a pair of iterators over VariantContexts and iterates over them in tandem.
Little class to hold a pair of VariantContexts that are in sync with one another.
A base class for Metrics for targeted panels.
 
A class whose purpose is to initialize the various plugins that provide Path support.
Represents a .ped file of family information as documented at http://pngu.mgh.harvard.edu/~purcell/plink/data.shtml (PLINK data formats). Stores the information in memory as a map of individualId -> Pedigree information for that individual.
 
 
PerTileParser<ILLUMINA_DATA extends picard.illumina.parser.IlluminaData>
Abstract base class for Parsers that open a single tile file at a time and iterate through them.
 
PerUnitMetricCollector<BEAN extends htsjdk.samtools.metrics.MetricBase,HKEY extends Comparable,ARGTYPE>
PerRecordCollector - An interface for classes that collect data in order to generate one or more metrics.
Argument Collection which holds parameters common to classes that want to add PG tags to reads in SAM/BAM files
Small interface that provides access to the physical location information about a cluster.
Stores the minimal information needed for optical duplicate detection.
This stores records that are comparable for detecting optical duplicates.
Small class that provides access to the physical location information about a cluster.
Small class that provides access to the physical location information about a cluster.
This is the main class of Picard and is the way of executing individual command line programs.
Basic Picard runtime exception that, for now, does nothing much
A Subclass of HtsPath with conversion to Path making use of IOUtil
Created by jcarey on 3/13/14.
The pos file format is one of three Illumina formats (pos, locs, and clocs) that store position data exclusively.
Summary
PosParser parses multiple files formatted as one of the three file formats that contain position information only (pos, locs, and clocs).
Performs on-the-fly filtering of the provided VariantContext Iterator such that only variants that satisfy all predicates are emitted.
It is useful to define a key such that the key will occur at most once among the primary alignments in a given file (assuming the file is valid).
Given a set of alignments for a read or read pair, mark one alignment as primary, according to whatever strategy is appropriate.
Utility for loading properties files from resources.
Filters out sites that have a QD annotation applied to them and where the QD value is lower than a lower limit.
Charts quality score distribution within a BAM file.
A collection of helper utilities for iterating through reads that are in query-name sorted read order as pairs
 
While structurally identical to CompositeIndex, this class is maintained as it makes code more readable when the two are used together (see QSeqParser)
Classes, methods, and enums that deal with the stratification of read bases and reference information.
Stratifies into quintiles of read cycle.
Stratifies according to the number of matching cigar operators (from CIGAR string) that the read has.
A CollectionStratifier is a stratifier that uses a collection of stratifiers to inform the stratification.
Types of consensus reads as determined by the number of duplicates used from first and second strands.
Stratify by tags used during duplex and single index consensus calling.
An enum designed to hold a binned version of any probability-like number (between 0 and 1) in quintiles
Stratifies bases by their read's tile, which is parsed from the read name.
Stratifies bases by their read's X coordinate, which is parsed from the read name.
Stratifies bases by their read's Y coordinate, which is parsed from the read name.
A stratifier that uses GC (of the read) to stratify.
Stratifies according to the length of an insertion or deletion.
Stratifies according to the number of indel bases (from CIGAR string) that the read has.
 
Stratify bases according to the type of Homopolymer that they belong to (repeating element, final reference base and whether the length is "long" or not).
Stratifies according to the overall mismatches (from SAMTag.NM) that the read has against the reference, NOT including the current base.
Stratify by the number of Ns found in the read.
An enum for holding the orientation of a read's read pair (i.e.
A PairStratifier is a stratifier that uses two other stratifiers to inform the stratification.
An enum to hold information about the "properness" of a read pair
An enum for holding the direction of a read (positive strand or negative strand).
An enum to hold the ordinality of a read
The main interface for a stratifier.
Data for a single end of a paired-end read, a barcode read, or for the entire read if not paired end.
Tools that manipulate read data in SAM, BAM or CRAM format
Represents one set of cycles in a ReadStructure (e.g.
Little struct-like class to hold read pair (and fragment) end data for duplicate marking.
Little struct-like class to hold read pair (and fragment) end data for MarkDuplicatesWithMateCigar
Codec for ReadEnds that just outputs the primitive fields and reads them back.
Interface for storing and retrieving ReadEnds objects.
 
Created by nhomer on 9/13/15.
A class to store individual records for MarkDuplicatesWithMateCigar.
 
Provides access to the physical location information about a cluster.
Describes the intended logical output structure of clusters of an Illumina run.
A read type describes a stretch of cycles in a ReadStructure (e.g.
Base interface for a reference argument collection.
Tools that analyze and manipulate FASTA format references
Loads gene annotations from a refFlat file into an OverlapDetector.
 
Class which contains utility functions that use reflection.
Renames a sample within a VCF or BCF.
Reorders a SAM/BAM input file according to the order of contigs in a second reference file.
 
Little struct-like class to hold a record index, the index of the corresponding representative read, and duplicate set size information.
Codec for read names and integers that outputs the primitive fields and reads them back.
 
Argument collection for references that are required (and not common).
This tool reverts the original base qualities (if specified) and adds the mate cigar tag to mapped SAM, BAM or CRAM files.
Used as a return for the canSkipSAMFile function.
Reverts a SAM file by optionally restoring original quality scores and by removing all alignment information.
 
Util class for executing R scripts.
Metrics about the alignment of RNA-seq reads within a SAM file to genes, produced by the CollectRnaSeqMetrics program and usually stored in a file with the extension ".rna_metrics".
 
 
Holds information about CpG sites encountered for RRBS processing QC
 
Holds summary statistics from RRBS processing QC
Class that takes in a set of alignment information in SAM format and merges it with the set of all reads for which alignment was attempted, stored in an unmapped SAM file.
Compare two SAM/BAM files.
 
Argument collection for SAM comparison
Metric for results of SamComparison.
Converts a BAM file to human-readable SAM output or vice versa
 
SAMRecordAndReferenceMultiLevelCollector<BEAN extends htsjdk.samtools.metrics.MetricBase,HKEY extends Comparable>
 
SAMRecordMultiLevelCollector<BEAN extends htsjdk.samtools.metrics.MetricBase,HKEY extends Comparable>
Defines a MultilevelPerRecordCollector using the argument type of SAMRecord so that this doesn't have to be redefined for each subclass of MultilevelPerRecordCollector
This class sets the duplicate read flag as the result state when examining sets of records.
Class to take unmapped reads in SAM/BAM/CRAM file format and create Maq binary fastq format file(s) -- one or two of them, depending on whether it's a paired-end read.
Extracts read sequences and qualities from the input SAM/BAM file and writes them into the output file in Sanger FASTQ format.
Extracts read sequences and qualities from the input SAM/BAM file and SAM/BAM tags and writes them into output files in Sanger FASTQ format.
A Tool for breaking up a reference into intervals of alternating regions of N and ACGT bases.
 
Class with helper methods for generating and writing SequenceDictionary objects.
 
 
Bait bias artifacts broken down by context.
Summary analysis of a single bait bias artifact, also known as a reference bias artifact.
Pre-adapter artifacts broken down by context.
Summary analysis of a single pre-adapter artifact.
Deprecated.
Fixes the NM, MD, and UQ tags in a SAM or BAM file.
Represents the sex of an individual.
A calculator that estimates the error rate of the bases it observes, assuming that the reference is truth.
This is a simple tool to mark duplicates using the DuplicateSetIterator, DuplicateSet, and SAMRecordDuplicateComparator.
A class for finding the distance between a single barcode and a barcode-read (with base qualities)
Superclass designed to provide some consistent structure between subclasses that simply iterate once over a coordinate-sorted BAM and collect information from the records as they go in order to produce some kind of output.
Class to represent a SNP in context of a haplotype block that is used in fingerprinting.
SortedBasecallsConverter utilizes an underlying IlluminaDataProvider to convert parsed and decoded sequencing data from standard Illumina formats to specific output records (FASTA records/SAM records).
Summary
Sorts a SAM or BAM file.
Sorts one or more VCF files according to the order of the contigs in the header/sequence dictionary and then by coordinate.
Command-line program to split a SAM/BAM/CRAM file into separate files based on library name.
Splits the input queryname sorted or query-grouped SAM/BAM/CRAM file and writes it into multiple BAM files, each with an approximately equal number of reads.
Splits the input VCF file into two, one for indels and one for SNPs.
A set of String constants in which the name of the constant (minus the _SHORT_NAME suffix) is the standard long Option name, and the value of the constant is the standard shortName.
Parser for tab-delimited files
Parse a tabbed text file in which columns are found by looking at a header line rather than by position.
Metrics class for the analysis of reads obtained from targeted PCR experiments e.g.
Calculates HS metrics for a given SAM or BAM file.
TargetMetrics are metrics that measure how well we hit specific targets (or baits) when using a targeted sequencing process like hybrid selection or Targeted PCR Techniques (TSCA).
TargetMetrics are metrics that measure how well we hit specific targets (or baits) when using a targeted sequencing process like hybrid selection or Targeted PCR Techniques (TSCA).
TargetMetrics are metrics that measure how well we hit specific targets (or baits) when using a targeted sequencing process like hybrid selection or Targeted PCR Techniques (TSCA).
A simple class that is used to store the coverage information about an interval.
 
For internal test purposes only.
Created by David Benjamin on 5/13/15.
 
TheoreticalSensitivityMetrics are metrics calculated from TheoreticalSensitivity and the parameters used in the calculation.
 
This version of the thread pool executor will throw an exception if any of the internal jobs have thrown exceptions while executing.
Represents a tile from TileMetricsOut.bin.
Load a file containing 8-byte records, each consisting of a tile number (4-byte int) followed by the number of clusters in the tile (4-byte int). The number of records to read is determined by reaching EOF.
 
Reads a TileMetricsOut file commonly found in the InterOp directory of an Illumina Run Folder.
Helper class which captures the combination of a lane, tile & metric code
IlluminaPhasingMetrics corresponds to a single record in a TileMetricsOut file
 
Utility for reading the tile data from an Illumina run directory's TileMetricsOut.bin file
Captures information about a phasing value - Which read it corresponds to, which phasing type and a median value
Defines the first or second template read for a tile
Enum representation of a transition from one base to any other.
 
This tool locates and tags duplicate reads in a BAM or SAM file, where duplicate reads are defined as originating from a single fragment of DNA.
UmiGraph is used to identify UMIs that come from the same original source molecule.
Metrics that are calculated during the process of marking duplicates within a stream of SAMRecords using the UmiAwareDuplicateSetIterator.
A utility class for dealing with unsigned types.
UnsortedBasecallsConverter utilizes an underlying IlluminaDataProvider to convert parsed and decoded sequencing data from standard Illumina formats to specific output records (FASTA records/SAM records).
Takes a VCF file and a Sequence Dictionary (from a variety of file types) and updates the Sequence Dictionary in VCF.
This tool reports on the validity of a SAM or BAM file relative to the SAM format specification.
 
Describes the functionality for an executor that manages the delegation of work to VariantProcessor.Accumulators.
A VariantAccumulatorExecutor that breaks down work into chunks described by the provided VariantIteratorProducer and spreads them over the indicated number of threads.
Tools that evaluate and refine variant calls, e.g.
Interface for classes that can generate filters for VariantContexts.
Tools that filter variants
A mechanism for iterating over CloseableIterators of VariantContexts in some fashion, given VCF files and optionally an interval list.
Tools that manipulate variant call format (VCF) data
Describes an object that processes variants and produces a result.
Handles VariantContexts, and accumulates their data in some fashion internally.
Generates instances of VariantProcessor.Accumulators.
Simple builder of VariantProcessors.
Takes a collection of results produced by VariantProcessor.Accumulator.result() and merges them into a single RESULT.
Enum to hold the possible types of dbSnps.
Deprecated.
from 2022-03-17, Use VcfPathSegment
Deprecated.
from 2022-03-17, Use VcfPathSegmentGenerator
Converts an ASCII VCF file to a binary BCF or vice versa.
Describes a segment of a particular VCF file.
Describes a mechanism for producing VcfPathSegments from a VCF file path.
A simple program to convert a Genotyping Arrays VCF to an ADPC file (Illumina intensity data file).
Converts a VCF or BCF file to a Picard Interval List.
 
Created by farjoun on 4/1/17.
 
Prints a SAM or BAM file to the screen.
 
 
Metrics for evaluating the performance of whole genome sequencing experiments.
Interface for processing data and generating results for CollectWgsMetrics.
WgsMetricsProcessorImpl<T extends htsjdk.samtools.util.AbstractRecordAndOffset>
Implementation of WgsMetricsProcessor that gets input data from a given iterator and processes it with the help of a collector.