Package picard.illumina
Class BasecallsConverter<CLUSTER_OUTPUT_RECORD>
java.lang.Object
picard.illumina.BasecallsConverter<CLUSTER_OUTPUT_RECORD>
- Type Parameters:
CLUSTER_OUTPUT_RECORD
- The type of record that this converter will convert to.
- Direct Known Subclasses:
SortedBasecallsConverter, UnsortedBasecallsConverter
BasecallsConverter utilizes an underlying IlluminaDataProvider to convert parsed and decoded sequencing data
from standard Illumina formats to specific output records (FASTA records/SAM records).
The underlying IlluminaDataProvider applies several optional transformations, which can include EAMSS filtering, non-PF read filtering, and quality score recoding using a BclQualityEvaluationStrategy.
The converter can also limit the scope of data that is converted from the data provider by setting the tile to start on (firstTile) and the total number of tiles to process (tileLimit).
Additionally, BasecallsConverter can optionally demultiplex reads by outputting barcode-specific reads to their associated writers.
-
Nested Class Summary
Nested Classes
Modifier and Type, Class, Description
protected static interface
Interface that defines a converter that takes ClusterData and returns OUTPUT_RECORD type objects.
protected static interface
Interface that defines a writer that will write out OUTPUT_RECORD type objects.
-
Field Summary
Fields
Modifier and Type, Field, Description
protected BarcodeExtractor
protected final Map<String, ? extends htsjdk.io.Writer<CLUSTER_OUTPUT_RECORD>>
static final Set<IlluminaDataType>
static final Set<IlluminaDataType>
protected final boolean
protected final boolean
protected final boolean
protected final IlluminaDataProviderFactory[]
static final Comparator<Integer>
A comparator used to sort Illumina tiles in their proper order.
protected final htsjdk.io.AsyncWriterPool
-
Constructor Summary
Constructors
Constructor, Description
BasecallsConverter(File basecallsDir, File barcodesDir, int[] lanes, ReadStructure readStructure, Map<String, ? extends htsjdk.io.Writer<CLUSTER_OUTPUT_RECORD>> barcodeRecordWriterMap, boolean demultiplex, Integer firstTile, Integer tileLimit, BclQualityEvaluationStrategy bclQualityEvaluationStrategy, boolean ignoreUnexpectedBarcodes, boolean applyEamssFiltering, boolean includeNonPfReads, htsjdk.io.AsyncWriterPool writerPool, BarcodeExtractor barcodeExtractor)
Constructs a new BasecallsConverter object.
-
Method Summary
Modifier and Type, Method, Description
void closeWriters
Closes all writers.
protected static Set<IlluminaDataType> getDataTypesFromReadStructure(ReadStructure readStructure, boolean demultiplex, File barcodesDir)
Given a read structure, return the data types that need to be parsed for this run.
protected IlluminaDataProviderFactory[] getLaneFactories
Gets the data provider factory used to create the underlying data provider.
static File[] getTiledFiles(File baseDirectory, Pattern pattern)
Applies a lane- and tile-based regex to return all files matching that regex for each tile.
protected void interruptAndShutdownExecutors
protected String maybeDemultiplex(ClusterData cluster, Map<String, BarcodeMetric> metrics, BarcodeMetric noMatch, ReadStructure outputReadStructure)
If we are demultiplexing and a barcodeExtractor is defined then this method will perform on-the-fly demuxing.
abstract void processTilesAndWritePerSampleOutputs(Set<String> barcodes)
Abstract method for processing tiles of data and outputting records for each barcode.
protected void setConverter(BasecallsConverter.ClusterDataConverter<CLUSTER_OUTPUT_RECORD> converter)
Must be called before doTileProcessing.
protected void setTileLimits(Integer firstTile, Integer tileLimit)
Uses the firstTile and tileLimit parameters to set which tiles will be processed.
protected void updateMetrics(Map<String, BarcodeMetric> metrics, BarcodeMetric noMatch)
-
Field Details
-
DATA_TYPES_WITH_BARCODE
-
DATA_TYPES_WITHOUT_BARCODE
-
laneFactories
-
demultiplex
protected final boolean demultiplex -
ignoreUnexpectedBarcodes
protected final boolean ignoreUnexpectedBarcodes -
barcodeRecordWriterMap
protected final Map<String,? extends htsjdk.io.Writer<CLUSTER_OUTPUT_RECORD>> barcodeRecordWriterMap -
includeNonPfReads
protected final boolean includeNonPfReads -
writerPool
protected final htsjdk.io.AsyncWriterPool writerPool -
converter
-
tiles
-
barcodeExtractor
-
TILE_NUMBER_COMPARATOR
A comparator used to sort Illumina tiles in their proper order. Because the tile number is followed by a colon, a tile number that is a prefix of another tile number should sort after it (e.g. 10 sorts after 100). Tile numbers with the same number of digits are sorted numerically.
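The ordering described above can be sketched by comparing each tile number as a string with a trailing colon. This is a minimal illustration, not Picard's actual implementation; the class name `TileOrderSketch` and the field `BY_TILE_KEY` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TileOrderSketch {
    // Hypothetical stand-in for TILE_NUMBER_COMPARATOR (an assumption, not Picard's code).
    // ':' sorts after every ASCII digit, so a tile number that is a prefix of
    // another (10 vs 100) compares as the larger of the two.
    static final Comparator<Integer> BY_TILE_KEY =
            Comparator.comparing((Integer tile) -> tile + ":");

    // Returns a sorted copy so the caller's list is left untouched.
    static List<Integer> sorted(List<Integer> tiles) {
        List<Integer> copy = new ArrayList<>(tiles);
        copy.sort(BY_TILE_KEY);
        return copy;
    }

    public static void main(String[] args) {
        // prints [100, 10, 1101, 1102, 2101]
        System.out.println(sorted(List.of(1102, 10, 2101, 100, 1101)));
    }
}
```

For non-negative tile numbers with the same number of digits, lexicographic and numeric order coincide, which is why a plain string comparison is enough in this sketch.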
-
-
Constructor Details
-
BasecallsConverter
public BasecallsConverter(File basecallsDir, File barcodesDir, int[] lanes, ReadStructure readStructure, Map<String, ? extends htsjdk.io.Writer<CLUSTER_OUTPUT_RECORD>> barcodeRecordWriterMap, boolean demultiplex, Integer firstTile, Integer tileLimit, BclQualityEvaluationStrategy bclQualityEvaluationStrategy, boolean ignoreUnexpectedBarcodes, boolean applyEamssFiltering, boolean includeNonPfReads, htsjdk.io.AsyncWriterPool writerPool, BarcodeExtractor barcodeExtractor)
Constructs a new BasecallsConverter object.
- Parameters:
basecallsDir
- Where to read basecalls from.
barcodesDir
- Where to read barcodes from (optional; use basecallsDir if not specified).
lanes
- What lanes to process.
readStructure
- How to interpret each cluster.
barcodeRecordWriterMap
- Map from barcode to CLUSTER_OUTPUT_RECORD writer. If demultiplex is false, must contain one writer stored with key=null.
demultiplex
- If true, output is split by barcode; otherwise all records are written to the same output stream.
firstTile
- (For debugging) If non-null, start processing at this tile.
tileLimit
- (For debugging) If non-null, process no more than this many tiles.
bclQualityEvaluationStrategy
- The basecall quality evaluation strategy that is applied to decoded base calls.
ignoreUnexpectedBarcodes
- If true, will ignore reads whose called barcode is not found in barcodeRecordWriterMap.
applyEamssFiltering
- If true, apply EAMSS filtering if parsing BCLs for bases and quality scores.
includeNonPfReads
- If true, will include ALL reads (including those which do not have PF set). This option does nothing for instruments that output CBCLs (NovaSeqs).
barcodeExtractor
- The `BarcodeExtractor` used to do inline barcode matching.
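The barcodeRecordWriterMap contract (one writer per barcode, or a single writer under the null key when demultiplex is false) might be pictured with the sketch below. It uses `java.util.function.Consumer` as a stand-in for `htsjdk.io.Writer`; the class `WriterRoutingSketch`, the method `writerFor`, and the choice to throw on an unexpected barcode are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Consumer;

class WriterRoutingSketch {
    // Hypothetical helper: picks the writer for a record's barcode.
    // Returns null when the barcode is unknown and unexpected barcodes are ignored.
    static <R> Consumer<R> writerFor(Map<String, Consumer<R>> writers, String barcode,
                                     boolean demultiplex, boolean ignoreUnexpectedBarcodes) {
        // Without demultiplexing, the single writer lives under the null key.
        String key = demultiplex ? barcode : null;
        Consumer<R> writer = writers.get(key);
        if (writer == null && !ignoreUnexpectedBarcodes) {
            // Assumed behavior for this sketch; the real class may report this differently.
            throw new IllegalStateException("No writer registered for barcode " + key);
        }
        return writer;
    }

    public static void main(String[] args) {
        Map<String, Consumer<String>> writers = new HashMap<>();
        writers.put(null, record -> System.out.println("single output: " + record));
        writerFor(writers, "ACGT", false, false).accept("read-1");
    }
}
```

Note that the map has to be a type that accepts null keys, such as `HashMap`; `Map.of(...)` rejects them with a `NullPointerException`.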
-
-
Method Details
-
processTilesAndWritePerSampleOutputs
Abstract method for processing tiles of data and outputting records for each barcode.
- Parameters:
barcodes
- The barcodes used optionally for demultiplexing. Must contain at least a single null value if no demultiplexing is being done.
- Throws:
IOException
-
closeWriters
Closes all writers. If an AsyncWriterPool is used, call close on that; otherwise iterate over each writer and close it.
- Throws:
IOException
- Thrown if there is an error closing the writer.
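The close logic described above can be sketched with plain `java.io.Closeable` objects. Here `pool` stands in for the AsyncWriterPool and is assumed to close the writers it manages; the class and method names are hypothetical.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Collection;

class CloseWritersSketch {
    // Hypothetical stand-in: 'pool' plays the role of the AsyncWriterPool and may be null.
    static void closeAll(Closeable pool, Collection<? extends Closeable> writers) throws IOException {
        if (pool != null) {
            // Assumption: closing the pool also closes every writer registered with it.
            pool.close();
        } else {
            // No pool in use, so each writer is closed directly.
            for (Closeable writer : writers) {
                writer.close();
            }
        }
    }
}
```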
-
getTiledFiles
Applies a lane- and tile-based regex to return all files matching that regex for each tile.
- Parameters:
baseDirectory
- The directory to search for tiled files.
pattern
- The pattern used to match files.
- Returns:
- A file array of all of the tile-based files that match the regex pattern.
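A rough sketch of this kind of lookup, using only `java.io.File` and `java.util.regex.Pattern`, is shown below. The class `TiledFilesSketch`, the full-name match, the name-based sort, and the example pattern in the usage note are assumptions for illustration, not the behavior of the real method.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

class TiledFilesSketch {
    // Keeps only the names that fully match the lane/tile pattern, sorted by name.
    static List<String> matchingNames(String[] names, Pattern pattern) {
        List<String> kept = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                kept.add(name);
            }
        }
        Collections.sort(kept);
        return kept;
    }

    // Lists the directory and wraps each matching name in a File.
    static File[] tiledFiles(File baseDirectory, Pattern pattern) {
        String[] names = baseDirectory.list();   // null if this is not a readable directory
        if (names == null) {
            return new File[0];
        }
        return matchingNames(names, pattern).stream()
                .map(name -> new File(baseDirectory, name))
                .toArray(File[]::new);
    }
}
```

With an illustrative pattern such as `s_(\d+)_(\d{4})\.bcl`, a directory holding `s_1_1101.bcl`, `s_1_1102.bcl`, and `notes.txt` would yield only the two `.bcl` files.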
-
getDataTypesFromReadStructure
protected static Set<IlluminaDataType> getDataTypesFromReadStructure(ReadStructure readStructure, boolean demultiplex, File barcodesDir)
Given a read structure, return the data types that need to be parsed for this run.
- Parameters:
readStructure
- The read structure that defines how the read is set up.
demultiplex
- If true, output is split by barcode; otherwise all records are written to the same output stream.
barcodesDir
- The barcodes dir that contains barcode files.
- Returns:
- A set containing each data type needed to satisfy the read structure.
-
getLaneFactories
Gets the data provider factory used to create the underlying data provider.
- Returns:
- A factory used to create the underlying data provider.
-
setConverter
protected void setConverter(BasecallsConverter.ClusterDataConverter<CLUSTER_OUTPUT_RECORD> converter)
Must be called before doTileProcessing. This is not passed in the constructor because the IlluminaDataProviderFactory is often needed in order to construct the converter.
- Parameters:
converter
- Converts ClusterData to CLUSTER_OUTPUT_RECORD
-
setTileLimits
Uses the firstTile and tileLimit parameters to set which tiles will be processed. The processor will start with firstTile and continue to process tiles in order until it has processed at most tileLimit tiles.
- Parameters:
firstTile
- The tile to begin processing at.
tileLimit
- The maximum number of tiles to process.
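The windowing described above can be illustrated on a plain list of tile numbers. This sketch assumes the list is already in processing order, treats a null argument as "no restriction", and throws when firstTile is absent; those choices, and the names `TileLimitSketch` and `limit`, are assumptions rather than the documented behavior.

```java
import java.util.List;

class TileLimitSketch {
    // Hypothetical helper: 'tiles' is assumed to be already in processing order.
    static List<Integer> limit(List<Integer> tiles, Integer firstTile, Integer tileLimit) {
        int start = 0;
        if (firstTile != null) {
            start = tiles.indexOf(firstTile);
            if (start < 0) {
                // Assumed failure mode for this sketch.
                throw new IllegalArgumentException("firstTile " + firstTile + " not found");
            }
        }
        int end = tiles.size();
        if (tileLimit != null) {
            // Never run past the end of the list.
            end = Math.min(end, start + tileLimit);
        }
        return tiles.subList(start, end);
    }

    public static void main(String[] args) {
        // prints [1102, 1103]
        System.out.println(limit(List.of(1101, 1102, 1103, 1104), 1102, 2));
    }
}
```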
-
maybeDemultiplex
protected String maybeDemultiplex(ClusterData cluster, Map<String, BarcodeMetric> metrics, BarcodeMetric noMatch, ReadStructure outputReadStructure)
If we are demultiplexing and a barcodeExtractor is defined, then this method will perform on-the-fly demuxing. Otherwise it will just return the pre-demuxed barcode from `ExtractIlluminaBarcodes`.
- Parameters:
cluster
- The cluster data to demux.
metrics
- The metrics object that will store the demux metrics.
noMatch
- A no-match metric object to store metrics for any read that doesn't demux.
outputReadStructure
- The output `ReadStructure` for this cluster.
- Returns:
- The matched barcode or null if no barcode was matched.
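The branching described for maybeDemultiplex can be sketched with simplified types. Everything here is a hypothetical stand-in: a `Function<String, String>` replaces the BarcodeExtractor, a count map replaces the BarcodeMetric objects, and the cluster is reduced to its raw barcode bases plus any barcode assigned upstream.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

class MaybeDemuxSketch {
    // Hypothetical stand-ins: 'extractor' maps raw barcode bases to a known barcode
    // (or null), and 'counts' plays the role of the per-barcode metrics.
    static String maybeDemultiplex(String rawBases, String preAssignedBarcode, boolean demultiplex,
                                   Function<String, String> extractor, Map<String, Integer> counts) {
        if (!demultiplex || extractor == null) {
            // No on-the-fly matching: return whatever barcode was assigned upstream.
            return preAssignedBarcode;
        }
        String matched = extractor.apply(rawBases);
        // Tally the result, using a separate bucket for reads that match nothing.
        counts.merge(matched == null ? "NO_MATCH" : matched, 1, Integer::sum);
        return matched;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        Function<String, String> exact = bases -> bases.equals("ACGT") ? "ACGT" : null;
        System.out.println(maybeDemultiplex("ACGT", null, true, exact, counts)); // prints ACGT
        System.out.println(maybeDemultiplex("TTTT", null, true, exact, counts)); // prints null
    }
}
```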
-
interruptAndShutdownExecutors
-
updateMetrics
-