Package picard.sam.markduplicates.util
Class OpticalDuplicateFinder
java.lang.Object
picard.sam.util.ReadNameParser
picard.sam.markduplicates.util.OpticalDuplicateFinder
- All Implemented Interfaces:
Serializable
Contains methods for finding optical/co-localized/sequencing duplicates.
-
Field Summary
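Fields
Modifier and Type, Field
static final int DEFAULT_BIG_DUPLICATE_SET_SIZE
static final int DEFAULT_MAX_DUPLICATE_SET_SIZE
static final int DEFAULT_OPTICAL_DUPLICATE_DISTANCE
int opticalDuplicatePixelDistance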
Fields inherited from class picard.sam.util.ReadNameParser
DEFAULT_READ_NAME_REGEX, readNameRegex
-
Constructor Summary
Constructors
Constructor, Description
OpticalDuplicateFinder()
Uses the default duplicate distance DEFAULT_OPTICAL_DUPLICATE_DISTANCE (100) and the default read name regex ReadNameParser.DEFAULT_READ_NAME_REGEX.
OpticalDuplicateFinder(String readNameRegex, int opticalDuplicatePixelDistance, long maxDuplicateSetSize, htsjdk.samtools.util.Log log)
OpticalDuplicateFinder(String readNameRegex, int opticalDuplicatePixelDistance, htsjdk.samtools.util.Log log)
-
Method Summary
Modifier and Type, Method, Description
boolean[] findOpticalDuplicates(List<? extends PhysicalLocation> list, PhysicalLocation keeper)
Finds which reads within the list of duplicates are likely to be optical/co-localized duplicates of one another.
void setBigDuplicateSetSize(int bigDuplicateSetSize)
Sets the size of a set that is big enough to log progress about.
void setMaxDuplicateSetSize(long maxDuplicateSetSize)
Sets the size of a set that is too big to process.
Methods inherited from class picard.sam.util.ReadNameParser
addLocationInformation, getLastThreeFields, rapidParseInt
-
Field Details
-
opticalDuplicatePixelDistance
public int opticalDuplicatePixelDistance
-
DEFAULT_OPTICAL_DUPLICATE_DISTANCE
public static final int DEFAULT_OPTICAL_DUPLICATE_DISTANCE
-
DEFAULT_BIG_DUPLICATE_SET_SIZE
public static final int DEFAULT_BIG_DUPLICATE_SET_SIZE
-
DEFAULT_MAX_DUPLICATE_SET_SIZE
public static final int DEFAULT_MAX_DUPLICATE_SET_SIZE
-
-
Constructor Details
-
OpticalDuplicateFinder
public OpticalDuplicateFinder()
Uses the default duplicate distance DEFAULT_OPTICAL_DUPLICATE_DISTANCE (100) and the default read name regex ReadNameParser.DEFAULT_READ_NAME_REGEX.
-
OpticalDuplicateFinder
public OpticalDuplicateFinder(String readNameRegex, int opticalDuplicatePixelDistance, htsjdk.samtools.util.Log log)
Parameters:
readNameRegex - see ReadNameParser.DEFAULT_READ_NAME_REGEX.
opticalDuplicatePixelDistance - the optical duplicate pixel distance
log - the log to which to write messages.
-
OpticalDuplicateFinder
public OpticalDuplicateFinder(String readNameRegex, int opticalDuplicatePixelDistance, long maxDuplicateSetSize, htsjdk.samtools.util.Log log)
Parameters:
readNameRegex - see ReadNameParser.DEFAULT_READ_NAME_REGEX.
opticalDuplicatePixelDistance - the optical duplicate pixel distance
maxDuplicateSetSize - the size of a set that is too big to process
log - the log to which to write messages.
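The constructor parameters name a pixel distance without spelling out how it is applied. The sketch below is one plausible reading and not Picard's implementation: it assumes two reads count as co-localized when they sit on the same tile and differ by at most the configured distance along each axis. `Spot` and `withinDistance` are hypothetical names introduced for illustration only.

```java
class PixelDistanceSketch {
    // Hypothetical stand-in for a read's position; this is not Picard's PhysicalLocation.
    static final class Spot {
        final int tile;
        final int x;
        final int y;

        Spot(int tile, int x, int y) {
            this.tile = tile;
            this.x = x;
            this.y = y;
        }
    }

    // Assumption: same tile, and at most maxDistance pixels apart on each axis.
    static boolean withinDistance(Spot a, Spot b, int maxDistance) {
        return a.tile == b.tile
                && Math.abs(a.x - b.x) <= maxDistance
                && Math.abs(a.y - b.y) <= maxDistance;
    }

    public static void main(String[] args) {
        int distance = 100; // the documented default distance
        Spot a = new Spot(1101, 1000, 2000);
        Spot b = new Spot(1101, 1060, 2090); // 60 and 90 pixels away
        Spot c = new Spot(1101, 1000, 2101); // 101 pixels away on y

        System.out.println(withinDistance(a, b, distance)); // prints true
        System.out.println(withinDistance(a, c, distance)); // prints false
    }
}
```

Under this reading, a larger distance widens the square neighbourhood in which reads are treated as co-localized.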
-
-
Method Details
-
setBigDuplicateSetSize
public void setBigDuplicateSetSize(int bigDuplicateSetSize)
Sets the size of a set that is big enough to log progress about. Defaults to 1000.
Parameters:
bigDuplicateSetSize - the size of a set that is big enough to log progress about
-
setMaxDuplicateSetSize
public void setMaxDuplicateSetSize(long maxDuplicateSetSize)
Sets the size of a set that is too big to process. Defaults to 300000.
Parameters:
maxDuplicateSetSize - the size of a set that is too big to process
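The two setters describe thresholds on the size of a duplicate set. A minimal sketch of how such thresholds could partition set sizes follows. The `Handling` enum and `classify` method are hypothetical, and the strict "greater than" comparisons are an assumption, since the documentation does not say whether the limits are inclusive.

```java
class DuplicateSetSizeSketch {
    // Hypothetical labels for what happens to a duplicate set of a given size.
    enum Handling { PROCESS, PROCESS_AND_LOG_PROGRESS, SKIP }

    static final int DEFAULT_BIG = 1000;     // documented default
    static final long DEFAULT_MAX = 300000L; // documented default

    // Assumption: both limits are exclusive (a set exactly at the limit is not over it).
    static Handling classify(long setSize, int bigSetSize, long maxSetSize) {
        if (setSize > maxSetSize) {
            return Handling.SKIP; // too big to process
        }
        if (setSize > bigSetSize) {
            return Handling.PROCESS_AND_LOG_PROGRESS; // big enough to log progress about
        }
        return Handling.PROCESS;
    }

    public static void main(String[] args) {
        System.out.println(classify(2, DEFAULT_BIG, DEFAULT_MAX));      // prints PROCESS
        System.out.println(classify(5000, DEFAULT_BIG, DEFAULT_MAX));   // prints PROCESS_AND_LOG_PROGRESS
        System.out.println(classify(300001, DEFAULT_BIG, DEFAULT_MAX)); // prints SKIP
    }
}
```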
-
findOpticalDuplicates
public boolean[] findOpticalDuplicates(List<? extends PhysicalLocation> list, PhysicalLocation keeper)
Finds which reads within the list of duplicates are likely to be optical/co-localized duplicates of one another. Within each cluster of optical duplicates that is found, one read remains unflagged for optical duplication and the rest are flagged as optical duplicates. The reads that are considered optical duplicates are indicated by "true" at the same index in the returned boolean[] as the read occupied in the input list of physical locations.
Parameters:
list - a list of reads that are determined to be duplicates of one another
keeper - a single PhysicalLocation that is the one being kept as non-duplicate, and thus should never be annotated as an optical duplicate. May in some cases be null, or a PhysicalLocation not contained within the list!
Returns:
a boolean[] of the same length as the incoming list marking which reads are optical duplicates
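Because the returned flags line up index for index with the input list, a caller can pair them back up with the reads. The sketch below shows only that pairing step and does not call Picard. The `flagged` helper and the sample read names are hypothetical, and the flag array is hard-coded where a real caller would use the method's return value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OpticalFlagsSketch {
    // Returns the elements of reads whose flag is true at the same index.
    static <T> List<T> flagged(List<T> reads, boolean[] flags) {
        if (flags.length != reads.size()) {
            throw new IllegalArgumentException("flags and reads must have the same length");
        }
        List<T> result = new ArrayList<>();
        for (int i = 0; i < flags.length; i++) {
            if (flags[i]) {
                result.add(reads.get(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Stand-ins for the input list and for the array the method would return.
        List<String> reads = Arrays.asList("read1", "read2", "read3");
        boolean[] flags = {false, true, false};

        System.out.println(flagged(reads, flags)); // prints [read2]
    }
}
```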
-