Class VcfEntry

All Implemented Interfaces:
Serializable, Cloneable, Comparable<Interval>, Iterable<VcfGenotype>, TxtSerializable

public class VcfEntry extends Marker implements Iterable<VcfGenotype>
A VCF entry is a line in a VCF file. A VCF line can have multiple variants and multiple genotypes. The VcfEntry represents the VCF line, NOT the variant itself. For example, the `START` field in the VCF line may differ from the `start` position of the variant, because the first base of the `REF` field is used as an "anchor".
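The anchor-base convention described above can be illustrated with a small stand-alone sketch. It does not use the SnpEff API: the class `AnchorSketch` and its methods are hypothetical, and it assumes that the anchor consists of the leading bases shared by REF and ALT.

```java
final class AnchorSketch {
    /** Number of leading bases shared by REF and ALT (the "anchor" bases). */
    static int commonPrefixLength(String ref, String alt) {
        int n = Math.min(ref.length(), alt.length());
        int i = 0;
        while (i < n && ref.charAt(i) == alt.charAt(i)) i++;
        return i;
    }

    /** Position of the first base that actually changes, in the same coordinate system as vcfPos. */
    static int variantStart(int vcfPos, String ref, String alt) {
        return vcfPos + commonPrefixLength(ref, alt);
    }

    public static void main(String[] args) {
        // Deletion of "CG" written with anchor base "A": POS=100, REF=ACG, ALT=A
        System.out.println(variantStart(100, "ACG", "A")); // prints 101
        // SNP: no anchor base, so line position and variant position coincide
        System.out.println(variantStart(100, "A", "T"));   // prints 100
    }
}
```

In the deletion case the line starts at the anchor base, while the deleted bases start one position later, which is why the two coordinates can differ.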
Author:
pablocingolani
  • Field Details

    • FILTER_PASS

      public static final String FILTER_PASS
    • WITHIN_FIELD_SEP

      public static final char WITHIN_FIELD_SEP
    • SUB_FIELD_SEP

      public static final String SUB_FIELD_SEP
    • EMPTY_STRING_ARRAY

      public static final String[] EMPTY_STRING_ARRAY
    • ALLELE_FEQUENCY_COMMON

      public static final double ALLELE_FEQUENCY_COMMON
    • ALLELE_FEQUENCY_LOW

      public static final double ALLELE_FEQUENCY_LOW
    • MAX_PADN

      public static final int MAX_PADN
    • INFO_KEY_PATTERN

      public static final Pattern INFO_KEY_PATTERN
    • VCF_ALT_NON_REF

      public static final String VCF_ALT_NON_REF
    • VCF_ALT_NON_REF_OLD

      public static final String VCF_ALT_NON_REF_OLD
    • VCF_ALT_MISSING_REF

      public static final String VCF_ALT_MISSING_REF
    • VCF_ALT_INV

      public static final String VCF_ALT_INV
    • VCF_ALT_A_ARRAY

      public static final String[] VCF_ALT_A_ARRAY
    • VCF_ALT_C_ARRAY

      public static final String[] VCF_ALT_C_ARRAY
    • VCF_ALT_G_ARRAY

      public static final String[] VCF_ALT_G_ARRAY
    • VCF_ALT_T_ARRAY

      public static final String[] VCF_ALT_T_ARRAY
    • VCF_ALT_N_ARRAY

      public static final String[] VCF_ALT_N_ARRAY
    • VCF_ALT_B_ARRAY

      public static final String[] VCF_ALT_B_ARRAY
    • VCF_ALT_D_ARRAY

      public static final String[] VCF_ALT_D_ARRAY
    • VCF_ALT_H_ARRAY

      public static final String[] VCF_ALT_H_ARRAY
    • VCF_ALT_V_ARRAY

      public static final String[] VCF_ALT_V_ARRAY
    • VCF_ALT_M_ARRAY

      public static final String[] VCF_ALT_M_ARRAY
    • VCF_ALT_R_ARRAY

      public static final String[] VCF_ALT_R_ARRAY
    • VCF_ALT_W_ARRAY

      public static final String[] VCF_ALT_W_ARRAY
    • VCF_ALT_S_ARRAY

      public static final String[] VCF_ALT_S_ARRAY
    • VCF_ALT_Y_ARRAY

      public static final String[] VCF_ALT_Y_ARRAY
    • VCF_ALT_K_ARRAY

      public static final String[] VCF_ALT_K_ARRAY
    • VCF_ALT_ASTERISK_ARRAY

      public static final String[] VCF_ALT_ASTERISK_ARRAY
    • VCF_ALT_MISSING_ARRAY

      public static final String[] VCF_ALT_MISSING_ARRAY
    • VCF_ALT_NON_REF_OLD_ARRAY

      public static final String[] VCF_ALT_NON_REF_OLD_ARRAY
    • VCF_ALT_NON_REF_ARRAY

      public static final String[] VCF_ALT_NON_REF_ARRAY
    • VCF_ALT_INV_ARRAY

      public static final String[] VCF_ALT_INV_ARRAY
    • VCF_INFO_END

      public static final String VCF_INFO_END
    • VCF_INFO_SVLEN

      public static final String VCF_INFO_SVLEN
    • VCF_INFO_IMPRECISE

      public static final String VCF_INFO_IMPRECISE
    • VCF_INFO_HOMS

      public static final String VCF_INFO_HOMS
    • VCF_INFO_HETS

      public static final String VCF_INFO_HETS
    • VCF_INFO_NAS

      public static final String VCF_INFO_NAS
    • VCF_INFO_PRIVATE

      public static final String VCF_INFO_PRIVATE
    • alts

      protected String[] alts
    • altStr

      protected String altStr
    • chromosomeName

      protected String chromosomeName
    • filter

      protected String filter
    • format

      protected String format
    • formatFields

      protected String[] formatFields
    • genotypeFields

      protected String[] genotypeFields
    • genotypeFieldsStr

      protected String genotypeFieldsStr
    • genotypeScores

      protected byte[] genotypeScores
    • info

      protected HashMap<String,String> info
    • infoStr

      protected String infoStr
    • line

      protected String line
    • lineNum

      protected int lineNum
    • quality

      protected Double quality
    • ref

      protected String ref
    • variants

      protected LinkedList<Variant> variants
    • vcfEffects

      protected List<VcfEffect> vcfEffects
    • vcfFileIterator

      protected VcfFileIterator vcfFileIterator
    • vcfGenotypes

      protected ArrayList<VcfGenotype> vcfGenotypes
  • Constructor Details

  • Method Details

    • cleanUnderscores

      public static String cleanUnderscores(String s)
      Return a string without leading, trailing and duplicated underscores
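The behavior described for `cleanUnderscores` could look roughly like the following sketch. This is not the SnpEff source: the class `UnderscoreSketch` is hypothetical and the two-step regular-expression approach is an assumption about one reasonable way to get that result.

```java
final class UnderscoreSketch {
    /** Collapse runs of underscores, then drop a leading and a trailing underscore. */
    static String cleanUnderscores(String s) {
        return s.replaceAll("_+", "_")     // "a__b" becomes "a_b"
                .replaceAll("^_|_$", "");  // strip the single underscore left at either end
    }

    public static void main(String[] args) {
        System.out.println(cleanUnderscores("__a__b_")); // prints a_b
    }
}
```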
    • isEmpty

      public static boolean isEmpty(String value)
      Does 'value' represent an EMPTY / MISSING value in a VCF field? (or multiple MISSING comma-separated values)
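A minimal sketch of such a missing-value check is shown below. The class `MissingValueSketch` and method `isMissing` are hypothetical. The sketch relies on the VCF convention that `.` denotes a missing value; treating `null` and the empty string as missing is an assumption.

```java
final class MissingValueSketch {
    /** True if the field is null, empty, "." or a comma-separated list made only of ".". */
    static boolean isMissing(String value) {
        if (value == null || value.isEmpty()) return true; // assumption: these count as missing
        for (String part : value.split(",", -1)) {
            if (!part.equals(".")) return false; // at least one real value
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isMissing(".,."));  // prints true
        System.out.println(isMissing(".,1"));  // prints false
    }
}
```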
    • isValidInfoKey

      public static boolean isValidInfoKey(String key)
      Make sure the INFO key matches the regular expression (as specified in VCF spec 4.3)
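For reference, a key check along these lines can be sketched as follows. The regular expression is the one the VCF 4.3 specification gives for INFO keys; whether `INFO_KEY_PATTERN` is exactly this expression is an assumption, and the class `InfoKeySketch` is hypothetical.

```java
import java.util.regex.Pattern;

final class InfoKeySketch {
    // INFO key pattern as given in the VCF 4.3 specification
    static final Pattern KEY = Pattern.compile("^([A-Za-z_][0-9A-Za-z_.]*|1000G)$");

    static boolean isValidKey(String key) {
        return key != null && KEY.matcher(key).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidKey("DP"));    // prints true
        System.out.println(isValidKey("9abc"));  // prints false (starts with a digit)
    }
}
```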
    • isValidInfoValue

      public static boolean isValidInfoValue(String value)
      Check that this value can be added to an INFO field
      Returns:
      true if OK, false if invalid value
    • vcfInfoDecode

      public static String vcfInfoDecode(String str)
      Decode INFO value
    • vcfInfoEncode

      public static String vcfInfoEncode(String str)
      Encode a string to be used in an 'INFO' field value. From the VCF 4.3 specification: characters with special meaning (such as field delimiters ';' in INFO or ':' in FORMAT fields) must be represented using the capitalized percent encoding:
        %3A  : (colon)
        %3B  ; (semicolon)
        %3D  = (equal sign)
        %25  % (percent sign)
        %2C  , (comma)
        %0D  CR
        %0A  LF
        %09  TAB
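The percent-encoding rules listed above can be sketched as a pair of stand-alone helpers. This is an illustration of the rules only, not the SnpEff implementation; the class `InfoEncodingSketch` and its method names are hypothetical.

```java
final class InfoEncodingSketch {
    /** Replace each special character with its capitalized percent code. */
    static String encode(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case ':':  sb.append("%3A"); break;
                case ';':  sb.append("%3B"); break;
                case '=':  sb.append("%3D"); break;
                case '%':  sb.append("%25"); break;
                case ',':  sb.append("%2C"); break;
                case '\r': sb.append("%0D"); break;
                case '\n': sb.append("%0A"); break;
                case '\t': sb.append("%09"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Reverse of encode(). "%25" is decoded last so a literal "%3A" survives a round trip. */
    static String decode(String s) {
        return s.replace("%3A", ":").replace("%3B", ";").replace("%3D", "=")
                .replace("%2C", ",").replace("%0D", "\r").replace("%0A", "\n")
                .replace("%09", "\t").replace("%25", "%");
    }

    public static void main(String[] args) {
        System.out.println(encode("a=b;c")); // prints a%3Db%3Bc
    }
}
```

Decoding the percent sign last matters: a literal `%3A` in the input is encoded as `%253A`, and decoding `%25` first would turn it into a colon.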
    • vcfInfoKeySafe

      public static String vcfInfoKeySafe(String str)
      Return a string safe to be used in an 'INFO' field key
    • vcfInfoValueSafe

      public static String vcfInfoValueSafe(String str)
      Return a string safe to be used in an 'INFO' field value
    • addFilter

      public void addFilter(String filterStr)
      Add string to FILTER field
    • addFormat

      public void addFormat(String formatName)
      Add a 'FORMAT' field
    • addGenotype

      public void addGenotype(String vcfGenotypeStr)
      Add a genotype as a string
    • addInfo

      public void addInfo(String key, String value)
      Add a "key=value" tuple to the INFO field
      Parameters:
      key - INFO key name
      value - Can be null if it is a boolean (flag) field.
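To show how key/value pairs and value-less flags end up in one INFO string, here is a small sketch. It follows the VCF layout (`;` between entries, `=` between key and value, bare key for a flag, `.` for an empty field); the class `InfoFieldSketch` and method `toInfoString` are hypothetical and do not reflect how VcfEntry stores its data internally.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

final class InfoFieldSketch {
    /** Serialize INFO entries; a null value is written as a bare flag key. */
    static String toInfoString(Map<String, String> info) {
        if (info.isEmpty()) return "."; // VCF missing-value marker
        StringJoiner sj = new StringJoiner(";");
        for (Map.Entry<String, String> e : info.entrySet()) {
            sj.add(e.getValue() == null ? e.getKey() : e.getKey() + "=" + e.getValue());
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        Map<String, String> info = new LinkedHashMap<>(); // keeps insertion order
        info.put("DP", "10");
        info.put("IMPRECISE", null); // flag: no value
        System.out.println(toInfoString(info)); // prints DP=10;IMPRECISE
    }
}
```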
    • alleleFrequencyType

      public VcfEntry.AlleleFrequencyType alleleFrequencyType()
      Categorization by allele frequency
    • calcHetero

      public Boolean calcHetero()
      Is this entry heterozygous? Infer Hom/Het if there is only one sample in the file. Otherwise the result is null.
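As a rough illustration of inferring zygosity from a single sample, the sketch below classifies one GT string. It is not the SnpEff logic: the class `HeteroSketch` is hypothetical, and returning null for haploid or partially missing calls is an assumption.

```java
final class HeteroSketch {
    /** TRUE if the called alleles differ, FALSE if identical, null if it cannot be decided. */
    static Boolean isHetero(String gt) {
        String[] alleles = gt.split("[/|]"); // unphased "/" or phased "|"
        if (alleles.length < 2) return null; // assumption: haploid calls are undecided
        for (String a : alleles) {
            if (a.equals(".")) return null;  // missing allele
        }
        for (String a : alleles) {
            if (!a.equals(alleles[0])) return Boolean.TRUE;
        }
        return Boolean.FALSE;
    }

    public static void main(String[] args) {
        System.out.println(isHetero("0/1")); // prints true
        System.out.println(isHetero("1|1")); // prints false
        System.out.println(isHetero("./.")); // prints null
    }
}
```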
    • check

      public String check()
      Perform several simple checks and report problems (if any).
    • cloneShallow

      public Cds cloneShallow()
      Description copied from class: Marker
      Perform a shallow clone
      Overrides:
      cloneShallow in class Marker
    • compressGenotypes

      public boolean compressGenotypes()
      Compress genotypes into "HO/HE/NA" INFO fields
    • delFilter

      public boolean delFilter(String filterStr)
      Remove a string from FILTER field
    • getAltIndex

      public int getAltIndex(String alt)
      Get index of matching ALT entry
      Returns:
      -1 if not found
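The lookup can be pictured as a linear search over the comma-separated ALT field. The sketch below is a stand-alone illustration with a hypothetical class `AltIndexSketch`; exact, case-sensitive matching is an assumption.

```java
final class AltIndexSketch {
    /** Zero-based index of 'alt' within a comma-separated ALT field, or -1 if absent. */
    static int altIndex(String altField, String alt) {
        String[] alts = altField.split(",");
        for (int i = 0; i < alts.length; i++) {
            if (alts[i].equals(alt)) return i; // assumption: exact match
        }
        return -1;
    }

    public static void main(String[] args) {
        System.out.println(altIndex("T,G", "G")); // prints 1
        System.out.println(altIndex("T,G", "C")); // prints -1
    }
}
```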
    • getAlts

      public String[] getAlts()
    • getAltsStr

      public String getAltsStr()
      Create a comma separated ALTS string
    • getChromosomeNameOri

      public String getChromosomeNameOri()
      Original chromosome name (as it appeared in the VCF file)
      Overrides:
      getChromosomeNameOri in class Interval
    • getFilter

      public String getFilter()
    • getFormat

      public String getFormat()
    • getFormatFields

      public String[] getFormatFields()
    • getGenotypesScores

      public byte[] getGenotypesScores()
      Return genotypes parsed as an array of codes
    • getInfo

      public String getInfo(String key)
      Get info string
    • getInfo

      public String getInfo(String key, String allele)
      Get info string for a specific allele
    • getInfo

      public String getInfo(String key, Variant var)
      Get an INFO field matching a variant
    • getInfoFlag

      public boolean getInfoFlag(String key)
      Does the entry exist?
    • getInfoFloat

      public double getInfoFloat(String key)
      Get info field as a 'double' number. The VCF specification names this data type 'Float', which is why the method name may not be intuitive.
    • getInfoInt

      public long getInfoInt(String key)
      Get info field as a 'long' number. The VCF specification names this data type 'Integer', which is why the method name may not be intuitive.
    • getInfoKeys

      public Set<String> getInfoKeys()
      Get all keys available in the info field
    • getInfoStr

      public String getInfoStr()
      Get the full (unparsed) INFO field
    • getLine

      public String getLine()
      Original VCF line (from file)
    • getLineNum

      public int getLineNum()
    • getNumberOfSamples

      public int getNumberOfSamples()
      Number of samples in this VCF file
    • getQuality

      public double getQuality()
    • getRef

      public String getRef()
    • getStr

      public String getStr()
    • getVcfEffects

      public List<VcfEffect> getVcfEffects()
    • getVcfEffects

      public List<VcfEffect> getVcfEffects(EffFormatVersion formatVersion)
      Parse 'EFF' info field and get a list of effects
    • getVcfFileIterator

      public VcfFileIterator getVcfFileIterator()
    • getVcfGenotype

      public VcfGenotype getVcfGenotype(int index)
    • getVcfGenotypes

      public List<VcfGenotype> getVcfGenotypes()
    • getVcfInfo

      public VcfHeaderInfo getVcfInfo(String id)
      Get VcfInfo type for a given ID
    • getVcfInfoNumber

      public VcfInfoType getVcfInfoNumber(String id)
      Get Info number for a given ID
    • hasAltNonRef

      public boolean hasAltNonRef()
    • hasField

      public boolean hasField(String filedName)
    • hasGenotypes

      public boolean hasGenotypes()
    • hasInfo

      public boolean hasInfo(String infoFieldName)
    • hasQuality

      public boolean hasQuality()
    • isAltNonRef

      public boolean isAltNonRef(String alt)
      Is this ALT string a NON_REF? (gVCF)
    • isBiAllelic

      public boolean isBiAllelic()
      Is this bi-allelic (based ONLY on the number of ALTs)? WARNING: You should use the 'calcHetero()' method for a more precise calculation.
    • isCompressedGenotypes

      public boolean isCompressedGenotypes()
      Do we have compressed genotypes in "HO,HE,NA" INFO fields?
    • isFilterPass

      public boolean isFilterPass()
    • isMultiallelic

      public boolean isMultiallelic()
      Is this multi-allelic (based ONLY on the number of ALTs)? WARNING: You should use the 'calcHetero()' method for a more precise calculation.
    • isShowWarningIfParentDoesNotInclude

      protected boolean isShowWarningIfParentDoesNotInclude()
      Description copied from class: Marker
      Show an error if parent does not include child?
      Overrides:
      isShowWarningIfParentDoesNotInclude in class Marker
    • isSingleSnp

      public boolean isSingleSnp()
      Is this a VCF entry with a single SNP?
    • isSingleton

      public boolean isSingleton()
      Is this variant a singleton (appears in only one genotype)?
    • isVariant

      public boolean isVariant()
      Is this a change, or are the ALTs actually the same as the reference?
    • isVariant

      public boolean isVariant(String alt)
      Is this ALT string a variant?
    • iterator

      public Iterator<VcfGenotype> iterator()
      Specified by:
      iterator in interface Iterable<VcfGenotype>
    • mac

      public int mac()
      Calculate Minor allele count
    • maf

      public double maf()
      Calculate Minor allele frequency
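Minor allele count and frequency can be sketched from raw GT strings as follows. This is only an illustration for a bi-allelic site: the class `MinorAlleleSketch` is hypothetical, every non-zero allele is counted as ALT, and missing alleles (`.`) are skipped. How SnpEff itself handles missing or multi-allelic genotypes is not stated here.

```java
final class MinorAlleleSketch {
    /** Returns {minorAlleleCount, calledAlleles} for a bi-allelic site. */
    static int[] count(String[] gts) {
        int alt = 0, called = 0;
        for (String gt : gts) {
            for (String a : gt.split("[/|]")) {
                if (a.equals(".")) continue; // missing allele is not counted
                called++;
                if (!a.equals("0")) alt++;   // any non-reference allele
            }
        }
        // The minor allele is whichever of REF/ALT is less frequent
        return new int[] { Math.min(alt, called - alt), called };
    }

    /** Minor allele frequency: minor allele count divided by called alleles. */
    static double maf(String[] gts) {
        int[] c = count(gts);
        return c[1] == 0 ? 0.0 : (double) c[0] / c[1];
    }

    public static void main(String[] args) {
        String[] gts = { "0/0", "0/1", "0/0", "./." };
        System.out.println(count(gts)[0]); // prints 1
    }
}
```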
    • parse

      public void parse()
      Parse a 'line' from a 'vcfFileIterator'
    • parseLof

      public List<VcfLof> parseLof()
      Parse LOF from VcfEntry
    • parseNmd

      public List<VcfNmd> parseNmd()
      Parse NMD from VcfEntry
    • removeInfo

      public void removeInfo(String key)
      Remove INFO field
    • rmInfo

      public boolean rmInfo(String info)
      Parse INFO fields
    • setFilter

      public void setFilter(String filter)
    • setFormat

      public void setFormat(String format)
    • setGenotypeStr

      public void setGenotypeStr(String genotypeFieldsStr)
    • setLineNum

      public void setLineNum(int lineNum)
    • toStr

      public String toStr()
      To string as a simple "CHR:START_REF/ALTs" format
      Overrides:
      toStr in class Interval
    • toString

      public String toString()
      Overrides:
      toString in class Marker
    • toStringNoGt

      public String toStringNoGt()
      Show only first eight fields (no genotype entries)
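The eight fixed VCF columns are CHROM, POS, ID, REF, ALT, QUAL, FILTER and INFO; FORMAT and the per-sample genotype columns follow. Truncating a raw line to those columns can be sketched as below. The class `NoGenotypeSketch` is hypothetical, and it works on the raw tab-separated text rather than on a parsed VcfEntry.

```java
import java.util.Arrays;

final class NoGenotypeSketch {
    /** Keep only the first eight tab-separated columns of a VCF data line. */
    static String firstEightFields(String line) {
        String[] fields = line.split("\t", -1);
        return String.join("\t", Arrays.copyOf(fields, Math.min(8, fields.length)));
    }

    public static void main(String[] args) {
        String line = "1\t100\trs1\tA\tT\t50\tPASS\tDP=10\tGT\t0/1";
        System.out.println(firstEightFields(line)); // FORMAT and genotype columns are dropped
    }
}
```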
    • uncompressGenotypes

      public VcfEntry uncompressGenotypes()
      Uncompress VCF entry having genotypes in "HO,HE,NA" fields
    • variants

      public List<Variant> variants()
      Create a list of variants from this VcfEntry