Interface Allele

All Superinterfaces:
Comparable<Allele>, Serializable
All Known Implementing Classes:
SimpleAllele

public interface Allele extends Comparable<Allele>, Serializable
Immutable representation of an allele.

Types of alleles:

 Ref: a t C g a // C is the reference base
 : a t G g a // C base is a G in some individuals
 : a t - g a // C base is deleted w.r.t. the reference
 : a t CAg a // A base is inserted w.r.t. the reference sequence
 

In these cases, what are the alleles?

  • SNP polymorphism of C/G -> { C , G } -> C is the reference allele
  • 1 base deletion of C -> { tC , t } -> C is the reference base, so tC is the reference allele once we include the preceding reference base (null alleles are not allowed)
  • 1 base insertion of A -> { C , CA } -> C is the reference allele (because null alleles are not allowed)

Suppose I see the following in the population:

 Ref: a t C g a // C is the reference base
 : a t G g a // C base is a G in some individuals
 : a t - g a // C base is deleted w.r.t. the reference
 

How do I represent this? There are three segregating alleles:

{ C , G , - }

and these are represented as:

{ tC, tG, t }

Now suppose I have this more complex example:

 Ref: a t C g a // C is the reference base
 : a t - g a
 : a t - - a
 : a t CAg a
 

There are actually four segregating alleles:

{ Cg, -g, --, CAg } over bases 2-4

represented as:

{ tCg, tg, t, tCAg }
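The padding step in the examples above can be sketched as follows. This is only an illustration of the convention, not htsjdk code: the class `PaddedAlleles` and the method `pad` are hypothetical names, and writing gaps as '-' is an assumption borrowed from the alignment diagrams on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of htsjdk: turns gapped alleles such as "-g"
// into padded alleles such as "tg" by dropping gaps and prepending the
// reference base that precedes the variable region.
final class PaddedAlleles {
    static List<String> pad(char precedingRefBase, List<String> gappedAlleles) {
        List<String> padded = new ArrayList<>();
        for (String allele : gappedAlleles) {
            // Remove every gap character, then add the padding base in front.
            padded.add(precedingRefBase + allele.replace("-", ""));
        }
        return padded;
    }

    public static void main(String[] args) {
        // The four segregating alleles from the complex example.
        System.out.println(pad('t', Arrays.asList("Cg", "-g", "--", "CAg")));
        // prints [tCg, tg, t, tCAg]
    }
}
```

Because every allele carries the same leading base, none of them is ever empty, which is the point of the "null alleles are not allowed" rule.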

Critically, it should be possible to apply an allele to a reference sequence to create the correct haplotype sequence:

Allele + reference => haplotype
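As a rough sketch of what "Allele + reference => haplotype" means, the snippet below substitutes an alternate allele for the reference allele at a known offset. It assumes plain strings and a 0-based start offset; `HaplotypeSketch` and `apply` are hypothetical names and do not correspond to any htsjdk method.

```java
// Hypothetical illustration, not htsjdk code: builds a haplotype by replacing
// the reference allele's span in the reference sequence with another allele.
final class HaplotypeSketch {
    static String apply(String reference, int start, String refAllele, String altAllele) {
        // The reference allele must actually occur at the given offset.
        if (!reference.regionMatches(true, start, refAllele, 0, refAllele.length())) {
            throw new IllegalArgumentException(
                    "reference allele does not match the reference at offset " + start);
        }
        return reference.substring(0, start)
                + altAllele
                + reference.substring(start + refAllele.length());
    }

    public static void main(String[] args) {
        String reference = "ATCGA";
        System.out.println(apply(reference, 2, "C", "G"));  // SNP: ATGGA
        System.out.println(apply(reference, 1, "TC", "T")); // deletion: ATGA
        System.out.println(apply(reference, 2, "C", "CA")); // insertion: ATCAGA
    }
}
```

Note that the offset is passed in separately, which mirrors the design described here: the location lives outside the allele.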

For convenience, we are going to create Alleles where the GenomeLoc of the allele is stored outside of the Allele object itself. So there's an idea of an A/C polymorphism independent of its surrounding context. Given a list of alleles, it's possible to determine the "type" of the variation:

 A / C @ loc => SNP
 - / A => INDEL
 

If you know which allele is the reference, you can determine whether the variant is an insertion or deletion.
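One way to picture this, assuming padded alleles given as plain strings: compare the lengths of the reference and alternate alleles. The enum `Kind` and the method `classify` below are hypothetical and deliberately simplified; they are not how htsjdk determines variant types.

```java
// Hypothetical sketch, not htsjdk code: infers the kind of variation from the
// reference allele and one alternate allele, both given as base strings.
final class VariationType {
    enum Kind { SNP, INSERTION, DELETION, OTHER }

    static Kind classify(String refBases, String altBases) {
        if (refBases.length() == altBases.length()) {
            // Same length: a single differing base is a SNP; anything else
            // (identical alleles, multi-base substitutions) is left as OTHER.
            boolean singleBaseChange =
                    refBases.length() == 1 && !refBases.equalsIgnoreCase(altBases);
            return singleBaseChange ? Kind.SNP : Kind.OTHER;
        }
        // Different lengths: knowing which allele is the reference tells us
        // whether bases were gained or lost.
        return altBases.length() > refBases.length() ? Kind.INSERTION : Kind.DELETION;
    }

    public static void main(String[] args) {
        System.out.println(classify("A", "C"));  // SNP
        System.out.println(classify("TC", "T")); // DELETION
        System.out.println(classify("C", "CA")); // INSERTION
    }
}
```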

Allele also supports the concept of a NO_CALL allele. This Allele represents a haplotype that couldn't be determined. This is usually represented by a '.' allele.

Note that Alleles store all bases as bytes, in **UPPER CASE**. So 'atc' == 'ATC' from the perspective of an Allele.
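The effect of upper-case storage can be illustrated with a small stand-alone sketch. `UpperCaseBases`, `normalize`, and `sameBases` are hypothetical names used only for this example; the actual normalization inside htsjdk may be implemented differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical sketch, not htsjdk code: stores bases as upper-case bytes so
// that comparisons do not depend on the case of the input.
final class UpperCaseBases {
    static byte[] normalize(String bases) {
        return bases.toUpperCase(Locale.ROOT).getBytes(StandardCharsets.US_ASCII);
    }

    static boolean sameBases(String a, String b) {
        return Arrays.equals(normalize(a), normalize(b));
    }

    public static void main(String[] args) {
        System.out.println(sameBases("atc", "ATC")); // true
        System.out.println(sameBases("atc", "ATG")); // false
    }
}
```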

  • Field Details

    • NO_CALL_STRING

      static final String NO_CALL_STRING
      A generic static NO_CALL allele for use
    • SPAN_DEL_STRING

      static final String SPAN_DEL_STRING
      A generic static SPAN_DEL allele for use
    • SINGLE_BREAKEND_INDICATOR

      static final char SINGLE_BREAKEND_INDICATOR
      Non ref allele representations
    • BREAKEND_EXTENDING_RIGHT

      static final char BREAKEND_EXTENDING_RIGHT
    • BREAKEND_EXTENDING_LEFT

      static final char BREAKEND_EXTENDING_LEFT
    • SYMBOLIC_ALLELE_START

      static final char SYMBOLIC_ALLELE_START
    • SYMBOLIC_ALLELE_END

      static final char SYMBOLIC_ALLELE_END
    • NON_REF_STRING

      static final String NON_REF_STRING
    • UNSPECIFIED_ALTERNATE_ALLELE_STRING

      static final String UNSPECIFIED_ALTERNATE_ALLELE_STRING
    • REF_A

      static final Allele REF_A
    • ALT_A

      static final Allele ALT_A
    • REF_C

      static final Allele REF_C
    • ALT_C

      static final Allele ALT_C
    • REF_G

      static final Allele REF_G
    • ALT_G

      static final Allele ALT_G
    • REF_T

      static final Allele REF_T
    • ALT_T

      static final Allele ALT_T
    • REF_N

      static final Allele REF_N
    • ALT_N

      static final Allele ALT_N
    • SPAN_DEL

      static final Allele SPAN_DEL
    • NO_CALL

      static final Allele NO_CALL
    • NON_REF_ALLELE

      static final Allele NON_REF_ALLELE
    • UNSPECIFIED_ALTERNATE_ALLELE

      static final Allele UNSPECIFIED_ALTERNATE_ALLELE
    • SV_SIMPLE_DEL

      static final Allele SV_SIMPLE_DEL
    • SV_SIMPLE_INS

      static final Allele SV_SIMPLE_INS
    • SV_SIMPLE_INV

      static final Allele SV_SIMPLE_INV
    • SV_SIMPLE_CNV

      static final Allele SV_SIMPLE_CNV
    • SV_SIMPLE_DUP

      static final Allele SV_SIMPLE_DUP
  • Method Details

    • create

      static Allele create(byte[] bases, boolean isRef)
      Creates a new Allele that includes bases and is tagged as the reference allele if isRef == true. If bases == '-', a Null allele is created. If bases == '.', a no-call Allele is created. If bases == '*', a spanning deletion Allele is created.
      Parameters:
      bases - the DNA sequence of this variation, '-', '.', or '*'
      isRef - should we make this a reference allele?
      Throws:
      IllegalArgumentException - if bases contains illegal characters or is otherwise malformed
    • create

      static Allele create(byte base, boolean isRef)
    • create

      static Allele create(byte base)
    • extend

      static Allele extend(Allele left, byte[] right)
    • wouldBeNullAllele

      @Deprecated static boolean wouldBeNullAllele(byte[] bases)
      Deprecated.
      Parameters:
      bases - bases representing an allele
      Returns:
      true if the bases represent the null allele
    • wouldBeStarAllele

      @Deprecated static boolean wouldBeStarAllele(byte[] bases)
      Deprecated.
      Parameters:
      bases - bases representing an allele
      Returns:
      true if the bases represent the SPAN_DEL allele
    • wouldBeNoCallAllele

      @Deprecated static boolean wouldBeNoCallAllele(byte[] bases)
      Deprecated.
      Parameters:
      bases - bases representing an allele
      Returns:
      true if the bases represent the NO_CALL allele
    • wouldBeSymbolicAllele

      @Deprecated static boolean wouldBeSymbolicAllele(byte[] bases)
      Deprecated.
      Parameters:
      bases - bases representing an allele
      Returns:
      true if the bases represent a symbolic allele, including breakpoints and breakends
    • wouldBeBreakpoint

      @Deprecated static boolean wouldBeBreakpoint(byte[] bases)
      Deprecated.
      Parameters:
      bases - bases representing an allele
      Returns:
      true if the bases represent a symbolic allele in breakpoint notation, (ex: G]17:198982] or ]13:123456]T )
    • wouldBeSingleBreakend

      @Deprecated static boolean wouldBeSingleBreakend(byte[] bases)
      Deprecated.
      Parameters:
      bases - bases representing an allele
      Returns:
      true if the bases represent a symbolic allele in single breakend notation (ex: .A or A. )
    • acceptableAlleleBases

      static boolean acceptableAlleleBases(String bases)
      Parameters:
      bases - bases representing a reference allele
      Returns:
      true if the bases represent a well-formatted allele
    • acceptableAlleleBases

      static boolean acceptableAlleleBases(String bases, boolean isReferenceAllele)
      Parameters:
      bases - bases representing an allele
      isReferenceAllele - is a reference allele
      Returns:
      true if the bases represent a well-formatted allele
    • acceptableAlleleBases

      static boolean acceptableAlleleBases(byte[] bases)
      Parameters:
      bases - bases representing a reference allele
      Returns:
      true if the bases represent a well-formatted allele
    • acceptableAlleleBases

      static boolean acceptableAlleleBases(byte[] bases, boolean isReferenceAllele)
      Parameters:
      bases - bases representing an allele
      isReferenceAllele - true if a reference allele
      Returns:
      true if the bases represent a well-formatted allele
    • create

      static Allele create(String bases, boolean isRef)
      Returns an allele with the given bases and reference status.
      Parameters:
      bases - bases representing an allele
      isRef - is this the reference allele?
    • create

      static Allele create(String bases)
      Creates a non-Ref allele. See create(byte[], boolean) for full information.
      Parameters:
      bases - bases representing an allele
    • create

      static Allele create(byte[] bases)
      Creates a non-Ref allele. See create(byte[], boolean) for full information.
      Parameters:
      bases - bases representing an allele
    • create

      static Allele create(Allele allele, boolean ignoreRefState)
      Creates a new allele based on the provided one. Ref state will be copied unless ignoreRefState is true (in which case the returned allele will be non-Ref). This method is efficient because it can skip the validation of the bases (since the original allele was already validated).
      Parameters:
      allele - the allele from which to copy the bases
      ignoreRefState - should we ignore the reference state of the input allele and use the default ref state?
    • oneIsPrefixOfOther

      static boolean oneIsPrefixOfOther(Allele a1, Allele a2)
    • isPrefixOf

      boolean isPrefixOf(Allele other)
    • isNoCall

      boolean isNoCall()
      Returns:
      true if this is the NO_CALL allele
    • isCalled

      boolean isCalled()
    • isReference

      boolean isReference()
      Returns:
      true if this Allele is the reference allele
    • isNonReference

      boolean isNonReference()
      Returns:
      true if this Allele is not the reference allele
    • isSymbolic

      boolean isSymbolic()
      Returns:
      true if this Allele is symbolic (i.e. no well-defined base sequence); this includes breakpoints and breakends
    • isBreakpoint

      boolean isBreakpoint()
      Returns:
      true if this Allele is a breakpoint ( ex: G]17:198982] or ]13:123456]T )
    • isSingleBreakend

      boolean isSingleBreakend()
      Returns:
      true if this Allele is a single breakend (ex: .A or A.)
    • toString

      String toString()
      Overrides:
      toString in class Object
    • getBases

      byte[] getBases()
    • getBaseString

      String getBaseString()
    • getDisplayString

      String getDisplayString()
    • getDisplayBases

      byte[] getDisplayBases()
    • equals

      boolean equals(Object other)
      Overrides:
      equals in class Object
    • hashCode

      int hashCode()
      Overrides:
      hashCode in class Object
    • equals

      boolean equals(Allele other, boolean ignoreRefState)
    • basesMatch

      boolean basesMatch(byte[] test)
    • basesMatch

      boolean basesMatch(String test)
    • basesMatch

      boolean basesMatch(Allele test)
    • length

      int length()
    • isNonRefAllele

      boolean isNonRefAllele()
      Returns:
      true if Allele is either <NON_REF> or <*>
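
The notations mentioned under isNoCall, isSymbolic, isBreakpoint, isSingleBreakend and isNonRefAllele can be told apart by their shape alone. The sketch below is a simplified stand-alone illustration built from the examples on this page ('.', '*', G]17:198982], .A, <NON_REF>); `AlleleNotation` and `classify` are hypothetical names, the bracket and angle-bracket characters are assumed from those examples rather than read from the constants above, and the real checks in htsjdk may differ in edge cases.

```java
// Hypothetical sketch, not htsjdk code: classifies an allele string by its
// notation, following the examples given in the method descriptions.
final class AlleleNotation {
    enum Kind { NO_CALL, SPAN_DEL, BREAKPOINT, SINGLE_BREAKEND, SYMBOLIC, BASES }

    static Kind classify(String allele) {
        if (allele.equals(".")) return Kind.NO_CALL;   // a lone '.' is a no-call
        if (allele.equals("*")) return Kind.SPAN_DEL;  // a lone '*' is a spanning deletion
        if (allele.length() > 1) {
            // Breakpoint notation embeds a mate position in square brackets.
            if (allele.indexOf('[') >= 0 || allele.indexOf(']') >= 0) return Kind.BREAKPOINT;
            // Single breakend notation has a leading or trailing '.'.
            if (allele.startsWith(".") || allele.endsWith(".")) return Kind.SINGLE_BREAKEND;
            // Other symbolic alleles are wrapped in angle brackets.
            if (allele.startsWith("<") && allele.endsWith(">")) return Kind.SYMBOLIC;
        }
        return Kind.BASES;
    }

    public static void main(String[] args) {
        System.out.println(classify("G]17:198982]")); // BREAKPOINT
        System.out.println(classify(".A"));           // SINGLE_BREAKEND
        System.out.println(classify("<NON_REF>"));    // SYMBOLIC
        System.out.println(classify("ACGT"));         // BASES
    }
}
```

In this sketch breakpoints and single breakends get their own categories for readability; in the interface described above they also count as symbolic.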