Package datastructure

Class VariantInformation

java.lang.Object
datastructure.Attributable
datastructure.VariantInformation

public class VariantInformation extends Attributable
Stores information associated with a nucleotide variant.

The actual alternative content is not stored in this class!

This class represents a nucleotide variant, including its reference base content, type (e.g., SNV, insertion, deletion), and occurrences in samples and alleles. It provides methods to determine the type of the variant, check its canonical or padded canonical status, and manage occurrences in samples and features.

  • Field Details

    • reference

      public final String reference
      The reference base content of this variant.
    • occurrence

      protected final HashMap<String,HashSet<String>> occurrence
      A mapping of occurrences of this variant in samples and alleles.

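      The `occurrence` map is structured as follows:

        • The key that designates samples maps to the set of names of samples in which this variant occurs.
        • Each feature name maps to the set of unique identifiers of the alleles of that feature in which this variant occurs.

      This structure allows efficient tracking of where the variant occurs in terms of samples and features.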
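
      A minimal sketch of a map with this shape, to make the layout concrete. The class name OccurrenceSketch and the literal key "samples" are assumptions made for illustration; the key the class actually uses for samples is not shown in this documentation, and the method bodies are not taken from its source.

      ```java
      import java.util.HashMap;
      import java.util.HashSet;

      final class OccurrenceSketch {
          // Assumption: literal key under which sample names are collected.
          static final String SAMPLES_KEY = "samples";

          // Same declared type as the documented `occurrence` field.
          final HashMap<String, HashSet<String>> occurrence = new HashMap<>();

          // Records that the variant occurs in the named sample.
          void addSampleOccurrence(String name) {
              occurrence.computeIfAbsent(SAMPLES_KEY, k -> new HashSet<>()).add(name);
          }

          // Assumption: registering a feature creates an (initially empty) allele set.
          void addFeatureOccurrence(String name) {
              occurrence.computeIfAbsent(name, k -> new HashSet<>());
          }

          // Records that the variant occurs in the given allele of the named feature.
          void addAlleleOccurrence(String featureName, String alleleUid) {
              occurrence.computeIfAbsent(featureName, k -> new HashSet<>()).add(alleleUid);
          }
      }
      ```

      Because each value is a HashSet, adding the same sample or allele twice leaves a single entry, and membership checks take constant time on average.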
    • type

      public final VariantInformation.Type type
      The type of this variant (e.g., SNV, insertion, deletion).
  • Constructor Details

    • VariantInformation

      protected VariantInformation(String referenceContent, String alternativeContent)
      Constructor for VariantInformation.

      Initializes the variant with its reference and alternative content, determining its type (SNV, insertion, or deletion) based on the provided content.

      This constructor checks if the reference and alternative content match any padded canonical content type. If they do not, an IllegalArgumentException is thrown.

      Parameters:
      referenceContent - The reference base content of the variant.
      alternativeContent - The alternative base content of the variant.
      Throws:
      IllegalArgumentException - If the reference and alternative content do not match any padded canonical content type.
  • Method Details

    • isSubstitution

      public static boolean isSubstitution(String ref, String alt)
      Determines whether a variant is a substitution, i.e., whether the reference and the alternative base content each consist of a single base from Constants.baseSymbols.
      Parameters:
      ref - The reference base content.
      alt - The alternative base content.
      Returns:
      true if the variant is a substitution, false otherwise.
    • isSubstitution

      public static boolean isSubstitution(String alt)
      Determines whether a given alternative base content represents a substitution.

      A substitution is defined as a single base from the set of valid nucleotide symbols defined in Constants.baseSymbols.

      Parameters:
      alt - The alternative base content to check.
      Returns:
      true if the alternative content represents a substitution, false otherwise.
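
      The single-base rule can be sketched with a regular expression. This is an illustration only: the class name SubstitutionSketch is hypothetical, and BASE_SYMBOLS = "ACGT" is an assumed stand-in for Constants.baseSymbols, whose actual value is not given here.

      ```java
      import java.util.regex.Pattern;

      final class SubstitutionSketch {
          // Assumption: stand-in for Constants.baseSymbols.
          static final String BASE_SYMBOLS = "ACGT";

          // Matches exactly one character from the assumed base alphabet.
          private static final Pattern SINGLE_BASE = Pattern.compile("[" + BASE_SYMBOLS + "]");

          // True if the alternative content is a single valid base.
          static boolean isSubstitution(String alt) {
              return alt != null && SINGLE_BASE.matcher(alt).matches();
          }

          // True if both reference and alternative content are single valid bases.
          static boolean isSubstitution(String ref, String alt) {
              return isSubstitution(ref) && isSubstitution(alt);
          }
      }
      ```

      Under these assumptions, ("A", "G") is accepted, while ("A", "AT") and ("A", "-") are rejected.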
    • isInsertion

      public static boolean isInsertion(String ref, String alt, boolean padded)
      Determines whether a variant is an insertion.
      Parameters:
      ref - The reference base content.
      alt - The alternative base content.
      padded - Whether the variant is padded by gap symbols.
      Returns:
      true if the variant is an insertion, false otherwise.
    • isInsertion

      public static boolean isInsertion(String alt)
      Determines whether a variant is an insertion based on its alternative content.

      This method checks if the alternative base content represents an insertion. An insertion is defined as a string of at least two consecutive bases from the set of valid nucleotide symbols defined in Constants.baseSymbols.

      Parameters:
      alt - The alternative base content to check.
      Returns:
      true if the alternative content represents an insertion, false otherwise.
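
      A minimal sketch of the "at least two consecutive bases" rule described above. The class name InsertionSketch is hypothetical and BASE_SYMBOLS = "ACGT" is an assumed stand-in for Constants.baseSymbols; the three-argument, padding-aware overload is not covered because its rule is not stated here.

      ```java
      import java.util.regex.Pattern;

      final class InsertionSketch {
          // Assumption: stand-in for Constants.baseSymbols.
          static final String BASE_SYMBOLS = "ACGT";

          // Matches two or more characters, all from the assumed base alphabet.
          private static final Pattern TWO_OR_MORE_BASES =
                  Pattern.compile("[" + BASE_SYMBOLS + "]{2,}");

          // True if the alternative content consists of at least two valid bases.
          static boolean isInsertion(String alt) {
              return alt != null && TWO_OR_MORE_BASES.matcher(alt).matches();
          }
      }
      ```

      With this pattern, "AT" and "ACGT" count as insertions, whereas a single base such as "A" or a string containing a gap symbol does not.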
    • isDeletion

      public static boolean isDeletion(String ref, String alt, boolean padded)
      Determines whether a variant is a deletion.
      Parameters:
      ref - The reference base content.
      alt - The alternative base content.
      padded - Whether the variant is padded by gap symbols.
      Returns:
      true if the variant is a deletion, false otherwise.
    • isDeletion

      public static boolean isDeletion(String alt)
      Determines whether a variant is a deletion based on its alternative content.

      This method checks if the alternative base content represents a deletion. A deletion is defined as a string that starts with a valid nucleotide base (from Constants.baseSymbols) followed by one or more gap symbols (defined in Constants.gapString).

      Parameters:
      alt - The alternative base content to check.
      Returns:
      true if the alternative content represents a deletion, false otherwise.
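
      The "one base followed by one or more gap symbols" rule can be sketched the same way. The class name DeletionSketch is hypothetical; BASE_SYMBOLS = "ACGT" and GAP = "-" are assumed stand-ins for Constants.baseSymbols and Constants.gapString, whose actual values are not given here.

      ```java
      import java.util.regex.Pattern;

      final class DeletionSketch {
          // Assumption: stand-in for Constants.baseSymbols.
          static final String BASE_SYMBOLS = "ACGT";
          // Assumption: stand-in for Constants.gapString.
          static final String GAP = "-";

          // One valid base, then one or more repetitions of the gap string.
          private static final Pattern BASE_THEN_GAPS =
                  Pattern.compile("[" + BASE_SYMBOLS + "](?:" + Pattern.quote(GAP) + ")+");

          // True if the alternative content is a base followed only by gap symbols.
          static boolean isDeletion(String alt) {
              return alt != null && BASE_THEN_GAPS.matcher(alt).matches();
          }
      }
      ```

      Pattern.quote is used so that the gap string is treated literally even if it contains characters that are special in regular expressions.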
    • isCanonicalVariant

      public static boolean isCanonicalVariant(String referenceContent, String alternativeContent)
      Determines whether a variant is canonical.

      A variant is canonical if it is a substitution, an insertion, or a deletion.

      Parameters:
      referenceContent - The reference base content.
      alternativeContent - The alternative base content.
      Returns:
      true if the variant is canonical, false otherwise.
    • isPaddedCanonicalVariant

      public static boolean isPaddedCanonicalVariant(String referenceContent, String alternativeContent)
      Determines whether a variant is padded canonical.

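      A variant is padded canonical if its reference and alternative content match one of the padded canonical content types: SNV, insertion, or deletion.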

      Parameters:
      referenceContent - The reference base content.
      alternativeContent - The alternative base content.
      Returns:
      true if the variant is padded canonical, false otherwise.
    • addSampleOccurrence

      protected void addSampleOccurrence(String name)
      Adds a sample occurrence to this variant.
      Parameters:
      name - The name of the sample to add.
    • addFeatureOccurrence

      protected void addFeatureOccurrence(String name)
      Adds a feature occurrence to this variant.
      Parameters:
      name - The name of the feature to add.
    • addAlleleOccurrence

      protected void addAlleleOccurrence(String featureName, String alleleUid)
      Adds an allele occurrence to this variant for a specific feature.
      Parameters:
      featureName - The name of the feature.
      alleleUid - The unique identifier of the allele to add.
    • hasOccurrence

      public boolean hasOccurrence(String of, String name)
      Checks whether this variant has an occurrence in a sample or allele.
      Parameters:
      of - Either samples or the name of a Feature.
      name - The name of the sample or allele to check for.
      Returns:
      true if the sample or allele is associated with this variant, false otherwise.
    • hasOccurrence

      public boolean hasOccurrence(String name)
      Checks whether this variant has an occurrence in a specific feature.
      Parameters:
      name - The name of the feature to check for.
      Returns:
      true if the feature is associated with this variant, false otherwise.
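
      A rough sketch of how the two lookups could work against a map of the documented type. The class name OccurrenceQuerySketch is hypothetical, the map is passed in explicitly only to keep the example self-contained, and treating "feature is associated" as "the feature name is a key of the map" is an assumption.

      ```java
      import java.util.Map;
      import java.util.Set;

      final class OccurrenceQuerySketch {
          // Two-argument form: look up the set stored under `of`
          // (the samples key or a feature name) and test membership of `name`.
          static boolean hasOccurrence(Map<String, ? extends Set<String>> occurrence,
                                       String of, String name) {
              Set<String> entries = occurrence.get(of);
              return entries != null && entries.contains(name);
          }

          // One-argument form (assumption): a feature is associated with the
          // variant if its name is present as a key.
          static boolean hasOccurrence(Map<String, ? extends Set<String>> occurrence,
                                       String name) {
              return occurrence.containsKey(name);
          }
      }
      ```

      The null check makes the two-argument lookup return false, rather than throw, when nothing has been recorded under the given key.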
    • getSampleOccurrence

      public Collection<String> getSampleOccurrence()
      Retrieves the occurrences of this variant in samples.
      Returns:
      A Collection of sample names.
    • getFeatureOccurrence

      public Collection<String> getFeatureOccurrence()
      Retrieves the features associated with this variant.
      Returns:
      A Collection of feature names.
    • getReferenceBaseString

      public String getReferenceBaseString(boolean strip)
      Retrieves the reference base content of this variant.
      Parameters:
      strip - Whether to strip gap symbols from the reference base content.
      Returns:
      The reference base content of this variant.
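
      The effect of the strip flag can be illustrated as follows. This is a sketch under assumptions: the class name ReferenceStringSketch is hypothetical, GAP = "-" stands in for Constants.gapString, and the reference content is passed as a parameter instead of being read from the `reference` field.

      ```java
      final class ReferenceStringSketch {
          // Assumption: stand-in for Constants.gapString.
          static final String GAP = "-";

          // Returns the reference content, with all gap symbols removed if strip is true.
          static String getReferenceBaseString(String reference, boolean strip) {
              return strip ? reference.replace(GAP, "") : reference;
          }
      }
      ```

      String.replace with two string arguments replaces every literal occurrence, so no regular expression escaping is needed for the gap symbol.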