Package model

Class Contig


public class Contig extends Attributes
Represents a reference genomic location.

Models a segment of a genomic sequence, e.g., a complete genome, plasmid, single contig, or scaffold. It extends the Attributes class to inherit attribute management for the contig. In addition, an inner map stores the Variants associated with the contig.

Contigs are stored in the Storage.contigs property in the model.

  • Field Details

    • _id

      public final String _id
      Unique identifier of this contig.

      This field serves as the unique identifier for the contig and is used to reference it in the model. Ideally, it is a database identifier, such as an NCBI accession number.

    • sequenceCache

      protected transient Map<htsjdk.samtools.util.Tuple<Integer,Integer>,String> sequenceCache
      Cache to store the (sub-)sequence of this contig given a start and end position.

      This field is a transient HashMap used to cache subsequences of the contig's sequence. The keys in the map are Tuple objects representing the start and end positions of the subsequence, and the values are the corresponding subsequences as String.

      The cache is not intended to be serialized. It is populated at runtime to avoid redundant sequence decompression or retrieval.

  • Method Details

    • hasSequence

      public boolean hasSequence()
      Checks if this contig has an associated nucleotide sequence.

      This method checks whether the sequence field is non-empty. A non-empty field means the contig has an associated nucleotide sequence.

      Returns:
      true if the contig has a sequence; false otherwise.
    • getSequence

      public String getSequence() throws IOException
      Retrieves the full nucleotide sequence of this contig, or an empty string if no sequence is stored.

      This method decompresses the GZIP-compressed sequence stored in the sequence field and returns it as a string. If no sequence is stored, it returns an empty string.

      Returns:
      The decompressed nucleotide sequence of this contig, or an empty string if no sequence is stored.
      Throws:
      IOException - If an error occurs during the decompression of the sequence.
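
      The decompression step described above can be illustrated with the standard java.util.zip classes. This is a minimal sketch, not the Contig implementation: the class GzipSequenceSketch and its methods are hypothetical, and it assumes the compressed sequence is available as a byte array, which this page does not state.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper for illustration; not part of the Contig API.
class GzipSequenceSketch {

    // Compresses a nucleotide sequence into GZIP bytes.
    static byte[] compress(String sequence) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(sequence.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    // Decompresses GZIP bytes; returns "" when nothing is stored (assumed to be null or empty input).
    static String decompress(byte[] compressed) throws IOException {
        if (compressed == null || compressed.length == 0) {
            return "";
        }
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] stored = compress("ACGTACGT");
        System.out.println(decompress(stored));
    }
}
```

      A corrupted or truncated byte array makes GZIPInputStream throw an IOException, which is the kind of failure the Throws clause above refers to.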
    • getSequence

      public String getSequence(int start, int end) throws IOException
      Retrieves a subsequence of this contig, caching the result to optimize performance.

      This method extracts a subsequence from the nucleotide sequence of the contig based on the specified start and end positions. The subsequence is cached to avoid redundant decompression and substring operations for the same range. If the subsequence is already cached, it is retrieved directly from the cache. Otherwise, it is computed, stored in the cache, and returned.

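      The start and end positions are 1-based indices, meaning the first nucleotide in the sequence is at position 1. The start position is inclusive and the end position is exclusive, so start = 1 and end = 4 select the first three nucleotides. If no sequence is stored for the contig, the method returns an empty string.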

      Parameters:
      start - The 1-based start position of the subsequence (inclusive).
      end - The 1-based end position of the subsequence (exclusive).
      Returns:
      The subsequence of this contig, or an empty string if no sequence is stored.
      Throws:
      IOException - If an error occurs during the decompression of the sequence.
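
      The coordinate convention and the caching behaviour can be sketched as follows. The class SubsequenceCacheSketch is hypothetical and works on an uncompressed String. SimpleImmutableEntry stands in for the htsjdk Tuple key, and the mapping of 1-based positions to substring indices is an assumption derived from the parameter descriptions above.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the caching behaviour; not the Contig implementation.
class SubsequenceCacheSketch {
    private final String sequence;
    // Key: (start, end). SimpleImmutableEntry stands in for the htsjdk Tuple.
    private final Map<SimpleImmutableEntry<Integer, Integer>, String> cache = new HashMap<>();

    SubsequenceCacheSketch(String sequence) {
        this.sequence = sequence;
    }

    // start is 1-based and inclusive, end is 1-based and exclusive.
    String getSequence(int start, int end) {
        if (sequence.isEmpty()) {
            return "";
        }
        // Compute the substring only on a cache miss; reuse it afterwards.
        return cache.computeIfAbsent(
                new SimpleImmutableEntry<>(start, end),
                key -> sequence.substring(start - 1, end - 1));
    }

    // Number of distinct ranges currently cached.
    int cachedRanges() {
        return cache.size();
    }

    public static void main(String[] args) {
        SubsequenceCacheSketch contig = new SubsequenceCacheSketch("ACGTACGT");
        System.out.println(contig.getSequence(2, 5)); // positions 2, 3 and 4
        System.out.println(contig.getSequence(2, 5)); // served from the cache
        System.out.println(contig.cachedRanges());
    }
}
```

      The sketch does not validate its arguments. A range outside the sequence raises StringIndexOutOfBoundsException here, and this page does not say how Contig handles that case.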
    • getSequenceLength

      public int getSequenceLength()
      Retrieves the length of the contig's sequence.

      This method returns the length of the nucleotide sequence associated with this contig. The length is determined during the initialization of the contig and reflects the number of bases in the sequence. If the contig does not have an associated sequence, the length is 0.

      Returns:
      The length of the contig's sequence as an integer.
    • getVariant

      public Variant getVariant(int position, String alternative)
      Retrieves a variant associated with the specified position and alternative base sequence.

      This method checks if the variants map contains an entry for the given position. If no entry exists, it returns null. Otherwise, it retrieves the Variant object associated with the specified alternative base sequence at the given position.

      Parameters:
      position - The 1-based position of the variant to retrieve.
      alternative - The alternative base sequence of the variant to retrieve.
      Returns:
      The Variant object associated with the specified position and alternative base sequence, or null if no such variant exists.
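
      The two-step lookup described above might look like the following sketch. It assumes the variants map is nested as position -> alternative -> variant, which this page does not confirm, and it uses plain String labels in place of Variant objects. VariantLookupSketch and its add method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a position/alternative lookup; not the Contig implementation.
class VariantLookupSketch {
    // Assumed layout: position -> (alternative -> variant label).
    private final Map<Integer, Map<String, String>> variants = new HashMap<>();

    void add(int position, String alternative, String label) {
        variants.computeIfAbsent(position, p -> new HashMap<>()).put(alternative, label);
    }

    // Returns null if the position is unknown or has no entry for the alternative.
    String getVariant(int position, String alternative) {
        Map<String, String> atPosition = variants.get(position);
        if (atPosition == null) {
            return null;
        }
        return atPosition.get(alternative);
    }

    public static void main(String[] args) {
        VariantLookupSketch contig = new VariantLookupSketch();
        contig.add(120, "T", "120:T");
        System.out.println(contig.getVariant(120, "T"));
        System.out.println(contig.getVariant(121, "T"));
    }
}
```

      Because the result may be null, callers should check it before dereferencing.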
    • getAllVariants

      public List<Variant> getAllVariants()
      Retrieves all variants associated with this contig.

      This method flattens the variants map, which organizes variants by their positions, into a single list of Variant objects. The returned list is unmodifiable.

      Returns:
      A List containing all Variant objects associated with this contig.
    • getActiveVariants

      public List<Variant> getActiveVariants()
      Retrieves all active variants associated with this contig.

      This method flattens the variants map, which organizes variants by their positions, into a single list of Variant objects. It then filters the list to include only those variants that are marked as active (i.e., have the active property set to true). The returned list is unmodifiable.

      Returns:
      A List containing all active Variant objects associated with this contig.
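
      Flattening, filtering, and wrapping the result in an unmodifiable list can be sketched with the Stream API. The nested class V is a hypothetical stand-in for Variant, and the map layout (position -> alternative -> variant) is an assumption, as is the class FlattenVariantsSketch itself.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration; not the Contig implementation.
class FlattenVariantsSketch {

    // Minimal stand-in for Variant.
    static final class V {
        final int position;
        final String alternative;
        final boolean active;

        V(int position, String alternative, boolean active) {
            this.position = position;
            this.alternative = alternative;
            this.active = active;
        }
    }

    // Assumed layout: position -> (alternative -> variant).
    private final Map<Integer, Map<String, V>> variants = new TreeMap<>();

    void add(V variant) {
        variants.computeIfAbsent(variant.position, p -> new TreeMap<>())
                .put(variant.alternative, variant);
    }

    // Flattens the nested map into one read-only list.
    List<V> getAllVariants() {
        return Collections.unmodifiableList(
                variants.values().stream()
                        .flatMap(byAlternative -> byAlternative.values().stream())
                        .collect(Collectors.toList()));
    }

    // Same flattening, keeping only variants flagged as active.
    List<V> getActiveVariants() {
        return Collections.unmodifiableList(
                variants.values().stream()
                        .flatMap(byAlternative -> byAlternative.values().stream())
                        .filter(variant -> variant.active)
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        FlattenVariantsSketch contig = new FlattenVariantsSketch();
        contig.add(new V(10, "A", true));
        contig.add(new V(10, "T", false));
        contig.add(new V(20, "G", true));
        System.out.println(contig.getAllVariants().size());
        System.out.println(contig.getActiveVariants().size());
    }
}
```

      Calling add or remove on a returned list throws UnsupportedOperationException, which is what "unmodifiable" means for callers.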
    • getVariantsWithin

      public List<Variant> getVariantsWithin(int start, int end)
      Retrieves variants within the specified range of positions.

      This method retrieves variants from the variants map that fall within the specified start and end positions (inclusive of start, exclusive of end). The resulting variants are flattened into a single list. The returned list is unmodifiable.

      Parameters:
      start - The 1-based start position of the range (inclusive).
      end - The 1-based end position of the range (exclusive).
      Returns:
      A List of Variant objects within the specified range.
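
      A half-open range query of this kind maps naturally onto a sorted map. The sketch below assumes the variants are keyed by position in a NavigableMap, which this page does not confirm, and uses String labels instead of Variant objects. RangeQuerySketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical illustration of a [start, end) range query; not the Contig implementation.
class RangeQuerySketch {
    // Assumed layout: position -> variant labels, sorted by position.
    private final NavigableMap<Integer, List<String>> variants = new TreeMap<>();

    void add(int position, String label) {
        variants.computeIfAbsent(position, p -> new ArrayList<>()).add(label);
    }

    // start is inclusive, end is exclusive; both are 1-based positions.
    List<String> getVariantsWithin(int start, int end) {
        List<String> result = new ArrayList<>();
        for (List<String> atPosition : variants.subMap(start, true, end, false).values()) {
            result.addAll(atPosition);
        }
        return Collections.unmodifiableList(result);
    }

    public static void main(String[] args) {
        RangeQuerySketch contig = new RangeQuerySketch();
        contig.add(5, "5:A");
        contig.add(10, "10:T");
        contig.add(15, "15:G");
        // Position 15 is excluded because the end is exclusive.
        System.out.println(contig.getVariantsWithin(5, 15));
    }
}
```

      In this sketch, subMap throws IllegalArgumentException when start is greater than end. How Contig treats such input is not documented here.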
    • getVariantsAt

      public List<Variant> getVariantsAt(int... positions)
      Retrieves variants at the specified positions.

      This method retrieves variants from the variants map that are located at the specified positions. The resulting variants are flattened into a single list. The returned list is unmodifiable.

      Parameters:
      positions - An array of 1-based positions to retrieve variants from.
      Returns:
      A List of Variant objects located at the specified positions.
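
      Since positions is a varargs parameter, callers can pass individual values or an int array. The following sketch shows one way such a lookup could behave. PositionLookupSketch is hypothetical, String labels replace Variant objects, and skipping positions without variants is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of a multi-position lookup; not the Contig implementation.
class PositionLookupSketch {
    // Assumed layout: position -> variant labels.
    private final Map<Integer, List<String>> variants = new HashMap<>();

    void add(int position, String label) {
        variants.computeIfAbsent(position, p -> new ArrayList<>()).add(label);
    }

    // Collects the variants at each requested position, in argument order.
    List<String> getVariantsAt(int... positions) {
        List<String> result = new ArrayList<>();
        for (int position : positions) {
            // Positions without variants contribute nothing (assumed behaviour).
            result.addAll(variants.getOrDefault(position, Collections.emptyList()));
        }
        return Collections.unmodifiableList(result);
    }

    public static void main(String[] args) {
        PositionLookupSketch contig = new PositionLookupSketch();
        contig.add(5, "5:A");
        contig.add(15, "15:G");
        System.out.println(contig.getVariantsAt(15, 7, 5));
        System.out.println(contig.getVariantsAt(new int[] {5}));
    }
}
```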
    • getVariantsOfSamples

      public List<Variant> getVariantsOfSamples(Collection<String> sampleIdentifiers)
      Retrieves variants associated with the specified sample identifiers.

      This method filters the variants stored in the contig to include only those that are associated with at least one of the specified sample identifiers. The resulting list is unmodifiable.

      Parameters:
      sampleIdentifiers - A collection of sample identifiers to filter the variants.
      Returns:
      A List of Variant objects associated with the specified sample identifiers.
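
      The "at least one of the specified sample identifiers" condition is a set-intersection test. The sketch below expresses it with Collections.disjoint. How Variant exposes its samples is not described on this page, so the nested class V and its samples field are hypothetical, as is SampleFilterSketch.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration of sample-based filtering; not the Contig implementation.
class SampleFilterSketch {

    // Minimal stand-in for Variant with an assumed set of sample identifiers.
    static final class V {
        final String label;
        final Set<String> samples;

        V(String label, Set<String> samples) {
            this.label = label;
            this.samples = samples;
        }
    }

    // Keeps variants that share at least one sample with sampleIdentifiers.
    static List<V> ofSamples(List<V> all, Collection<String> sampleIdentifiers) {
        return Collections.unmodifiableList(
                all.stream()
                        .filter(variant -> !Collections.disjoint(variant.samples, sampleIdentifiers))
                        .collect(Collectors.toList()));
    }

    public static void main(String[] args) {
        List<V> all = List.of(
                new V("10:A", Set.of("s1", "s2")),
                new V("20:G", Set.of("s3")));
        for (V variant : ofSamples(all, List.of("s2", "s4"))) {
            System.out.println(variant.label);
        }
    }
}
```

      Passing a Set as sampleIdentifiers keeps each membership check cheap when many identifiers are supplied.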
    • getVariantsOfSamplesWithin

      public List<Variant> getVariantsOfSamplesWithin(int start, int end, Collection<String> sampleIdentifiers)
      Retrieves variants associated with the specified sample identifiers within a given range of positions.

      This method filters the variants stored in the contig to include only those that are associated with at least one of the specified sample identifiers and fall within the specified start and end positions (inclusive of start, exclusive of end). The resulting list is unmodifiable.

      Parameters:
      start - The 1-based start position of the range (inclusive).
      end - The 1-based end position of the range (exclusive).
      sampleIdentifiers - A collection of sample identifiers to filter the variants.
      Returns:
      A List of Variant objects associated with the specified sample identifiers within the given range.
    • getVariantsCount

      public int getVariantsCount()
      Calculates the total number of variants associated with this contig.

      This method iterates through the variants map and sums up the sizes of all the lists of variants. The result represents the total count of Variant objects stored in this contig.

      Returns:
      The total number of Variant objects associated with this contig.
    • getActiveVariantsCount

      public int getActiveVariantsCount()
      Calculates the total number of active variants associated with this contig.

      This method flattens the variants map into a stream of Variant objects. It then filters the stream to include only those variants that are marked as active (i.e., have the active property set to true). The method counts the filtered variants and returns the total count as an integer.

      Returns:
      The total number of active Variant objects associated with this contig.
    • toString

      public String toString()
      Returns the string representation of this contig.

      This method overrides the toString method to return the unique identifier of the contig.

      Overrides:
      toString in class Object
      Returns:
      The unique identifier of this contig as a String.
    • hashCode

      public int hashCode()
      Computes the hash code for this contig.

      This method overrides the hashCode method to compute the hash code based on the unique identifier of the contig.

      Overrides:
      hashCode in class Object
      Returns:
      The hash code of this contig.
    • equals

      public boolean equals(Object obj)
      Compares this contig to another object for equality.

      This method overrides the equals method to compare the unique identifier of this contig with that of another object. Two contigs are considered equal if they are of the same class and have the same unique identifier.

      Overrides:
      equals in class Object
      Parameters:
      obj - The object to compare with this contig.
      Returns:
      true if the objects are equal; false otherwise.
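
      The identifier-based contract of toString, hashCode, and equals can be sketched as follows. IdentifiedSketch is a hypothetical class that mirrors only the _id field, and the identifier used in main is an example value.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Hypothetical illustration of identifier-based equality; not the Contig implementation.
class IdentifiedSketch {
    final String _id;

    IdentifiedSketch(String id) {
        this._id = Objects.requireNonNull(id);
    }

    @Override
    public String toString() {
        return _id;
    }

    @Override
    public int hashCode() {
        return _id.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        // Same class and same identifier are both required.
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        return _id.equals(((IdentifiedSketch) obj)._id);
    }

    public static void main(String[] args) {
        Set<IdentifiedSketch> contigs = new HashSet<>();
        contigs.add(new IdentifiedSketch("NC_000913.3"));
        contigs.add(new IdentifiedSketch("NC_000913.3"));
        // Equal identifiers collapse to a single set entry.
        System.out.println(contigs.size());
    }
}
```

      Under this contract, two objects with the same identifier are interchangeable as keys in hash-based collections, regardless of the sequence or variants they hold.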