Class Contig
Models a segment of a genomic sequence, i.e., a complete genome, plasmid, single contig, or scaffold. It extends the Attributes class to inherit functionality for managing attributes associated with the contig. In addition, an inner map stores the Variants associated with the contig.
Contigs are stored in the Storage.contigs property in the model.
-
Field Summary
Fields -
Method Summary
Modifier and Type | Method | Description
boolean | equals(Object) | Compares this contig to another object for equality.
List<Variant> | getActiveVariants() | Retrieves all active variants associated with this contig.
int | getActiveVariantsCount() | Calculates the total number of active variants associated with this contig.
List<Variant> | getAllVariants() | Retrieves all variants associated with this contig.
String | getSequence() | Retrieves the full nucleotide sequence of this contig or an empty string if no sequence is stored.
String | getSequence(int start, int end) | Retrieves a subsequence of this contig, caching the result to optimize performance.
int | getSequenceLength() | Retrieves the length of the contig's sequence.
Variant | getVariant(int position, String alternative) | Retrieves a variant associated with the specified position and alternative base sequence.
List<Variant> | getVariantsAt(int... positions) | Retrieves variants at the specified positions.
int | getVariantsCount() | Calculates the total number of variants associated with this contig.
List<Variant> | getVariantsOfSamples(Collection<String> sampleIdentifiers) | Retrieves variants associated with the specified sample identifiers.
List<Variant> | getVariantsOfSamplesWithin(int start, int end, Collection<String> sampleIdentifiers) | Retrieves variants associated with the specified sample identifiers within a given range of positions.
List<Variant> | getVariantsWithin(int start, int end) | Retrieves variants within the specified range of positions.
int | hashCode() | Computes the hash code for this contig.
boolean | hasSequence() | Checks if this contig has an associated nucleotide sequence.
String | toString() | Returns the string representation of this contig.
Methods inherited from class model.Attributes
attributesAsString, attributesAsString, clearAttributes, extendAttribute, extendAttributes, getAttribute, getAttributeOrDefault, getAttributes, getAttributeSet, hasAnyAttribute, hasAttribute, removeAttribute, setAttribute, setAttributeIfAbsent, setAttributes, setAttributesIfAbsent
-
Field Details
-
_id
Unique identifier of this contig. This field serves as the unique identifier for the contig and is used to reference it in the model. Ideally, it is a database identifier, such as an NCBI accession number.
-
sequenceCache
Cache to store the (sub-)sequence of this contig given a start and end position. This field is a transient HashMap used to cache subsequences of the contig's sequence. The keys in the map are Tuple objects representing the start and end positions of the subsequence, and the values are the corresponding subsequences as String. The cache is not intended to be serialized, as it is populated dynamically at runtime to optimize performance by avoiding redundant sequence decompression or retrieval.
-
-
Method Details
-
hasSequence
public boolean hasSequence()
Checks if this contig has an associated nucleotide sequence. This method determines whether the contig has a stored sequence by checking if the sequence field is not empty. A non-empty sequence indicates that the contig has an associated nucleotide sequence.
- Returns:
- true if the contig has a sequence or false otherwise.
-
getSequence
Retrieves the full nucleotide sequence of this contig or an empty string if no sequence is stored. This method decompresses the GZIP-compressed sequence stored in the sequence field and returns it as a string. If no sequence is stored, it returns an empty string.
- Returns:
- The decompressed nucleotide sequence of this contig, or an empty string if no sequence is stored.
- Throws:
- IOException - If an error occurs during the decompression of the sequence.
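The decompression step described above can be sketched as follows. This is a minimal illustration, not the class's actual code: it assumes the compressed sequence is held as a byte array, and the class name GzipSequenceSketch and both method names are hypothetical. Only standard java.util.zip classes are used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper; the real Contig class may store and decode its sequence differently.
final class GzipSequenceSketch {

    // Compresses a nucleotide string into GZIP bytes (assumed storage format).
    static byte[] compress(String sequence) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(sequence.getBytes(StandardCharsets.UTF_8));
        }
        return buffer.toByteArray();
    }

    // Returns the decompressed sequence, or an empty string if nothing is stored.
    static String decompress(byte[] stored) throws IOException {
        if (stored == null || stored.length == 0) {
            return "";
        }
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(stored))) {
            return new String(gzip.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] stored = compress("ACGTACGT");
        System.out.println(decompress(stored));                // round trip of the sequence
        System.out.println(decompress(new byte[0]).isEmpty()); // no stored sequence
    }
}
```

Because decompression can fail on corrupt input, the sketch propagates IOException to the caller, matching the documented Throws clause.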
-
getSequence
Retrieves a subsequence of this contig, caching the result to optimize performance. This method extracts a subsequence from the nucleotide sequence of the contig based on the specified start and end positions. The subsequence is cached to avoid redundant decompression and substring operations for the same range. If the subsequence is already cached, it is retrieved directly from the cache. Otherwise, it is computed, stored in the cache, and returned.
The start and end positions are 1-based indices, meaning the first nucleotide in the sequence is at position 1. If no sequence is stored for the contig, the method returns an empty string.
- Parameters:
- start - The 1-based start position of the subsequence (inclusive).
- end - The 1-based end position of the subsequence (exclusive).
- Returns:
- The subsequence of this contig, or an empty string if no sequence is stored.
- Throws:
- IOException - If an error occurs during the decompression of the sequence.
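The caching behaviour and the 1-based coordinates can be illustrated with a small sketch. It is written under stated assumptions: the sequence is already decompressed, a record named Range stands in for the Tuple key, and the computations counter exists only to make cache hits visible. None of these names come from the documented class. Records require Java 16 or later.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a (start, end) keyed subsequence cache.
final class SubsequenceCacheSketch {

    // Stand-in for the Tuple key described for sequenceCache.
    record Range(int start, int end) {}

    private final String sequence; // already decompressed, for brevity
    private final Map<Range, String> cache = new HashMap<>();
    int computations = 0;          // counts cache misses, for demonstration only

    SubsequenceCacheSketch(String sequence) {
        this.sequence = sequence;
    }

    // start is 1-based and inclusive, end is 1-based and exclusive.
    String getSequence(int start, int end) {
        if (sequence.isEmpty()) {
            return "";
        }
        return cache.computeIfAbsent(new Range(start, end), range -> {
            computations++;
            // Shift both 1-based positions to Java's 0-based substring indices.
            return sequence.substring(range.start() - 1, range.end() - 1);
        });
    }

    public static void main(String[] args) {
        SubsequenceCacheSketch contig = new SubsequenceCacheSketch("ACGTACGT");
        System.out.println(contig.getSequence(2, 5)); // positions 2, 3 and 4
        System.out.println(contig.getSequence(2, 5)); // served from the cache
        System.out.println(contig.computations);      // number of cache misses
    }
}
```

With an exclusive end, the length of the returned subsequence is simply end minus start, which is one reason half-open ranges are convenient here.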
-
getSequenceLength
public int getSequenceLength()
Retrieves the length of the contig's sequence. This method returns the length of the nucleotide sequence associated with this contig. The length is determined during the initialization of the contig and reflects the number of bases in the sequence. If the contig does not have an associated sequence, the length is 0.
- Returns:
- The length of the contig's sequence as an integer.
-
getVariant
Retrieves a variant associated with the specified position and alternative base sequence. This method checks if the variants map contains an entry for the given position. If no entry exists, it returns null. Otherwise, it retrieves the Variant object associated with the specified alternative base sequence at the given position.
- Parameters:
- position - The 1-based position of the variant to retrieve.
- alternative - The alternative base sequence of the variant to retrieve.
- Returns:
- The Variant object associated with the specified position and alternative base sequence, or null if no such variant exists.
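The two-step lookup described above might look like the following sketch. The layout of the variants map is an assumption (position mapped to an inner map from alternative to variant), and VariantStub is a hypothetical stand-in for the model's Variant class. Records require Java 16 or later.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a position-then-alternative lookup.
final class VariantLookupSketch {

    // Stand-in for the model's Variant class.
    record VariantStub(int position, String alternative) {}

    // Assumed layout: position -> (alternative -> variant).
    private final Map<Integer, Map<String, VariantStub>> variants = new HashMap<>();

    void add(VariantStub variant) {
        variants.computeIfAbsent(variant.position(), p -> new HashMap<>())
                .put(variant.alternative(), variant);
    }

    // Returns null when the position or the alternative is unknown.
    VariantStub getVariant(int position, String alternative) {
        Map<String, VariantStub> atPosition = variants.get(position);
        if (atPosition == null) {
            return null;
        }
        return atPosition.get(alternative);
    }

    public static void main(String[] args) {
        VariantLookupSketch contig = new VariantLookupSketch();
        contig.add(new VariantStub(42, "T"));
        System.out.println(contig.getVariant(42, "T")); // stored variant
        System.out.println(contig.getVariant(42, "G")); // known position, unknown alternative
        System.out.println(contig.getVariant(7, "T"));  // unknown position
    }
}
```

Since the method signals absence with null, callers should check the result before dereferencing it.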
-
getAllVariants
Retrieves all variants associated with this contig. This method flattens the variants map, which organizes variants by their positions, into a single list of Variant objects. The returned list is unmodifiable.
-
getActiveVariants
Retrieves all active variants associated with this contig. This method flattens the variants map, which organizes variants by their positions, into a single list of Variant objects. It then filters the list to include only those variants that are marked as active (i.e., have the active property set to true). The returned list is unmodifiable.
-
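Flattening the position map and keeping only active variants, as getActiveVariants is described to do, can be sketched with the Stream API. The nested map layout and the VariantStub record are assumptions made for illustration, not the model's actual types. Records require Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration of flatten-then-filter over a position map.
final class ActiveVariantsSketch {

    // Stand-in for the model's Variant class with an active flag.
    record VariantStub(int position, String alternative, boolean active) {}

    // Assumed layout: position -> (alternative -> variant).
    static List<VariantStub> activeVariants(Map<Integer, Map<String, VariantStub>> variants) {
        return variants.values().stream()
                .flatMap(byAlternative -> byAlternative.values().stream())
                .filter(VariantStub::active)
                // The collected list rejects add, remove and set.
                .collect(Collectors.toUnmodifiableList());
    }

    public static void main(String[] args) {
        Map<Integer, Map<String, VariantStub>> variants = new TreeMap<>();
        variants.put(10, Map.of("A", new VariantStub(10, "A", true)));
        variants.put(20, Map.of("C", new VariantStub(20, "C", false)));
        variants.put(30, Map.of("G", new VariantStub(30, "G", true)));

        List<VariantStub> active = activeVariants(variants);
        System.out.println(active); // the variants at positions 10 and 30
    }
}
```

Dropping the filter step gives the behaviour described for getAllVariants, and replacing the collector with count() gives the active variant count.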
getVariantsWithin
Retrieves variants within the specified range of positions. This method retrieves variants from the variants map that fall within the specified start and end positions (inclusive of start, exclusive of end). The resulting variants are flattened into a single list. The returned list is unmodifiable.
-
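If the variants map is sorted by position, a half-open range query like the one described above maps directly onto SortedMap.subMap, whose lower bound is inclusive and upper bound exclusive. The sketch below assumes a TreeMap keyed by position; whether the class actually uses a TreeMap is not stated in this documentation, and VariantStub is a hypothetical stand-in for Variant. Records require Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration of a half-open range query over sorted positions.
final class VariantRangeSketch {

    // Stand-in for the model's Variant class.
    record VariantStub(int position, String alternative) {}

    // Assumed layout: sorted position -> (alternative -> variant).
    static List<VariantStub> variantsWithin(
            TreeMap<Integer, Map<String, VariantStub>> variants, int start, int end) {
        // subMap includes start and excludes end; it throws
        // IllegalArgumentException if start is greater than end.
        return variants.subMap(start, end).values().stream()
                .flatMap(byAlternative -> byAlternative.values().stream())
                .collect(Collectors.toUnmodifiableList());
    }

    public static void main(String[] args) {
        TreeMap<Integer, Map<String, VariantStub>> variants = new TreeMap<>();
        variants.put(5, Map.of("A", new VariantStub(5, "A")));
        variants.put(10, Map.of("C", new VariantStub(10, "C")));
        variants.put(15, Map.of("G", new VariantStub(15, "G")));

        // Position 15 is excluded because the end of the range is exclusive.
        System.out.println(variantsWithin(variants, 5, 15));
    }
}
```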
getVariantsAt
Retrieves variants at the specified positions. This method retrieves variants from the variants map that are located at the specified positions. The resulting variants are flattened into a single list. The returned list is unmodifiable.
-
getVariantsOfSamples
Retrieves variants associated with the specified sample identifiers. This method filters the variants stored in the contig to include only those that are associated with at least one of the specified sample identifiers. The resulting list is unmodifiable.
-
getVariantsOfSamplesWithin
public List<Variant> getVariantsOfSamplesWithin(int start, int end, Collection<String> sampleIdentifiers)
Retrieves variants associated with the specified sample identifiers within a given range of positions. This method filters the variants stored in the contig to include only those that are associated with at least one of the specified sample identifiers and fall within the specified start and end positions (inclusive of start, exclusive of end). The resulting list is unmodifiable.
- Parameters:
- start - The 1-based start position of the range (inclusive).
- end - The 1-based end position of the range (exclusive).
- sampleIdentifiers - A collection of sample identifiers to filter the variants.
- Returns:
- A List of Variant objects associated with the specified sample identifiers within the given range.
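The "at least one of the specified sample identifiers" condition can be expressed with Collections.disjoint, which is true only when two collections share no element. The sketch below combines it with a half-open position range. It assumes each variant exposes its sample identifiers as a set and that positions are held in a TreeMap; both are illustrative assumptions, and VariantStub is a hypothetical stand-in for Variant. Records require Java 16 or later.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical illustration of filtering by sample identifiers within a range.
final class SampleFilterSketch {

    // Stand-in for the model's Variant class, carrying its sample identifiers.
    record VariantStub(int position, String alternative, Set<String> samples) {}

    static List<VariantStub> variantsOfSamplesWithin(
            TreeMap<Integer, Map<String, VariantStub>> variants,
            int start, int end, Collection<String> sampleIdentifiers) {
        return variants.subMap(start, end).values().stream() // start inclusive, end exclusive
                .flatMap(byAlternative -> byAlternative.values().stream())
                // Keep a variant if it shares at least one sample with the query.
                .filter(variant -> !Collections.disjoint(variant.samples(), sampleIdentifiers))
                .collect(Collectors.toUnmodifiableList());
    }

    public static void main(String[] args) {
        TreeMap<Integer, Map<String, VariantStub>> variants = new TreeMap<>();
        variants.put(5, Map.of("A", new VariantStub(5, "A", Set.of("s1"))));
        variants.put(10, Map.of("C", new VariantStub(10, "C", Set.of("s2", "s3"))));
        variants.put(15, Map.of("G", new VariantStub(15, "G", Set.of("s1"))));

        // Matches positions 5 (via s1) and 10 (via s3); position 15 is outside the range.
        System.out.println(variantsOfSamplesWithin(variants, 5, 15, List.of("s1", "s3")));
    }
}
```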
-
getVariantsCount
public int getVariantsCount()
Calculates the total number of variants associated with this contig. This method iterates through the variants map and sums up the sizes of all the lists of variants. The result represents the total count of Variant objects stored in this contig.
- Returns:
- The total number of Variant objects associated with this contig.
-
getActiveVariantsCount
public int getActiveVariantsCount()
Calculates the total number of active variants associated with this contig. This method flattens the variants map into a stream of Variant objects. It then filters the stream to include only those variants that are marked as active (i.e., have the active property set to true). The method counts the filtered variants and returns the total count as an integer.
- Returns:
- The total number of active Variant objects associated with this contig.
-
toString
Returns the string representation of this contig. This method overrides the toString method to return the unique identifier of the contig.
-
hashCode
public int hashCode()
Computes the hash code for this contig. This method overrides the hashCode method to compute the hash code based on the unique identifier of the contig.
-
equals
Compares this contig to another object for equality. This method overrides the equals method to compare the unique identifier of this contig with another object. Two contigs are considered equal if they are of the same class and have the same unique identifier.
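Identity based on a single identifier, as described for toString, hashCode and equals, is commonly written as in the sketch below. The class name IdentifiedSketch is hypothetical, and the identifier is assumed to be a String; the documentation does not state the type of _id.

```java
import java.util.Objects;

// Hypothetical illustration of identifier-based equality.
final class IdentifiedSketch {

    private final String id; // assumed to correspond to the _id field

    IdentifiedSketch(String id) {
        this.id = id;
    }

    @Override
    public String toString() {
        return id;
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        // Same class and same identifier are both required.
        if (other == null || getClass() != other.getClass()) {
            return false;
        }
        return Objects.equals(id, ((IdentifiedSketch) other).id);
    }

    public static void main(String[] args) {
        IdentifiedSketch a = new IdentifiedSketch("contig_1");
        IdentifiedSketch b = new IdentifiedSketch("contig_1");
        System.out.println(a.equals(b));                  // same identifier
        System.out.println(a.hashCode() == b.hashCode()); // consistent with equals
        System.out.println(a.equals("contig_1"));         // different class
    }
}
```

Deriving hashCode from the same field that equals compares keeps the two methods consistent, which is what hash-based collections such as HashMap and HashSet rely on.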
-