Class Feature
This class models a genomic feature, such as a gene, exon, or coding sequence (CDS), that is analyzed in the context of genomic data processing. It extends the Attributes class to inherit functionality for managing attributes associated with the feature. Each instance contains information about its type and its location on the reference genome, i.e., a Contig. In addition, an inner map is used to store the SequenceTypes associated with the feature.
Features are stored in the Storage.features property of the model.
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static final record | Feature.SubFeature | Represents a sub-feature of a genomic feature.
-
Field Summary
Fields
Modifier and Type | Field | Description
final String | _id | Unique identifier of this feature.
final String | contig | The parent genomic location of this feature.
final int | end | 1-based end position of the feature.
final String | name | An additional (human-readable) name for this feature.
final int | start | 1-based starting position of the feature.
final char | strand | Strand of the feature.
final String | type | The type of this genomic feature.
-
Constructor Summary
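Constructors
Constructor | Description
Feature(String name, String contig, Number start, Number end, char strand, String type, String identifier) | Constructs a new Feature instance with the specified properties.
-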
Method Summary
Modifier and Type | Method | Description
void | addAllele(Allele allele) | Adds an allele to this feature.
void | addProteoform(Proteoform proteoform) | Adds a proteoform to this feature.
void | addSubFeature(String type, int start, int end) | Adds a sub-feature to this genomic feature.
void | clearSubFeatures() | Clears all sub-features associated with this genomic feature.
void | clearSubFeatures(int level) | Clears all sub-features of a specific Sequence Ontology (SO) hierarchy level associated with this genomic feature.
boolean | equals(Object) | Compares this feature to another object for equality.
Allele | getAllele(String alleleIdentifier) | Retrieves an allele associated with this feature by its unique identifier (_id), or null.
int | getAlleleCount() | Retrieves the number of alleles associated with this feature.
Collection<Allele> | getAlleles() | Retrieves all alleles associated with this feature.
Proteoform | getProteoform(String proteoformIdentifier) | Retrieves a proteoform associated with this feature by its unique identifier, or null.
int | getProteoformCount() | Retrieves the number of proteoforms associated with this feature.
Collection<Proteoform> | getProteoforms() | Retrieves all proteoforms associated with this feature.
List<Feature.SubFeature> | getSubFeatures() | Retrieves all sub-features associated with this genomic feature.
boolean | hasAllele(String alleleIdentifier) | Checks if an allele with the specified identifier exists in this feature.
int | hashCode() | Computes the hash code for this feature.
boolean | hasProteoform(String proteoformIdentifier) | Checks if a proteoform with the specified identifier exists in this feature.
boolean | isCoding() | Determines if this feature is a coding feature.
boolean | isReverse() | Returns whether this feature is on the reverse strand.
String | toString() | Generates a string representation of this feature.
Methods inherited from class model.Attributes
attributesAsString, attributesAsString, clearAttributes, extendAttribute, extendAttributes, getAttribute, getAttributeOrDefault, getAttributes, getAttributeSet, hasAnyAttribute, hasAttribute, removeAttribute, setAttribute, setAttributeIfAbsent, setAttributes, setAttributesIfAbsent
-
Field Details
-
_id
Unique identifier of this feature.
This identifier is used to reference the feature in the model.
-
type
The type of this genomic feature.
This field specifies the type of the feature, such as "gene", "exon", or "CDS". The type follows the GFF3 specification and indicates the biological or functional classification of the feature.
For more details, refer to the GFF3 specification: https://gmod.org/wiki/GFF3.
-
name
An additional (human-readable) name for this feature.
Ideally, this is a database identifier or a common gene name.
-
contig
The parent genomic location of this feature.
-
start
public final int start
1-based starting position of the feature.
-
end
public final int end
1-based end position of the feature.
-
strand
public final char strand
Strand of the feature.
-
-
Constructor Details
-
Feature
public Feature(String name, String contig, Number start, Number end, char strand, String type, String identifier)
Constructs a new Feature instance with the specified properties.
This constructor initializes a genomic feature with its name, location, strand orientation, type, and unique identifier. The start and end positions are converted to integers to ensure proper indexing. The Attributes superclass is also initialized.
- Parameters:
name - The (human-readable) name of the feature.
contig - The id of the reference location (e.g., contig, chromosome, plasmid) where the feature is located.
start - The 1-based starting position of the feature on the reference.
end - The 1-based end position of the feature on the reference.
strand - The strand orientation of the feature ('+' for forward strand, '-' for reverse strand).
type - The type of the feature (e.g., "gene", "exon", "CDS").
identifier - The unique identifier of the feature.
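The conversion of the Number arguments to int can be illustrated with a minimal sketch. FeatureSketch is a hypothetical stand-in, not the library's class: it omits the Attributes superclass and simply assumes that the conversion is done with Number.intValue(). All argument values are invented.

```java
// Hypothetical stand-in for the documented class; NOT the library's source.
final class FeatureSketch {
    final String name;
    final String contig;
    final int start;
    final int end;
    final char strand;
    final String type;
    final String _id;

    FeatureSketch(String name, String contig, Number start, Number end,
                  char strand, String type, String identifier) {
        this.name = name;
        this.contig = contig;
        // Assumption: positions are narrowed with Number.intValue(), which
        // accepts any boxed numeric type (Integer, Long, Double, ...).
        this.start = start.intValue();
        this.end = end.intValue();
        this.strand = strand;
        this.type = type;
        this._id = identifier;
    }

    public static void main(String[] args) {
        // A Long and a Double are both accepted as Number arguments.
        FeatureSketch f = new FeatureSketch("dnaA", "chr1", 100L, 1500.0, '+', "gene", "feature_0001");
        System.out.println(f._id + " " + f.contig + ":" + f.start + "-" + f.end);
    }
}
```

Accepting Number lets callers pass positions parsed as different numeric types, for example from a JSON or GFF3 reader, without converting them first.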
-
-
Method Details
-
isCoding
public boolean isCoding()
Determines if this feature is a coding feature.
A feature is considered coding if it is of type "CDS" (coding sequence) or if any of its sub-features is of type "CDS".
- Returns:
true if the feature is of type "CDS" or has sub-features of type "CDS"; false otherwise.
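The rule described above can be sketched as a small predicate. CodingCheckSketch and its nested SubFeature record are hypothetical stand-ins for illustration; the actual implementation may differ, for example in how it compares type strings.

```java
import java.util.List;

// Hypothetical illustration of the documented rule; not the library's code.
final class CodingCheckSketch {
    // Stand-in for Feature.SubFeature (type plus 1-based start and end).
    record SubFeature(String type, int start, int end) {}

    // True if the feature itself, or any of its sub-features, is of type "CDS".
    static boolean isCoding(String featureType, List<SubFeature> subFeatures) {
        return "CDS".equals(featureType)
                || subFeatures.stream().anyMatch(s -> "CDS".equals(s.type()));
    }

    public static void main(String[] args) {
        List<SubFeature> parts = List.of(
                new SubFeature("exon", 10, 90),
                new SubFeature("CDS", 20, 80));
        System.out.println(isCoding("gene", parts));     // gene with a CDS sub-feature
        System.out.println(isCoding("gene", List.of())); // gene without sub-features
    }
}
```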
-
isReverse
public boolean isReverse()
Returns whether this feature is on the reverse strand.
This method checks whether the strand character of the feature is '-'.
- Returns:
true if this feature is on the reverse strand, false otherwise.
-
addSubFeature
Adds a sub-feature to this genomic feature.
This method validates and adds a sub-feature to the list of sub-features associated with this genomic feature. The sub-feature is defined by its type, start position, and end position. Validation ensures that:
- The sub-feature type is recognized in the Storage.SEQUENCE_ONTOLOGY_HIERARCHY map.
- The sub-feature's start and end positions are within the bounds of the parent feature.
If validation fails, an IllegalArgumentException is thrown.
- Parameters:
type - The type of the sub-feature (e.g., "exon", "CDS").
start - The 1-based start position of the sub-feature.
end - The 1-based end position of the sub-feature.
- Throws:
MusialException - if the sub-feature type is unrecognized or if its positions are out of bounds.
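The two validation checks can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: SubFeatureValidationSketch and HIERARCHY are hypothetical stand-ins, the hierarchy levels for "exon" and "CDS" are invented, and IllegalArgumentException is used because MusialException is not available in a standalone example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the documented validation; not the library's code.
final class SubFeatureValidationSketch {
    // Stand-in for Feature.SubFeature.
    record SubFeature(String type, int start, int end) {}

    // Assumed stand-in for Storage.SEQUENCE_ONTOLOGY_HIERARCHY.
    // The levels for "exon" and "CDS" are invented for this example.
    static final Map<String, Integer> HIERARCHY =
            Map.of("region", 0, "gene", 1, "exon", 2, "CDS", 3);

    final int start; // 1-based start of the parent feature
    final int end;   // 1-based end of the parent feature
    final List<SubFeature> subFeatures = new ArrayList<>();

    SubFeatureValidationSketch(int start, int end) {
        this.start = start;
        this.end = end;
    }

    void addSubFeature(String type, int subStart, int subEnd) {
        // Check 1: the type must be a key of the hierarchy map.
        if (!HIERARCHY.containsKey(type)) {
            throw new IllegalArgumentException("Unrecognized sub-feature type: " + type);
        }
        // Check 2: the positions must lie within the parent feature.
        if (subStart < start || subEnd > end) {
            throw new IllegalArgumentException(
                    "Sub-feature " + subStart + "-" + subEnd + " is outside " + start + "-" + end);
        }
        subFeatures.add(new SubFeature(type, subStart, subEnd));
    }

    public static void main(String[] args) {
        SubFeatureValidationSketch gene = new SubFeatureValidationSketch(100, 500);
        gene.addSubFeature("exon", 120, 300); // accepted
        try {
            gene.addSubFeature("exon", 50, 300); // starts before the parent
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
        System.out.println(gene.subFeatures.size() + " sub-feature(s) stored");
    }
}
```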
-
getSubFeatures
Retrieves all sub-features associated with this genomic feature.
This method provides an unmodifiable view of the list of sub-features associated with this genomic feature. Sub-features represent smaller components of the feature, such as exons or coding sequences (CDS), and include their type and genomic location (start and end positions).
The unmodifiable list ensures that the original list cannot be modified externally, preserving data integrity.
- Returns:
An unmodifiable List of Feature.SubFeature objects representing the sub-features of this genomic feature.
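What an unmodifiable view means in practice can be shown with a short sketch. It assumes the view is created with Collections.unmodifiableList, which the documentation does not confirm. UnmodifiableViewSketch is a hypothetical class that stores plain strings instead of Feature.SubFeature objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical illustration of an unmodifiable list view; not the library's code.
final class UnmodifiableViewSketch {
    private final List<String> subFeatures = new ArrayList<>();

    void add(String type) {
        subFeatures.add(type);
    }

    // Assumption: the getter wraps the internal list instead of copying it.
    List<String> getSubFeatures() {
        return Collections.unmodifiableList(subFeatures);
    }

    public static void main(String[] args) {
        UnmodifiableViewSketch feature = new UnmodifiableViewSketch();
        List<String> view = feature.getSubFeatures();
        feature.add("exon");
        // The view is backed by the internal list, so it sees the new element.
        System.out.println("View size: " + view.size());
        try {
            view.add("CDS"); // writes through the view are rejected
        } catch (UnsupportedOperationException e) {
            System.out.println("The view is read-only");
        }
    }
}
```

A view created this way is not a snapshot: callers that need a stable copy have to copy the list themselves, for example with List.copyOf.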
-
clearSubFeatures
public void clearSubFeatures()
Clears all sub-features associated with this genomic feature.
This method removes all sub-features from the list of sub-features associated with this genomic feature. After calling this method, the list of sub-features is empty.
-
clearSubFeatures
public void clearSubFeatures(int level)
Clears all sub-features of a specific Sequence Ontology (SO) hierarchy level associated with this genomic feature.
This method removes all sub-features that match the specified SO hierarchy level from the list of sub-features. The hierarchy level is determined using the Storage.SEQUENCE_ONTOLOGY_HIERARCHY map.
After calling this method, only sub-features that do not match the specified level remain in the list.
- Parameters:
level - The SO hierarchy level of the sub-features to remove (e.g., 0 for "region", 1 for "gene").
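Level-based removal can be sketched with List.removeIf. ClearByLevelSketch and HIERARCHY are hypothetical stand-ins: the map only mimics Storage.SEQUENCE_ONTOLOGY_HIERARCHY, and the levels assigned to "exon" and "CDS" are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of level-based clearing; not the library's code.
final class ClearByLevelSketch {
    // Stand-in for Feature.SubFeature.
    record SubFeature(String type, int start, int end) {}

    // Assumed stand-in for Storage.SEQUENCE_ONTOLOGY_HIERARCHY.
    // The levels for "exon" and "CDS" are invented for this example.
    static final Map<String, Integer> HIERARCHY =
            Map.of("region", 0, "gene", 1, "exon", 2, "CDS", 3);

    // Removes every sub-feature whose type maps to the given level.
    static void clearSubFeatures(List<SubFeature> subFeatures, int level) {
        subFeatures.removeIf(s -> HIERARCHY.getOrDefault(s.type(), -1) == level);
    }

    public static void main(String[] args) {
        List<SubFeature> parts = new ArrayList<>(List.of(
                new SubFeature("exon", 10, 50),
                new SubFeature("CDS", 20, 40),
                new SubFeature("exon", 60, 90)));
        clearSubFeatures(parts, 2); // level 2 is "exon" in this sketch
        System.out.println(parts);  // only the "CDS" entry remains
    }
}
```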
-
hasAllele
Checks if an allele with the specified identifier exists in this feature.
- Parameters:
alleleIdentifier - The identifier of the allele to check for.
- Returns:
true if an allele with the given identifier exists, false otherwise.
-
addAllele
Adds an allele to this feature.
This method adds the specified Allele object to the internal map of alleles associated with this feature. The allele is stored using its unique identifier as the key.
Note: The allele is not validated against the coordinates of the feature.
- Parameters:
allele - The Allele object to be added to this feature.
-
getAllele
Retrieves an allele associated with this feature by its unique identifier (_id), or null if no such allele exists.
- Parameters:
alleleIdentifier - The identifier of the allele to retrieve.
- Returns:
The Allele object associated with the given identifier, or null if not found.
-
getAlleles
Retrieves all alleles associated with this feature.
This method provides an unmodifiable view of the collection of alleles associated with this feature. The view exposes the values of the internal map and cannot be modified externally.
- Returns:
An unmodifiable Collection of Allele objects associated with this feature.
-
getAlleleCount
public int getAlleleCount()
Retrieves the number of alleles associated with this feature.
This method returns the size of the internal map of alleles, i.e., the number of unique alleles associated with this feature.
- Returns:
The number of alleles associated with this feature.
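The allele accessors above behave like a thin wrapper around a map keyed by identifier. The following sketch shows that pattern under assumptions: AlleleMapSketch and its minimal Allele record are hypothetical, and both the choice of LinkedHashMap and the use of Collections.unmodifiableCollection are guesses, not confirmed details of the library.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of an identifier-keyed allele store; not the library's code.
final class AlleleMapSketch {
    // Minimal stand-in for the Allele class: only the identifier is modeled.
    record Allele(String _id) {}

    // Assumption: a map from allele identifier to allele.
    private final Map<String, Allele> alleles = new LinkedHashMap<>();

    boolean hasAllele(String alleleIdentifier) {
        return alleles.containsKey(alleleIdentifier);
    }

    void addAllele(Allele allele) {
        // An allele with the same identifier replaces the previous entry.
        alleles.put(allele._id(), allele);
    }

    Allele getAllele(String alleleIdentifier) {
        return alleles.get(alleleIdentifier); // null if the identifier is unknown
    }

    Collection<Allele> getAlleles() {
        return Collections.unmodifiableCollection(alleles.values());
    }

    int getAlleleCount() {
        return alleles.size();
    }

    public static void main(String[] args) {
        AlleleMapSketch feature = new AlleleMapSketch();
        feature.addAllele(new Allele("allele_1"));
        System.out.println(feature.hasAllele("allele_1"));
        System.out.println(feature.getAllele("allele_2")); // unknown identifier
        System.out.println(feature.getAlleleCount());
    }
}
```

Because lookups return null for unknown identifiers, callers either check hasAllele first or handle the null result explicitly.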
-
hasProteoform
Checks if a proteoform with the specified identifier exists in this feature.
- Parameters:
proteoformIdentifier - The identifier of the proteoform to check for.
- Returns:
true if a proteoform with the given identifier exists, false otherwise.
-
addProteoform
Adds a proteoform to this feature.
This method adds the specified Proteoform object to the internal map of proteoforms associated with this feature. The proteoform is stored using its unique identifier as the key.
Note: The proteoform is not validated against the coordinates of the feature.
- Parameters:
proteoform - The Proteoform object to be added to this feature.
-
getProteoform
Retrieves a proteoform associated with this feature by its unique identifier, or null if no such proteoform exists.
- Parameters:
proteoformIdentifier - The unique identifier of the proteoform to retrieve.
- Returns:
The Proteoform object associated with the given identifier, or null if not found.
-
getProteoforms
Retrieves all proteoforms associated with this feature.
This method provides an unmodifiable view of the collection of proteoforms associated with this feature. Proteoforms represent specific sequence variants of proteins derived from the feature. The unmodifiable collection ensures that the original data cannot be modified externally.
- Returns:
An unmodifiable Collection of Proteoform objects associated with this feature.
-
getProteoformCount
public int getProteoformCount()
Retrieves the number of proteoforms associated with this feature.
This method returns the size of the internal map of proteoforms, i.e., the number of unique proteoforms associated with this feature.
- Returns:
The number of proteoforms associated with this feature.
-
toString
Generates a string representation of this feature.
This method constructs a string representation of the feature using its contig, genomic coordinates, and name. The format includes the contig identifier, start and end positions, and the feature name, separated by specific constants.
-
hashCode
public int hashCode()
Computes the hash code for this feature.
This method calculates the hash code of the feature based on its string representation.
-
equals
Compares this feature to another object for equality.
This method checks if the provided object is the same instance as this feature. If not, it verifies that the object is of the same class and compares their string representations for equality.
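The relationship between toString, hashCode, and equals described above can be sketched as follows. StringIdentitySketch is a hypothetical class, and the ":" and "-" separators are assumptions, since the documentation does not name the constants actually used.

```java
// Hypothetical illustration of string-based identity; not the library's code.
final class StringIdentitySketch {
    final String contig;
    final int start;
    final int end;
    final String name;

    StringIdentitySketch(String contig, int start, int end, String name) {
        this.contig = contig;
        this.start = start;
        this.end = end;
        this.name = name;
    }

    @Override
    public String toString() {
        // Assumed separators; the real constants are not documented here.
        return contig + ":" + start + "-" + end + ":" + name;
    }

    @Override
    public int hashCode() {
        // Derived from the string representation, so it stays consistent with equals.
        return toString().hashCode();
    }

    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true; // same instance
        }
        if (other == null || getClass() != other.getClass()) {
            return false; // different class
        }
        return toString().equals(other.toString());
    }

    public static void main(String[] args) {
        StringIdentitySketch a = new StringIdentitySketch("chr1", 100, 500, "dnaA");
        StringIdentitySketch b = new StringIdentitySketch("chr1", 100, 500, "dnaA");
        System.out.println(a.equals(b) && a.hashCode() == b.hashCode());
    }
}
```

If the string representation contains only the contig, coordinates, and name, as the description suggests, then two features that differ only in type or strand would compare as equal under this scheme.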
-