Class Feature
This class models a genomic feature, such as a gene, exon, or coding sequence (CDS), that is analyzed in the context of genomic data
processing. It extends the Attributes class to inherit functionality for managing attributes associated with the feature. Each
instance contains information about its type and location on the reference genome - i.e., a Contig. In addition, an inner map is
used to store SequenceTypes associated with the feature.
Features are stored in the Storage.features property of the model.
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static final record | Feature.SubFeature | Represents a sub-feature of a genomic feature.
-
Field Summary
Fields
Modifier and Type | Field | Description
final String | _id | Unique identifier of this feature.
final String | contig | The parent genomic location of this feature.
final int | end | 1-based end position of the feature.
final String | name | An additional (human-readable) name for this feature.
final int | start | 1-based starting position of the feature.
final char | strand | Strand of the feature.
final String | type | The type of this genomic feature.
-
Constructor Summary
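Constructors
Constructor | Description
Feature(String name, String contig, Number start, Number end, char strand, String type, String identifier) | Constructs a new Feature instance with the specified properties.
-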
Method Summary
Modifier and Type | Method | Description
void | addAllele(Allele allele) | Adds an allele to this feature.
void | addProteoform(Proteoform proteoform) | Adds a proteoform to this feature.
void | addSubFeature(String type, int start, int end) | Adds a sub-feature to this genomic feature.
void | clearSubFeatures() | Clears all sub-features associated with this genomic feature.
void | clearSubFeatures(int level) | Clears all sub-features of a specific Sequence Ontology (SO) hierarchy level associated with this genomic feature.
boolean | equals(Object) | Compares this feature to another object for equality.
Allele | getAllele(alleleIdentifier) | Retrieves an allele associated with this feature by its unique identifier (_id), or null.
int | getAlleleCount() | Retrieves the number of alleles associated with this feature.
Collection<Allele> | getAlleles() | Retrieves all alleles associated with this feature.
Proteoform | getProteoform(String proteoformIdentifier) | Retrieves a proteoform associated with this feature by its unique identifier, or null.
int | getProteoformCount() | Retrieves the number of proteoforms associated with this feature.
Collection<Proteoform> | getProteoforms() | Retrieves all proteoforms associated with this feature.
List<Feature.SubFeature> | getSubFeatures() | Retrieves all sub-features associated with this genomic feature.
boolean | hasAllele(alleleIdentifier) | Checks if an allele with the specified identifier exists in this feature.
int | hashCode() | Computes the hash code for this feature.
boolean | hasProteoform(String proteoformIdentifier) | Checks if a proteoform with the specified identifier exists in this feature.
boolean | isCoding() | Determines if this feature is a coding feature.
boolean | isReverse() | Returns whether this feature is on the reverse strand.
String | toString() | Generates a string representation of this feature.
Methods inherited from class model.Attributes
attributesAsString, attributesAsString, clearAttributes, extendAttribute, extendAttributes, getAttribute, getAttributeOrDefault, getAttributes, getAttributeSet, hasAnyAttribute, hasAttribute, removeAttribute, setAttribute, setAttributeIfAbsent, setAttributes, setAttributesIfAbsent
-
Field Details
-
_id
Unique identifier of this feature.
This field serves as the unique identifier for the feature and is used to reference it in the model.
-
type
The type of this genomic feature.
This field specifies the type of the feature, such as "gene", "exon", or "CDS". The type is defined according to the GFF3 specification and provides information about the biological or functional classification of the feature.
For more details, refer to the GFF3 specification: https://gmod.org/wiki/GFF3.
-
name
An additional (human-readable) name for this feature.
Ideally, this is a database identifier or a common gene name.
-
contig
The parent genomic location of this feature.
-
start
public final int start
1-based starting position of the feature.
-
end
public final int end
1-based end position of the feature.
-
strand
public final char strand
Strand of the feature.
-
-
Constructor Details
-
Feature
public Feature(String name, String contig, Number start, Number end, char strand, String type, String identifier)
Constructs a new Feature instance with the specified properties.
This constructor initializes a genomic feature with its name, location, strand orientation, type, and unique identifier. The start and end positions are converted to integers to ensure proper indexing. The Attributes superclass is also initialized.
- Parameters:
name - The (human-readable) name of the feature.
contig - The id of the reference location (e.g., contig, chromosome, plasmid) where the feature is located.
start - The 1-based starting position of the feature on the reference.
end - The 1-based end position of the feature on the reference.
strand - The strand orientation of the feature ('+' for forward strand, '-' for reverse strand).
type - The type of the feature (e.g., "gene", "exon", "CDS").
identifier - The unique identifier of the feature.
-
-
Method Details
-
isCoding
public boolean isCoding()
Determines if this feature is a coding feature.
This method checks whether the feature is of type "CDS" (coding sequence) or whether any of its sub-features is of type "CDS".
- Returns:
true if the feature is of type "CDS" or has sub-features of type "CDS"; false otherwise.
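The check described for isCoding can be sketched as follows. This is a minimal illustration, not the library's implementation: CodingCheckSketch and its Sub record are hypothetical stand-ins for Feature and Feature.SubFeature.

```java
import java.util.List;

// Hypothetical stand-in for illustration; not the library's Feature class.
final class CodingCheckSketch {
    // Minimal sub-feature: a type plus 1-based start and end positions.
    record Sub(String type, int start, int end) {}

    // A feature counts as coding if its own type is "CDS"
    // or if at least one of its sub-features has type "CDS".
    static boolean isCoding(String featureType, List<Sub> subFeatures) {
        return "CDS".equals(featureType)
            || subFeatures.stream().anyMatch(s -> "CDS".equals(s.type()));
    }
}
```

Under this reading, a feature of type "gene" that carries a "CDS" sub-feature is reported as coding, while a gene with only "exon" sub-features is not.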
-
isReverse
public boolean isReverse()
Returns whether this feature is on the reverse strand.
This method checks whether the strand character is '-'.
- Returns:
true if this feature is on the reverse strand, false otherwise.
-
addSubFeature
Adds a sub-feature to this genomic feature.
This method validates and adds a sub-feature to the list of sub-features associated with this genomic feature. The sub-feature is defined by its type, start position, and end position. Validation ensures that:
- The sub-feature type is recognized in the Storage.SEQUENCE_ONTOLOGY_HIERARCHY map.
- The sub-feature's start and end positions are within the bounds of the parent feature.
If validation fails, an IllegalArgumentException is thrown.
- Parameters:
type - The type of the sub-feature (e.g., "exon", "CDS").
start - The 1-based start position of the sub-feature.
end - The 1-based end position of the sub-feature.
- Throws:
MusialException - if the sub-feature type is unrecognized or if its positions are out of bounds.
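The two validation rules for addSubFeature can be sketched as below. This is an illustration under assumptions: SubFeatureValidationSketch is a hypothetical stand-in for Feature, HIERARCHY is a made-up substitute for Storage.SEQUENCE_ONTOLOGY_HIERARCHY (only the levels for "region" and "gene" come from this page), and IllegalArgumentException stands in for whichever exception the library actually raises.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not the library's Feature class.
final class SubFeatureValidationSketch {
    // Assumed substitute for Storage.SEQUENCE_ONTOLOGY_HIERARCHY.
    // The levels for "exon" and "CDS" are invented for this sketch.
    static final Map<String, Integer> HIERARCHY =
        Map.of("region", 0, "gene", 1, "exon", 2, "CDS", 3);

    record Sub(String type, int start, int end) {}

    final int start; // 1-based start of the parent feature
    final int end;   // 1-based end of the parent feature
    private final List<Sub> subFeatures = new ArrayList<>();

    SubFeatureValidationSketch(int start, int end) {
        this.start = start;
        this.end = end;
    }

    void addSubFeature(String type, int subStart, int subEnd) {
        // Rule 1: the type must be a known key of the hierarchy map.
        if (!HIERARCHY.containsKey(type)) {
            throw new IllegalArgumentException("Unrecognized sub-feature type: " + type);
        }
        // Rule 2: the sub-feature must lie within the parent's bounds.
        if (subStart < start || subEnd > end) {
            throw new IllegalArgumentException("Sub-feature positions are out of bounds");
        }
        subFeatures.add(new Sub(type, subStart, subEnd));
    }

    int subFeatureCount() {
        return subFeatures.size();
    }
}
```

With a parent spanning positions 100 to 500, adding an "exon" at 120 to 300 succeeds, while an "exon" starting at 50 or a sub-feature of an unknown type is rejected before the list is touched.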
-
getSubFeatures
Retrieves all sub-features associated with this genomic feature.
This method provides an unmodifiable view of the list of sub-features associated with this genomic feature. Sub-features represent smaller components of the feature, such as exons or coding sequences (CDS), and include their type and genomic location (start and end positions).
The unmodifiable list ensures that the original list cannot be modified externally, preserving data integrity.
- Returns:
An unmodifiable List of Feature.SubFeature objects representing the sub-features of this genomic feature.
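What an unmodifiable view means for callers can be sketched with the standard library. The example assumes the view is built with Collections.unmodifiableList, which this page does not confirm; UnmodifiableViewSketch is a hypothetical stand-in that stores plain type names instead of Feature.SubFeature objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for illustration; not the library's Feature class.
final class UnmodifiableViewSketch {
    private final List<String> subFeatureTypes = new ArrayList<>();

    void add(String type) {
        subFeatureTypes.add(type);
    }

    // Returns a read-only view backed by the internal list:
    // later internal changes are visible, but callers cannot mutate it.
    List<String> view() {
        return Collections.unmodifiableList(subFeatureTypes);
    }
}
```

A caller that tries to add to or remove from the returned list gets an UnsupportedOperationException, while entries added through the owning object afterwards still show up in a previously obtained view.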
-
clearSubFeatures
public void clearSubFeatures()
Clears all sub-features associated with this genomic feature.
This method removes all sub-features from the list of sub-features associated with this genomic feature. After calling this method, the list of sub-features will be empty.
-
clearSubFeatures
public void clearSubFeatures(int level)
Clears all sub-features of a specific Sequence Ontology (SO) hierarchy level associated with this genomic feature.
This method removes all sub-features from the list of sub-features that match the specified SO hierarchy level. The hierarchy level is determined using the Storage.SEQUENCE_ONTOLOGY_HIERARCHY map.
After calling this method, only sub-features that do not match the specified level will remain in the list.
- Parameters:
level - The SO hierarchy level of the sub-features to remove (e.g., 0 for "region", 1 for "gene").
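Level-based clearing can be sketched as a filter over the sub-feature list. This is an illustration only: ClearByLevelSketch and Sub are hypothetical, and HIERARCHY is an assumed substitute for Storage.SEQUENCE_ONTOLOGY_HIERARCHY in which only the levels for "region" and "gene" are taken from this page.

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; not the library's Feature class.
final class ClearByLevelSketch {
    // Assumed substitute for Storage.SEQUENCE_ONTOLOGY_HIERARCHY.
    // The level for "exon" is invented for this sketch.
    static final Map<String, Integer> HIERARCHY =
        Map.of("region", 0, "gene", 1, "exon", 2);

    record Sub(String type, int start, int end) {}

    // Removes every sub-feature whose type maps to the given level.
    // Types missing from the map never match and are therefore kept.
    static void clearLevel(List<Sub> subFeatures, int level) {
        subFeatures.removeIf(s -> Integer.valueOf(level).equals(HIERARCHY.get(s.type())));
    }
}
```

Given a mutable list holding one "gene" and two "exon" entries, calling clearLevel with level 2 leaves only the "gene" entry.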
-
hasAllele
Checks if an allele with the specified identifier exists in this feature.
- Parameters:
alleleIdentifier - The identifier of the allele to check for.
- Returns:
true if an allele with the given identifier exists, false otherwise.
-
addAllele
Adds an allele to this feature.
This method adds the specified Allele object to the internal map of alleles associated with this feature. The allele is stored using its unique identifier as the key.
Note: No internal validation is performed based on the coordinates of the feature.
- Parameters:
allele - The Allele object to be added to this feature.
-
getAllele
Retrieves an allele associated with this feature by its unique identifier (_id), or null if no such allele exists.
- Parameters:
alleleIdentifier - The identifier of the allele to retrieve.
- Returns:
The Allele object associated with the given identifier, or null if not found.
-
getAlleles
Retrieves all alleles associated with this feature.
This method provides an unmodifiable view of the collection of alleles associated with this feature. The alleles are stored as values in the internal map, ensuring that the original collection cannot be modified externally.
- Returns:
An unmodifiable Collection of Allele objects associated with this feature.
-
getAlleleCount
public int getAlleleCount()
Retrieves the number of alleles associated with this feature.
This method returns the size of the internal map of alleles, which represents the total number of unique alleles associated with this feature.
- Returns:
The number of alleles associated with this feature.
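The allele accessors described above all operate on one map keyed by the allele identifier. The following sketch shows how such a map supports them. AlleleRegistrySketch and its one-field Allele record are hypothetical stand-ins, and the use of LinkedHashMap and String keys is an assumption.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for illustration; not the library's Feature class.
final class AlleleRegistrySketch {
    // Hypothetical minimal allele that carries only its identifier.
    record Allele(String id) {}

    private final Map<String, Allele> alleles = new LinkedHashMap<>();

    // Stores the allele under its identifier. In this sketch, a second
    // allele with the same identifier replaces the first (Map.put semantics).
    void addAllele(Allele allele) {
        alleles.put(allele.id(), allele);
    }

    boolean hasAllele(String alleleIdentifier) {
        return alleles.containsKey(alleleIdentifier);
    }

    // Returns null when no allele is stored under the identifier.
    Allele getAllele(String alleleIdentifier) {
        return alleles.get(alleleIdentifier);
    }

    // Read-only view of the map's values.
    Collection<Allele> getAlleles() {
        return Collections.unmodifiableCollection(alleles.values());
    }

    int getAlleleCount() {
        return alleles.size();
    }
}
```

Because getAllele may return null, callers would typically guard a lookup with hasAllele or a null check before using the result.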
-
hasProteoform
Checks if a proteoform with the specified identifier exists in this feature.
- Parameters:
proteoformIdentifier - The identifier of the proteoform to check for.
- Returns:
true if a proteoform with the given identifier exists, false otherwise.
-
addProteoform
Adds a proteoform to this feature.
This method adds the specified Proteoform object to the internal map of proteoforms associated with this feature. The proteoform is stored using its unique identifier as the key.
Note: No internal validation is performed based on the coordinates of the feature.
- Parameters:
proteoform - The Proteoform object to be added to this feature.
-
getProteoform
Retrieves a proteoform associated with this feature by its unique identifier, or null if no such proteoform exists.
- Parameters:
proteoformIdentifier - The unique identifier of the proteoform to retrieve.
- Returns:
The Proteoform object associated with the given identifier, or null if not found.
-
getProteoforms
Retrieves all proteoforms associated with this feature.
This method provides an unmodifiable view of the collection of proteoforms associated with this feature. Proteoforms represent specific sequence variants of proteins derived from the feature. The unmodifiable collection ensures that the original data cannot be modified externally.
- Returns:
An unmodifiable Collection of Proteoform objects associated with this feature.
-
getProteoformCount
public int getProteoformCount()
Retrieves the number of proteoforms associated with this feature.
This method returns the size of the internal map of proteoforms, which represents the total number of unique proteoforms associated with this feature.
- Returns:
The number of proteoforms associated with this feature.
-
toString
Generates a string representation of this feature.
This method constructs a string representation of the feature using its contig, genomic coordinates, and name. The format includes the contig identifier, start and end positions, and the feature name, separated by specific constants.
-
hashCode
public int hashCode()
Computes the hash code for this feature.
This method calculates the hash code of the feature based on its string representation.
-
equals
Compares this feature to another object for equality.
This method checks if the provided object is the same instance as this feature. If not, it verifies that the object is of the same class and compares their string representations for equality.
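The relationship between toString, hashCode, and equals described here can be sketched as follows. StringIdentitySketch is a hypothetical stand-in, and the ":" and "-" separators are placeholders, since this page does not name the constants the library uses.

```java
// Hypothetical stand-in for illustration; not the library's Feature class.
final class StringIdentitySketch {
    final String contig;
    final int start;
    final int end;
    final String name;

    StringIdentitySketch(String contig, int start, int end, String name) {
        this.contig = contig;
        this.start = start;
        this.end = end;
        this.name = name;
    }

    // Placeholder format: the real separators are not specified on this page.
    @Override
    public String toString() {
        return contig + ":" + start + "-" + end + ":" + name;
    }

    // The hash code is derived from the string representation.
    @Override
    public int hashCode() {
        return toString().hashCode();
    }

    // Same instance, or same class with an equal string representation.
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null || getClass() != other.getClass()) return false;
        return toString().equals(other.toString());
    }
}
```

Deriving both methods from the same string keeps equals and hashCode consistent with each other. In a scheme like this, two objects that agree on contig, coordinates, and name compare equal regardless of any other state they carry.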
-