Class Feature
This class models a genomic feature, such as a gene, exon, or coding sequence (CDS),
that is analyzed in the context of genomic data processing. It extends the Attributable
class to inherit functionality for managing attributes associated with the feature.
Each instance of this class is uniquely identified by its name and contains information about its type, location on the reference genome, and other relevant properties.
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
class | Feature.Allele | Represents an allele associated with a genomic feature.
class | Feature.Proteoform | Represents a proteoform associated with a genomic feature.
-
Field Summary
Fields
Modifier and Type | Field | Description
final String | _uid | Unique identifier of this feature.
protected final HashMap<String, Feature.Allele> | alleles | Alleles (SequenceType instances) associated with this feature.
final String | contig | The location of the feature on the reference, i.e., the contig/chromosome/plasmid.
final int | end | 1-based indexed end position of the feature.
final String | name | The name or internal identifier of this genomic feature.
protected final HashMap<String, Feature.Proteoform> | proteoforms | Proteoforms (SequenceType instances) associated with this feature.
final int | start | 1-based indexed starting position of the feature.
final char | strand | Strand of the feature.
final String | type | The type of this genomic feature.
-
Constructor Summary
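Constructors
Modifier | Constructor | Description
protected | Feature(String name, String contig, Number start, Number end, char strand, String type, String uid) | Constructs a new Feature instance with the specified properties.
-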
Method Summary
Modifier and Type | Method | Description
Feature.Allele | getAllele(String uid) | Retrieves an allele associated with this feature by its unique identifier.
Collection<Feature.Allele> | getAlleles() | Retrieves all alleles associated with this feature.
SortedMap<String, List<htsjdk.samtools.util.Tuple<Integer, Integer>>> | getChildren() | Retrieves a sorted map of child features associated with this feature.
Feature.Proteoform | getProteoform(String uid) | Retrieves a proteoform associated with this feature by its unique identifier.
Collection<Feature.Proteoform> | getProteoforms() | Retrieves all proteoforms associated with this feature.
boolean | isCoding() | Determines if this feature is a coding feature.
boolean | isReverse() | Checks if this feature is on the reverse strand.
void | setChildren(SortedMap<String, List<htsjdk.samtools.util.Tuple<Integer, Integer>>> children) | Stores the child features of this feature as a serialized string in the "children" attribute.
String | toGffString() | Converts this feature and its child features into a GFF3 format string.
String | toString() | Converts this feature into a tab-delimited string representation.
protected String | updateAllele(Contig contig, ArrayList<htsjdk.samtools.util.Tuple<Integer, String>> variants, Sample sample) | Updates or creates an allele associated with a given contig and sample.
protected void | updateProteoform(Contig contig, String alleleUid) | Updates or creates a proteoform associated with a given allele and contig.
Methods inherited from class datastructure.Attributable
addAttributeIfAbsent, addAttributesIfAbsent, attributesAsString, attributesAsString, clearAttributes, extendAttribute, extendAttributes, getAttribute, getAttributeAsCollection, getAttributes, hasAttribute, hasAttributes, removeAttribute, setAttribute, setAttributes
-
Field Details
-
type
The type of this genomic feature.
This field specifies the type of the feature, such as "gene", "exon", or "CDS". The type is defined according to the GFF3 specification and provides information about the biological or functional classification of the feature.
For more details, refer to the GFF3 specification: https://gmod.org/wiki/GFF3.
-
name
The name or internal identifier of this genomic feature.
This field uniquely identifies the feature within the context of the analysis. It is a final field, so its value cannot change once it has been assigned during construction of the Feature instance.
-
contig
The location of the feature on the reference, i.e., the contig/chromosome/plasmid.
-
start
public final int start
1-based indexed starting position of the feature.
-
end
public final int end
1-based indexed end position of the feature.
-
strand
public final char strand
Strand of the feature.
-
_uid
Unique identifier of this feature.
-
alleles
Alleles (SequenceType instances) associated with this feature.
This map stores alleles that are associated with the feature. Alleles represent specific sequence variations of the feature. The keys in the map are unique identifiers for the alleles, and the values are the corresponding Feature.Allele instances.
Alleles are used to track and manage sequence variations resulting from genomic changes. Each allele is linked to its unique identifier and contains information about its sequence and attributes.
-
proteoforms
Proteoforms (SequenceType instances) associated with this feature.
This map stores proteoforms that are associated with the feature. Proteoforms represent specific sequence variants of proteins derived from the feature. The keys in the map are unique identifiers for the proteoforms, and the values are the corresponding Feature.Proteoform instances.
Proteoforms are only relevant for coding features and are used to track and manage protein sequence variations resulting from genomic changes.
-
-
Constructor Details
-
Feature
protected Feature(String name, String contig, Number start, Number end, char strand, String type, String uid)
Constructs a new Feature instance with the specified properties.
This constructor initializes a genomic feature with its name, location, strand orientation, type, and unique identifier. The feature's start and end positions are converted to integers to ensure proper indexing. The Attributable superclass is also initialized.
- Parameters:
name - The name of the feature, used as its internal identifier.
contig - The name of the reference location (e.g., contig, chromosome, plasmid) where the feature is located.
start - The 1-based indexed starting position of the feature on the reference.
end - The 1-based indexed end position of the feature on the reference.
strand - The strand orientation of the feature ('+' for forward strand, '-' for reverse strand).
type - The type of the feature (e.g., "gene", "exon", or "CDS").
uid - The unique identifier of the feature.
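The conversion of Number positions to int described above can be illustrated with a minimal sketch. FeatureSketch is a hypothetical stand-in written for this example, not the library's Feature class; the only behavior it assumes is that the conversion uses Number.intValue(), which the source does not confirm.

```java
// Hypothetical stand-in for illustration; not the library's Feature class.
class FeatureSketch {
    final String name;
    final String contig;
    final int start;   // 1-based start position
    final int end;     // 1-based end position
    final char strand; // '+' or '-'
    final String type;
    final String uid;

    FeatureSketch(String name, String contig, Number start, Number end,
                  char strand, String type, String uid) {
        this.name = name;
        this.contig = contig;
        // Assumption: positions are narrowed with intValue(), which drops
        // any fractional part (100.0 becomes 100).
        this.start = start.intValue();
        this.end = end.intValue();
        this.strand = strand;
        this.type = type;
        this.uid = uid;
    }

    public static void main(String[] args) {
        // A Double and a Long are both accepted because the parameters are Numbers.
        FeatureSketch f = new FeatureSketch("geneA", "chr1", 100.0, 250L, '-', "gene", "F1");
        System.out.println(f.contig + ":" + f.start + "-" + f.end + " (" + f.strand + ")");
    }
}
```

Accepting Number lets callers pass positions parsed as Double or Long (for example from JSON) without converting them first.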
-
-
Method Details
-
updateAllele
protected String updateAllele(Contig contig, ArrayList<htsjdk.samtools.util.Tuple<Integer, String>> variants, Sample sample)
Updates or creates an allele associated with a given contig and sample.
This method checks if the specified allele is already associated with the feature. If it is, the existing allele is retrieved. Otherwise, a new allele is created based on the provided variants and contig. The method validates the input parameters, adds the sample occurrence to the allele, and updates the contig with the sequence type occurrences for each variant.
- Parameters:
contig - The Contig object containing the reference sequence.
variants - A list of Tuple objects representing the variants associated with the allele. Each tuple contains:
- The position of the variant.
- The alternate allele sequence.
sample - The Sample object representing the sample associated with this feature.
- Returns:
The unique identifier (UID) of the updated or created allele.
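The retrieve-or-create behavior described above can be sketched with Map.computeIfAbsent. Everything in this block is hypothetical: the Allele holder, the uidOf scheme (joining "position:alt" pairs), and the use of plain strings for samples are assumptions made for illustration. The sketch also omits the contig update, and it does not claim to reproduce how the library derives allele UIDs.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringJoiner;
import java.util.TreeSet;

// Hypothetical sketch of a get-or-create allele registry.
class AlleleRegistrySketch {
    // Stand-in for Feature.Allele: a UID plus the samples it occurs in.
    static final class Allele {
        final String uid;
        final Set<String> samples = new TreeSet<>();
        Allele(String uid) { this.uid = uid; }
    }

    final Map<String, Allele> alleles = new HashMap<>();

    // Assumed UID scheme, for illustration only: "pos:alt" pairs joined by ';'.
    static String uidOf(List<Map.Entry<Integer, String>> variants) {
        StringJoiner joined = new StringJoiner(";");
        for (Map.Entry<Integer, String> v : variants) {
            joined.add(v.getKey() + ":" + v.getValue());
        }
        return joined.toString();
    }

    String updateAllele(List<Map.Entry<Integer, String>> variants, String sample) {
        if (variants == null || sample == null) {
            throw new IllegalArgumentException("variants and sample must not be null");
        }
        String uid = uidOf(variants);
        // Reuse the allele if it is already registered, otherwise create it.
        Allele allele = alleles.computeIfAbsent(uid, Allele::new);
        allele.samples.add(sample);
        return uid;
    }

    public static void main(String[] args) {
        AlleleRegistrySketch feature = new AlleleRegistrySketch();
        List<Map.Entry<Integer, String>> variants = List.of(Map.entry(12, "A"), Map.entry(40, "T"));
        String first = feature.updateAllele(variants, "sample1");
        String second = feature.updateAllele(variants, "sample2");
        System.out.println(first.equals(second) + " " + feature.alleles.get(first).samples);
    }
}
```

Two samples carrying the same variants resolve to one allele entry, so the map grows with distinct variant sets and not with the number of samples.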
-
getAllele
Retrieves an allele associated with this feature by its unique identifier.
This method searches for a Feature.Allele in the internal map of alleles using the provided unique identifier (UID). If the UID is not found, the method returns null.
- Parameters:
uid - The unique identifier of the allele to retrieve.
- Returns:
The Feature.Allele object associated with the given UID, or null if not found.
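Because the lookup returns null for an unknown UID, callers need a null check before using the result. The sketch below shows one way to handle this on the caller side; AlleleLookupSketch and findAllele are hypothetical names, and a String stands in for Feature.Allele.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the null-returning lookup and a caller-side wrapper.
class AlleleLookupSketch {
    // uid -> allele; a String stands in for Feature.Allele here.
    private final Map<String, String> alleles = new HashMap<>();

    void put(String uid, String allele) {
        alleles.put(uid, allele);
    }

    // Mirrors the documented contract: null when the UID is unknown.
    String getAllele(String uid) {
        return alleles.get(uid);
    }

    // Assumed convenience wrapper, not part of the documented API.
    Optional<String> findAllele(String uid) {
        return Optional.ofNullable(getAllele(uid));
    }

    public static void main(String[] args) {
        AlleleLookupSketch feature = new AlleleLookupSketch();
        feature.put("A1", "reference-like allele");
        System.out.println(feature.findAllele("A1").orElse("unknown"));
        System.out.println(feature.findAllele("missing").orElse("unknown"));
    }
}
```

The same pattern applies to proteoform lookups, which are documented with the same null-on-miss contract.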
-
getAlleles
Retrieves all alleles associated with this feature.
This method returns a collection of Feature.Allele objects that are associated with this feature. The alleles are stored as values in the internal map of alleles.
- Returns:
A Collection of Feature.Allele objects associated with this feature.
-
updateProteoform
protected void updateProteoform(Contig contig, String alleleUid) throws IOException, MusialException
Updates or creates a proteoform associated with a given allele and contig.
This method checks if the specified allele is already associated with a proteoform. If it is, the existing proteoform is retrieved. Otherwise, a new proteoform is created based on the allele's variants and the contig's sequence. The method validates the input parameters, computes the proteoform sequence, and associates the allele with the proteoform.
- Parameters:
contig - The Contig object containing the reference sequence.
alleleUid - The unique identifier of the allele to update or associate with a proteoform.
- Throws:
IOException - If an I/O error occurs during sequence operations.
MusialException - If an error occurs during sequence alignment or translation.
IllegalArgumentException - If the allele does not exist, the contig does not match the feature's contig, the contig lacks a sequence, the feature is not coding, or the variants map is empty.
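The five conditions listed for IllegalArgumentException can be read as a chain of precondition checks. The sketch below spells them out with plain values in place of the Contig and Feature.Allele types; the class name, parameter names, messages, and the order of the checks are assumptions for illustration.

```java
// Hypothetical sketch of the documented precondition checks.
class ProteoformPreconditionsSketch {
    static void check(boolean alleleExists, String featureContig, String contigName,
                      String contigSequence, boolean coding, int variantCount) {
        if (!alleleExists) {
            throw new IllegalArgumentException("Allele does not exist.");
        }
        if (!featureContig.equals(contigName)) {
            throw new IllegalArgumentException("Contig does not match the feature's contig.");
        }
        if (contigSequence == null || contigSequence.isEmpty()) {
            throw new IllegalArgumentException("Contig lacks a sequence.");
        }
        if (!coding) {
            throw new IllegalArgumentException("Feature is not coding.");
        }
        if (variantCount == 0) {
            throw new IllegalArgumentException("Variants map is empty.");
        }
    }

    public static void main(String[] args) {
        // Passes all checks.
        check(true, "chr1", "chr1", "ATGAAATAA", true, 2);
        try {
            // Fails: the contig name differs from the feature's contig.
            check(true, "chr1", "chr2", "ATGAAATAA", true, 2);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```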
-
getProteoform
Retrieves a proteoform associated with this feature by its unique identifier.
This method searches for a Feature.Proteoform in the internal map of proteoforms using the provided unique identifier (UID). If the UID is not found, the method returns null.
- Parameters:
uid - The unique identifier of the proteoform to retrieve.
- Returns:
The Feature.Proteoform object associated with the given UID, or null if not found.
-
getProteoforms
Retrieves all proteoforms associated with this feature.
This method returns a collection of Feature.Proteoform objects that are associated with this feature. The proteoforms are stored as values in the internal map of proteoforms.
- Returns:
A Collection of Feature.Proteoform objects associated with this feature.
-
getChildren
Retrieves a sorted map of child features associated with this feature.
This method parses the "children" attribute of the feature, if present, and constructs a sorted map of child features. Each child feature is represented by a key (child type) and a list of tuples, where each tuple contains the start and end positions of the child feature. The map is sorted using a custom comparator based on the order defined in Storage.SO. If a child type is not found in Storage.SO, it is assigned the maximum possible value.
- Returns:
A SortedMap where the keys are child feature types (e.g., "CDS"), and the values are lists of Tuple objects representing the start and end positions of the child features.
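The parsing and ordering described above can be sketched as follows. The ORDER list is a hypothetical stand-in for Storage.SO, whose contents are not shown in this documentation, and int[] pairs replace htsjdk's Tuple so the example has no external dependencies. The input format "type:start:end" with comma separators is the one documented for setChildren. The tie-break by name is an addition of this sketch: without it, two unknown types would both rank as Integer.MAX_VALUE, compare as equal, and collapse into a single TreeMap key.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of parsing the "children" attribute into a sorted map.
class ChildrenParseSketch {
    // Assumed stand-in for Storage.SO; the real order is not documented here.
    static final List<String> ORDER = List.of("mRNA", "exon", "CDS");

    // Types missing from ORDER get the maximum possible rank.
    static int rank(String type) {
        int index = ORDER.indexOf(type);
        return index < 0 ? Integer.MAX_VALUE : index;
    }

    static SortedMap<String, List<int[]>> parse(String serialized) {
        Comparator<String> byRank = (a, b) -> {
            int c = Integer.compare(rank(a), rank(b));
            // Tie-break by name so distinct unknown types stay distinct keys.
            return c != 0 ? c : a.compareTo(b);
        };
        SortedMap<String, List<int[]>> children = new TreeMap<>(byRank);
        if (serialized == null || serialized.isEmpty()) {
            return children;
        }
        for (String entry : serialized.split(",")) {
            String[] parts = entry.split(":"); // type, start, end
            children.computeIfAbsent(parts[0], k -> new ArrayList<>())
                    .add(new int[] {Integer.parseInt(parts[1]), Integer.parseInt(parts[2])});
        }
        return children;
    }

    public static void main(String[] args) {
        SortedMap<String, List<int[]>> children = parse("CDS:10:50,exon:1:60,foo:5:6,CDS:70:90");
        System.out.println(children.keySet());
    }
}
```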
-
setChildren
public void setChildren(SortedMap<String, List<htsjdk.samtools.util.Tuple<Integer, Integer>>> children)
Stores the child features of this feature as a serialized string in the "children" attribute.
This method serializes the child features into a string format where each child is represented as "type:start:end". Multiple child features are separated by commas. The serialized string is then stored in the "children" attribute of this feature.
- Parameters:
children - A SortedMap where the keys are child feature types (e.g., "CDS"), and the values are lists of Tuple objects representing the start and end positions of the child features.
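The "type:start:end" serialization can be sketched in a few lines. As an assumption for this example, int[] pairs stand in for htsjdk's Tuple, and the class and method names are hypothetical; the sketch returns the string and leaves out storing it in the "children" attribute.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.StringJoiner;
import java.util.TreeMap;

// Hypothetical sketch of the "type:start:end" serialization.
class ChildrenSerializeSketch {
    static String serialize(SortedMap<String, List<int[]>> children) {
        StringJoiner joined = new StringJoiner(",");
        for (Map.Entry<String, List<int[]>> child : children.entrySet()) {
            for (int[] span : child.getValue()) {
                // One "type:start:end" token per child interval.
                joined.add(child.getKey() + ":" + span[0] + ":" + span[1]);
            }
        }
        return joined.toString();
    }

    public static void main(String[] args) {
        SortedMap<String, List<int[]>> children = new TreeMap<>();
        children.put("exon", List.of(new int[] {1, 100}));
        children.put("CDS", List.of(new int[] {10, 50}, new int[] {70, 90}));
        System.out.println(serialize(children));
    }
}
```

With the natural String ordering of a default TreeMap, "CDS" sorts before "exon", so this usage prints the CDS intervals first.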
-
isCoding
public boolean isCoding()
Determines if this feature is a coding feature.
This method checks whether the feature has child elements of type "CDS" (coding sequence) and ensures that the list of such child elements is not empty. A feature is considered coding if it contains at least one "CDS" child.
- Returns:
true if the feature is a coding feature, false otherwise.
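The coding check reduces to a lookup on the children map. In this minimal sketch the class name is hypothetical and int[] pairs again stand in for Tuple objects.

```java
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the "at least one CDS child" test.
class CodingCheckSketch {
    static boolean isCoding(Map<String, List<int[]>> children) {
        List<int[]> cds = children.get("CDS");
        // Coding only if a "CDS" entry exists and holds at least one interval.
        return cds != null && !cds.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(isCoding(Map.of("CDS", List.of(new int[] {10, 50}))));
        System.out.println(isCoding(Map.of("exon", List.of(new int[] {1, 100}))));
    }
}
```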
-
isReverse
public boolean isReverse()
Checks if this feature is on the reverse strand.
This method determines whether the strand orientation of the feature is reverse by checking if the strand character is '-'.
- Returns:
true if this feature is on the reverse strand, false otherwise.
-
toString
Converts this feature into a tab-delimited string representation.
This method generates a string containing the contig, type, name, start position, end position, strand, and attributes of the feature. Certain attributes, such as "children", are excluded from the string representation.
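A tab-delimited row in the documented field order might be assembled as in the sketch below. The attribute encoding (key=value pairs joined by ';') is an assumption, since the documentation does not specify it, and only "children" is filtered because it is the only excluded attribute named above.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of the tab-delimited representation.
class FeatureTsvSketch {
    static String toTsv(String contig, String type, String name, int start, int end,
                        char strand, Map<String, String> attributes) {
        StringJoiner attrs = new StringJoiner(";");
        for (Map.Entry<String, String> a : attributes.entrySet()) {
            if (a.getKey().equals("children")) {
                continue; // excluded from the string representation
            }
            // Assumed encoding: key=value.
            attrs.add(a.getKey() + "=" + a.getValue());
        }
        return String.join("\t", contig, type, name, String.valueOf(start),
                String.valueOf(end), String.valueOf(strand), attrs.toString());
    }

    public static void main(String[] args) {
        Map<String, String> attributes = new LinkedHashMap<>();
        attributes.put("product", "kinase");
        attributes.put("children", "CDS:100:250");
        System.out.println(toTsv("chr1", "gene", "geneA", 100, 250, '-', attributes));
    }
}
```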
-
toGffString
Converts this feature and its child features into a GFF3 format string.
This method generates a GFF3 representation of the feature, including its attributes and child features. The main feature is represented with its type, location, strand, and attributes. Child features are appended with their respective types, locations, and parent-child relationships.
- Returns:
A String containing the GFF3 representation of this feature and its children.
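For orientation, a GFF3 row has nine tab-separated columns: seqid, source, type, start, end, score, strand, phase, and attributes. The sketch below builds a parent row and child rows linked through the ID and Parent attributes. It is a simplified illustration under stated assumptions: source, score, and phase are written as "." placeholders (the GFF3 specification requires a real phase for CDS rows), the feature name is used as the ID, and none of this is confirmed to match the output of toGffString.

```java
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch of GFF3 rows for a feature and its children.
class Gff3Sketch {
    // Builds one nine-column row; source, score, and phase are "." placeholders.
    static String row(String seqid, String type, int start, int end, char strand, String attributes) {
        return String.join("\t", seqid, ".", type, String.valueOf(start), String.valueOf(end),
                ".", String.valueOf(strand), ".", attributes);
    }

    static String toGff(String contig, String type, String name, int start, int end,
                        char strand, SortedMap<String, List<int[]>> children) {
        StringBuilder gff = new StringBuilder(row(contig, type, start, end, strand, "ID=" + name));
        for (Map.Entry<String, List<int[]>> child : children.entrySet()) {
            for (int[] span : child.getValue()) {
                // Each child row points back to the main feature via Parent.
                gff.append('\n').append(
                        row(contig, child.getKey(), span[0], span[1], strand, "Parent=" + name));
            }
        }
        return gff.toString();
    }

    public static void main(String[] args) {
        SortedMap<String, List<int[]>> children = new TreeMap<>();
        children.put("CDS", List.of(new int[] {100, 150}));
        System.out.println(toGff("chr1", "gene", "geneA", 100, 250, '+', children));
    }
}
```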
-