Package model

Class Feature


public class Feature extends Attributes
Representation of a genomic feature.

This class models a genomic feature, such as a gene, exon, or coding sequence (CDS), that is analyzed during genomic data processing. It extends the Attributes class and so inherits the functionality for managing attributes associated with the feature. Each instance records its type and its location on the reference genome, i.e., on a Contig. In addition, an inner map stores the SequenceTypes associated with the feature.

Features are stored in the Storage.features property of the model.

  • Field Details

    • _id

      public final String _id
      Unique identifier of this feature.

      This field serves as the unique identifier for the feature and is used to reference it in the model.

    • type

      public final String type
      The type of this genomic feature.

      This field specifies the type of the feature, such as "gene", "exon", or "CDS". The type is defined according to the GFF3 specification and provides information about the biological or functional classification of the feature.

      For more details, refer to the GFF3 specification: https://gmod.org/wiki/GFF3.

    • name

      public final String name
      An additional (human-readable) name for this feature.

      Ideally, this is a database identifier or a common gene name.

    • contig

      public final String contig
      The parent genomic location of this feature.
    • start

      public final int start
      1-based starting position of the feature.
    • end

      public final int end
      1-based end position of the feature.
    • strand

      public final char strand
      Strand of the feature.
  • Constructor Details

    • Feature

      public Feature(String name, String contig, Number start, Number end, char strand, String type, String identifier)
      Constructs a new Feature instance with the specified properties.

      This constructor initializes a genomic feature with its name, location, strand orientation, type, and unique identifier. The start and end positions are passed as Number values and converted to integers to ensure proper indexing. The Attributes superclass is also initialized.

      Parameters:
      name - The (human-readable) name of the feature.
      contig - The id of the reference location (e.g., contig, chromosome, plasmid) where the feature is located.
      start - The 1-based indexed starting position of the feature on the reference.
      end - The 1-based indexed end position of the feature on the reference.
      strand - The strand orientation of the feature ('+' for forward strand, '-' for reverse strand).
      type - The type of the feature (e.g., "gene", "exon", or "CDS").
      identifier - The unique identifier of the feature.
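
      The conversion of the Number arguments to int can be sketched as follows. This is a minimal illustration and not the library's source: FeatureCtorSketch is a hypothetical stand-in class, and the use of Number.intValue() is an assumption about how the conversion is done.

```java
// Hypothetical stand-in for the documented class; field names follow the
// Field Details section, the constructor body is an assumption.
final class FeatureCtorSketch {
    final String _id;
    final String type;
    final String name;
    final String contig;
    final int start;
    final int end;
    final char strand;

    FeatureCtorSketch(String name, String contig, Number start, Number end,
                      char strand, String type, String identifier) {
        this.name = name;
        this.contig = contig;
        // Number.intValue() narrows any boxed numeric type (Double, Long, ...)
        // to int, truncating a fractional part if there is one.
        this.start = start.intValue();
        this.end = end.intValue();
        this.strand = strand;
        this.type = type;
        this._id = identifier;
    }
}
```

      Accepting Number lets callers pass positions parsed as Double or Long (for example from a JSON or GFF3 reader) without converting them first.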
  • Method Details

    • isCoding

      public boolean isCoding()
      Determines if this feature is a coding feature.

      This method checks whether the feature is of type "CDS" (coding sequence) or if any of its sub-features are of type "CDS". A feature is considered coding if it directly represents a coding sequence or contains sub-features that do.

      Returns:
      true if the feature is of type "CDS" or has sub-features of type "CDS"; false otherwise.
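
      The rule described above can be sketched as a check on the feature's own type followed by a check of its sub-feature types. CodingCheckSketch and its list of type strings are hypothetical simplifications; the documented class stores Feature.SubFeature objects instead.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in: sub-features are reduced to their type strings.
final class CodingCheckSketch {
    final String type;
    final List<String> subFeatureTypes = new ArrayList<>();

    CodingCheckSketch(String type) {
        this.type = type;
    }

    // Coding if the feature itself is a CDS or any sub-feature is a CDS.
    boolean isCoding() {
        return "CDS".equals(type) || subFeatureTypes.contains("CDS");
    }
}
```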
    • isReverse

      public boolean isReverse()
      Returns whether this feature is on the reverse strand.

      This method determines whether the strand orientation of the feature is reverse by checking if the strand character is '-'.

      Returns:
      true if this feature is on the reverse strand, false otherwise.
    • addSubFeature

      public void addSubFeature(String type, int start, int end) throws MusialException
      Adds a sub-feature to this genomic feature.

      This method validates and adds a sub-feature to the list of sub-features associated with this genomic feature. The sub-feature is defined by its type, start position, and end position. Validation ensures that the sub-feature type is recognized and that its positions are within bounds.

      If validation fails, a MusialException is thrown.
      Parameters:
      type - The type of the sub-feature (e.g., "exon", "CDS").
      start - The 1-based start position of the sub-feature.
      end - The 1-based end position of the sub-feature.
      Throws:
      MusialException - if the sub-feature type is unrecognized or if its positions are out of bounds.
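
      A validation of this kind might look like the sketch below. Several details are assumptions made for illustration only: the set of recognized types, the interpretation of "out of bounds" as lying outside the parent feature's start and end, and the class SketchException, which stands in for MusialException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in; not the library's implementation.
final class SubFeatureValidationSketch {
    // Stand-in for MusialException (assumed to be a checked exception).
    static final class SketchException extends Exception {
        SketchException(String message) {
            super(message);
        }
    }

    // Assumed set of recognized types; the real set is defined by the library.
    static final Set<String> KNOWN_TYPES = Set.of("gene", "mRNA", "exon", "CDS");

    final int start;
    final int end;
    final List<String> subFeatures = new ArrayList<>();

    SubFeatureValidationSketch(int start, int end) {
        this.start = start;
        this.end = end;
    }

    void addSubFeature(String type, int subStart, int subEnd) throws SketchException {
        if (!KNOWN_TYPES.contains(type)) {
            throw new SketchException("Unrecognized sub-feature type: " + type);
        }
        // Assumption: a sub-feature must lie within the parent's coordinates.
        if (subStart > subEnd || subStart < start || subEnd > end) {
            throw new SketchException("Sub-feature positions out of bounds: "
                    + subStart + "-" + subEnd);
        }
        subFeatures.add(type + ":" + subStart + "-" + subEnd);
    }
}
```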
    • getSubFeatures

      public List<Feature.SubFeature> getSubFeatures()
      Retrieves all sub-features associated with this genomic feature.

      This method provides an unmodifiable view of the list of sub-features associated with this genomic feature. Sub-features represent smaller components of the feature, such as exons or coding sequences (CDS), and include their type and genomic location (start and end positions).

      The unmodifiable list ensures that the original list cannot be modified externally, preserving data integrity.

      Returns:
      An unmodifiable List of Feature.SubFeature objects representing the sub-features of this genomic feature.
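
      An unmodifiable view is commonly produced with Collections.unmodifiableList. The sketch below assumes that approach; SubFeatureViewSketch is hypothetical and stores plain strings rather than Feature.SubFeature objects.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in showing a read-only view over an internal list.
final class SubFeatureViewSketch {
    private final List<String> subFeatures = new ArrayList<>();

    void add(String subFeature) {
        subFeatures.add(subFeature);
    }

    // The returned list rejects mutation but reflects later internal changes.
    List<String> getSubFeatures() {
        return Collections.unmodifiableList(subFeatures);
    }
}
```

      Because the result is a view rather than a copy, sub-features added later through the owning object become visible in a previously returned list, while calls such as add on the view throw UnsupportedOperationException.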
    • clearSubFeatures

      public void clearSubFeatures()
      Clears all sub-features associated with this genomic feature.

      This method removes all sub-features from the list of sub-features associated with this genomic feature. After calling this method, the list of sub-features will be empty.

    • clearSubFeatures

      public void clearSubFeatures(int level)
      Clears all sub-features of a specific Sequence Ontology (SO) hierarchy level associated with this genomic feature.

      This method removes all sub-features from the list of sub-features that match the specified SO hierarchy level. The hierarchy level is determined using the Storage.SEQUENCE_ONTOLOGY_HIERARCHY map.

      After calling this method, only sub-features that do not match the specified level will remain in the list.

      Parameters:
      level - The SO hierarchy level of the sub-features to remove (e.g., 0 for "region", 1 for "gene").
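
      Level-based removal can be sketched with List.removeIf and a lookup in a type-to-level map. The map below is an assumed shape for Storage.SEQUENCE_ONTOLOGY_HIERARCHY and contains only the two example levels named above; ClearByLevelSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in; sub-features are reduced to their type strings.
final class ClearByLevelSketch {
    // Assumed shape (type -> level); entries taken from the examples above.
    static final Map<String, Integer> HIERARCHY = Map.of("region", 0, "gene", 1);

    final List<String> subFeatureTypes = new ArrayList<>();

    void clearSubFeatures(int level) {
        // Types missing from the map get level -1 and are therefore kept.
        subFeatureTypes.removeIf(t -> HIERARCHY.getOrDefault(t, -1) == level);
    }
}
```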
    • hasAllele

      public boolean hasAllele(String alleleIdentifier)
      Checks if an allele with the specified identifier exists in this feature.
      Parameters:
      alleleIdentifier - The identifier of the allele to check for.
      Returns:
      true if an allele with the given identifier exists, false otherwise.
    • addAllele

      public void addAllele(Allele allele)
      Adds an allele to this feature.

      This method adds the specified Allele object to the internal map of alleles associated with this feature. The allele is stored using its unique identifier as the key.

      Note: No internal validation is performed based on the coordinates of the feature.

      Parameters:
      allele - The Allele object to be added to this feature.
    • getAllele

      public Allele getAllele(String alleleIdentifier)
      Retrieves the allele associated with this feature that has the given unique identifier (_id), or null if no such allele exists.
      Parameters:
      alleleIdentifier - The identifier of the allele to retrieve.
      Returns:
      The Allele object associated with the given identifier, or null if not found.
    • getAlleles

      public Collection<Allele> getAlleles()
      Retrieves all alleles associated with this feature.

      This method provides an unmodifiable view of the collection of alleles associated with this feature. The alleles are stored as values in the internal map; returning an unmodifiable view ensures that this map cannot be modified externally.

      Returns:
      An unmodifiable Collection of Allele objects associated with this feature.
    • getAlleleCount

      public int getAlleleCount()
      Retrieves the number of alleles associated with this feature.

      This method returns the size of the internal map of alleles, which represents the total number of unique alleles associated with this feature.

      Returns:
      The number of alleles associated with this feature.
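
      The allele accessors above (hasAllele, addAllele, getAllele, getAlleles, getAlleleCount) all operate on one identifier-keyed map. A minimal sketch, assuming a LinkedHashMap and a hypothetical AlleleStub in place of the real Allele class:

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the allele bookkeeping of a feature.
final class AlleleMapSketch {
    // Minimal placeholder for Allele; only the identifier is modeled.
    static final class AlleleStub {
        final String _id;

        AlleleStub(String id) {
            this._id = id;
        }
    }

    private final Map<String, AlleleStub> alleles = new LinkedHashMap<>();

    boolean hasAllele(String alleleIdentifier) {
        return alleles.containsKey(alleleIdentifier);
    }

    // In this sketch, adding an allele with an existing identifier replaces
    // the previous entry; the documented class may behave differently.
    void addAllele(AlleleStub allele) {
        alleles.put(allele._id, allele);
    }

    // Map.get returns null for unknown identifiers.
    AlleleStub getAllele(String alleleIdentifier) {
        return alleles.get(alleleIdentifier);
    }

    Collection<AlleleStub> getAlleles() {
        return Collections.unmodifiableCollection(alleles.values());
    }

    int getAlleleCount() {
        return alleles.size();
    }
}
```

      The proteoform methods documented below follow the same pattern with a separate map.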
    • hasProteoform

      public boolean hasProteoform(String proteoformIdentifier)
      Checks if a proteoform with the specified identifier exists in this feature.
      Parameters:
      proteoformIdentifier - The identifier of the proteoform to check for.
      Returns:
      true if a proteoform with the given identifier exists, false otherwise.
    • addProteoform

      public void addProteoform(Proteoform proteoform)
      Adds a proteoform to this feature.

      This method adds the specified Proteoform object to the internal map of proteoforms associated with this feature. The proteoform is stored using its unique identifier as the key.

      Note: No internal validation is performed based on the coordinates of the feature.

      Parameters:
      proteoform - The Proteoform object to be added to this feature.
    • getProteoform

      public Proteoform getProteoform(String proteoformIdentifier)
      Retrieves the proteoform associated with this feature that has the given unique identifier, or null if no such proteoform exists.
      Parameters:
      proteoformIdentifier - The unique identifier of the proteoform to retrieve.
      Returns:
      The Proteoform object associated with the given identifier, or null if not found.
    • getProteoforms

      public Collection<Proteoform> getProteoforms()
      Retrieves all proteoforms associated with this feature.

      This method provides an unmodifiable view of the collection of proteoforms associated with this feature. Proteoforms represent specific sequence variants of proteins derived from the feature. The unmodifiable collection ensures that the original data cannot be modified externally.

      Returns:
      An unmodifiable Collection of Proteoform objects associated with this feature.
    • getProteoformCount

      public int getProteoformCount()
      Retrieves the number of proteoforms associated with this feature.

      This method returns the size of the internal map of proteoforms, which represents the total number of unique proteoforms associated with this feature.

      Returns:
      The number of proteoforms associated with this feature.
    • toString

      public String toString()
      Generates a string representation of this feature.

      This method constructs a string representation of the feature using its contig, genomic coordinates, and name. The format includes the contig identifier, start and end positions, and the feature name, separated by specific constants.

      Overrides:
      toString in class Object
      Returns:
      A String representing the feature in the format: contig:g.start_end=featureName.
    • hashCode

      public int hashCode()
      Computes the hash code for this feature.

      This method calculates the hash code of the feature based on its string representation.

      Overrides:
      hashCode in class Object
      Returns:
      The hash code of this feature.
    • equals

      public boolean equals(Object obj)
      Compares this feature to another object for equality.

      This method checks if the provided object is the same instance as this feature. If not, it verifies that the object is of the same class and compares their string representations for equality.

      Overrides:
      equals in class Object
      Parameters:
      obj - The object to compare with this Feature instance.
      Returns:
      true if the objects are the same instance or if their string representations are equal; false otherwise.
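
      The relationship between toString, hashCode, and equals described above can be sketched as follows. FeatureIdentitySketch is hypothetical, and the separators are written as literals taken from the documented format contig:g.start_end=featureName, whereas the library is said to use constants.

```java
// Hypothetical stand-in: identity is derived from the string representation.
final class FeatureIdentitySketch {
    final String contig;
    final int start;
    final int end;
    final String name;

    FeatureIdentitySketch(String contig, int start, int end, String name) {
        this.contig = contig;
        this.start = start;
        this.end = end;
        this.name = name;
    }

    @Override
    public String toString() {
        return contig + ":g." + start + "_" + end + "=" + name;
    }

    @Override
    public int hashCode() {
        // Hash of the string representation keeps hashCode consistent with equals.
        return toString().hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (obj == null || getClass() != obj.getClass()) {
            return false;
        }
        return toString().equals(obj.toString());
    }
}
```

      One consequence of this design, if the sketch matches the real behavior, is that fields absent from the string representation (such as _id, type, and strand) do not take part in equality: two features with the same contig, coordinates, and name compare as equal.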