Package datastructure

Class Feature


public class Feature extends Attributable
Representation of a genomic feature that is subject to analysis.

This class models a genomic feature, such as a gene, exon, or coding sequence (CDS), that is analyzed in the context of genomic data processing. It extends the Attributable class to inherit functionality for managing attributes associated with the feature.

Each instance of this class is uniquely identified by its name and contains information about its type, location on the reference genome, and other relevant properties.

  • Field Details

    • type

      public final String type
      The type of this genomic feature.

      This field specifies the type of the feature, such as "gene", "exon", or "CDS". The type is defined according to the GFF3 specification and provides information about the biological or functional classification of the feature.

      For more details, refer to the GFF3 specification: https://gmod.org/wiki/GFF3.

    • name

      public final String name
      The name or internal identifier of this genomic feature.

      This field uniquely identifies the feature within the context of the analysis. Because the field is final, it cannot be reassigned after the Feature instance has been constructed.

    • contig

      public final String contig
      The name of the reference sequence (contig, chromosome, or plasmid) on which the feature is located.
    • start

      public final int start
      1-based start position of the feature on the reference.
    • end

      public final int end
      1-based end position of the feature on the reference.
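      Because start and end are 1-based, reading a feature's bases from a 0-based Java String requires an offset of one. The following minimal sketch assumes, as in GFF3, that both positions are inclusive; the class FeatureCoordinates and its methods are hypothetical and not part of this API.

```java
/** Hypothetical helper for 1-based, inclusive feature coordinates (GFF3 convention assumed). */
final class FeatureCoordinates {

    private FeatureCoordinates() {}

    /** Number of bases covered by [start, end] when both ends are inclusive. */
    static int length(int start, int end) {
        return end - start + 1;
    }

    /**
     * Extracts the bases of [start, end] from a reference sequence.
     * String.substring is 0-based and end-exclusive, so the 1-based inclusive
     * interval [start, end] maps to substring(start - 1, end).
     */
    static String slice(String reference, int start, int end) {
        if (start < 1 || end < start || end > reference.length()) {
            throw new IllegalArgumentException("Invalid interval: " + start + ".." + end);
        }
        return reference.substring(start - 1, end);
    }

    public static void main(String[] args) {
        String contig = "ATGCCGTAA";
        System.out.println(slice(contig, 1, 3)); // prints "ATG"
        System.out.println(length(4, 9));        // prints 6
    }
}
```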
    • strand

      public final char strand
      Strand of the feature: '+' for the forward strand, '-' for the reverse strand.
    • _uid

      public final String _uid
      Unique identifier of this feature.
    • alleles

      protected final HashMap<String,Feature.Allele> alleles
      Alleles (SequenceType instances) associated with this feature.

      This map stores alleles that are associated with the feature. Alleles represent specific sequence variations of the feature. The keys in the map are unique identifiers for the alleles, and the values are the corresponding Feature.Allele instances.

      Alleles are used to track and manage sequence variations resulting from genomic changes. Each allele is linked to its unique identifier and contains information about its sequence and attributes.

    • proteoforms

      protected final HashMap<String,Feature.Proteoform> proteoforms
      Proteoforms (SequenceType instances) associated with this feature.

      This map stores proteoforms that are associated with the feature. Proteoforms represent specific sequence variants of proteins derived from the feature. The keys in the map are unique identifiers for the proteoforms, and the values are the corresponding Feature.Proteoform instances.

      Proteoforms are only relevant for coding features and are used to track and manage protein sequence variations resulting from genomic changes.

  • Constructor Details

    • Feature

      protected Feature(String name, String contig, Number start, Number end, char strand, String type, String uid)
      Constructs a new Feature instance with the specified properties.

      This constructor initializes a genomic feature with its name, location, strand orientation, type, and unique identifier. The start and end positions may be passed as any Number and are stored as int values. The Attributable superclass is also initialized.

      Parameters:
      name - The name of the feature, used as its internal identifier.
      contig - The name of the reference location (e.g., contig, chromosome, plasmid) where the feature is located.
      start - The 1-based indexed starting position of the feature on the reference.
      end - The 1-based indexed end position of the feature on the reference.
      strand - The strand orientation of the feature ('+' for forward strand, '-' for reverse strand).
      type - The type of the feature according to the GFF3 specification (e.g., "gene", "exon", "CDS").
      uid - The unique identifier of the feature.
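      A minimal sketch of the documented Number-to-int conversion, using a hypothetical FeatureSketch class that keeps only the constructor's fields (the real class additionally initializes Attributable and the allele and proteoform maps):

```java
/**
 * Hypothetical, stripped-down stand-in for Feature that mirrors the documented
 * constructor signature: positions arrive as Number and are stored as int.
 */
final class FeatureSketch {
    final String name;
    final String contig;
    final int start;
    final int end;
    final char strand;
    final String type;
    final String uid;

    FeatureSketch(String name, String contig, Number start, Number end, char strand, String type, String uid) {
        this.name = name;
        this.contig = contig;
        // Number.intValue() accepts Integer, Long, Double, ... but narrows silently:
        // doubles are truncated and longs beyond the int range wrap around.
        this.start = start.intValue();
        this.end = end.intValue();
        this.strand = strand;
        this.type = type;
        this.uid = uid;
    }

    public static void main(String[] args) {
        // A Long start and an Integer end are both accepted.
        FeatureSketch f = new FeatureSketch("geneA", "chr1", 100L, 250, '+', "gene", "uid-1");
        System.out.println(f.start + ".." + f.end); // prints "100..250"
    }
}
```

      Since Number.intValue() never throws, out-of-range or fractional inputs are narrowed without warning; a stricter variant could call Math.toIntExact(start.longValue()) to reject values outside the int range.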
  • Method Details

    • updateAllele

      protected String updateAllele(Contig contig, ArrayList<htsjdk.samtools.util.Tuple<Integer,String>> variants, Sample sample)
      Updates or creates an allele associated with a given contig and sample.

      This method checks if the specified allele is already associated with the feature. If it is, the existing allele is retrieved. Otherwise, a new allele is created based on the provided variants and contig. The method validates the input parameters, adds the sample occurrence to the allele, and updates the contig with the sequence type occurrences for each variant.

      Parameters:
      contig - The Contig object containing the reference sequence.
      variants - A list of Tuple objects representing the variants associated with the allele. Each tuple contains:
      • The position of the variant.
      • The alternate allele sequence.
      sample - The Sample object representing the sample in which the allele occurs; it is recorded as an occurrence of the allele.
      Returns:
      The unique identifier (UID) of the updated or created allele.
    • getAllele

      public Feature.Allele getAllele(String uid)
      Retrieves an allele associated with this feature by its unique identifier.

      This method searches for a Feature.Allele in the internal map of alleles using the provided unique identifier (UID). If the UID is not found, the method returns null.

      Parameters:
      uid - The unique identifier of the allele to retrieve.
      Returns:
      The Feature.Allele object associated with the given UID, or null if not found.
    • getAlleles

      public Collection<Feature.Allele> getAlleles()
      Retrieves all alleles associated with this feature.

      This method returns a collection of Feature.Allele objects that are associated with this feature. The alleles are stored as values in the internal map of alleles.

      Returns:
      A Collection of Feature.Allele objects associated with this feature.
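      The get-or-create behaviour of updateAllele, together with the lookups of getAllele and getAlleles, can be sketched with a plain HashMap. This is an illustration under assumptions: the Variant record replaces htsjdk's Tuple, samples are represented by name, the UID scheme (variants sorted by position and joined as "pos:alt") is hypothetical, and the update of the Contig's sequence type occurrences is omitted.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

/** Hypothetical sketch of the allele bookkeeping of updateAllele, getAllele and getAlleles. */
final class AlleleStoreSketch {

    /** One variant: 1-based position and alternate sequence. */
    record Variant(int position, String alternate) {}

    static final class Allele {
        final String uid;
        final List<Variant> variants;
        final Set<String> samples = new TreeSet<>(); // names of samples carrying this allele

        Allele(String uid, List<Variant> variants) {
            this.uid = uid;
            this.variants = List.copyOf(variants);
        }
    }

    private final HashMap<String, Allele> alleles = new HashMap<>();

    /** Hypothetical UID: variants sorted by position, serialized as "pos:alt", joined by ';'. */
    static String uidOf(List<Variant> variants) {
        return variants.stream()
                .sorted(Comparator.comparingInt(Variant::position))
                .map(v -> v.position() + ":" + v.alternate())
                .collect(Collectors.joining(";"));
    }

    /** Gets or creates the allele for these variants and records the sample occurrence. */
    String updateAllele(List<Variant> variants, String sampleName) {
        if (variants == null || variants.isEmpty()) {
            throw new IllegalArgumentException("No variants given.");
        }
        String uid = uidOf(variants);
        Allele allele = alleles.computeIfAbsent(uid, k -> new Allele(k, variants));
        allele.samples.add(sampleName);
        return uid;
    }

    /** Returns the allele with the given UID, or null if it is unknown. */
    Allele getAllele(String uid) {
        return alleles.get(uid);
    }

    /** Returns a live view of all stored alleles. */
    Collection<Allele> getAlleles() {
        return alleles.values();
    }
}
```

      Sorting the variants before building the key makes the UID independent of the order in which variants are passed, so the same allele observed in two samples is stored once with two sample occurrences.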
    • updateProteoform

      protected void updateProteoform(Contig contig, String alleleUid) throws IOException, MusialException
      Updates or creates a proteoform associated with a given allele and contig.

      This method checks if the specified allele is already associated with a proteoform. If it is, the existing proteoform is retrieved. Otherwise, a new proteoform is created based on the allele's variants and the contig's sequence. The method validates the input parameters, computes the proteoform sequence, and associates the allele with the proteoform.

      Parameters:
      contig - The Contig object containing the reference sequence.
      alleleUid - The unique identifier of the allele to update or associate with a proteoform.
      Throws:
      IOException - If an I/O error occurs during sequence operations.
      MusialException - If an error occurs during sequence alignment or translation.
      IllegalArgumentException - If the allele does not exist, the contig does not match the feature's contig, the contig lacks a sequence, the feature is not coding, or the variants map is empty.
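      The translation step mentioned for updateProteoform can be illustrated with the standard genetic code (NCBI translation table 1). This minimal sketch assumes the allele's coding sequence has already been assembled from its CDS segments and variants (and reverse-complemented for features on the '-' strand); it does not reproduce the alignment step, and TranslationSketch is a hypothetical name.

```java
/**
 * Hypothetical translation helper using the standard genetic code.
 * Codon index = 16 * i(first) + 4 * i(second) + i(third), with bases ordered T, C, A, G.
 */
final class TranslationSketch {

    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*W"   // first base T
          + "LLLLPPPPHHQQRRRR"   // first base C
          + "IIIMTTTTNNKKSSRR"   // first base A
          + "VVVVAAAADDEEGGGG";  // first base G

    private TranslationSketch() {}

    /**
     * Translates a coding DNA sequence codon by codon. Codons containing
     * ambiguous bases (e.g. N) become 'X'; a trailing partial codon is ignored.
     * '*' marks stop codons; translation does not halt at them in this sketch.
     */
    static String translate(String cds) {
        String seq = cds.toUpperCase();
        StringBuilder protein = new StringBuilder(seq.length() / 3);
        for (int i = 0; i + 3 <= seq.length(); i += 3) {
            int b1 = BASES.indexOf(seq.charAt(i));
            int b2 = BASES.indexOf(seq.charAt(i + 1));
            int b3 = BASES.indexOf(seq.charAt(i + 2));
            if (b1 < 0 || b2 < 0 || b3 < 0) {
                protein.append('X');
            } else {
                protein.append(AMINO_ACIDS.charAt(16 * b1 + 4 * b2 + b3));
            }
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("ATGGCCTGGTAA")); // prints "MAW*"
    }
}
```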
    • getProteoform

      public Feature.Proteoform getProteoform(String uid)
      Retrieves a proteoform associated with this feature by its unique identifier.

      This method searches for a Feature.Proteoform in the internal map of proteoforms using the provided unique identifier (UID). If the UID is not found, the method returns null.

      Parameters:
      uid - The unique identifier of the proteoform to retrieve.
      Returns:
      The Feature.Proteoform object associated with the given UID, or null if not found.
    • getProteoforms

      public Collection<Feature.Proteoform> getProteoforms()
      Retrieves all proteoforms associated with this feature.

      This method returns a collection of Feature.Proteoform objects that are associated with this feature. The proteoforms are stored as values in the internal map of proteoforms.

      Returns:
      A Collection of Feature.Proteoform objects associated with this feature.
    • getChildren

      public SortedMap<String,List<htsjdk.samtools.util.Tuple<Integer,Integer>>> getChildren()
      Retrieves a sorted map of child features associated with this feature.

      This method parses the "children" attribute of the feature, if present, and constructs a sorted map of child features. Each child feature is represented by a key (child type) and a list of tuples, where each tuple contains the start and end positions of the child feature. The map is sorted using a custom comparator based on the order defined in Storage.SO. A child type that does not occur in Storage.SO receives the maximum possible rank and is therefore sorted after all types listed there.

      Returns:
      A SortedMap where the keys are child feature types (e.g., "CDS"), and the values are lists of Tuple objects representing the start and end positions of the child features.
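      A minimal sketch of the parsing and ordering performed by getChildren, assuming the comma-separated "type:start:end" layout described for setChildren. SO_ORDER is a hypothetical stand-in for Storage.SO, and the Span record replaces htsjdk's Tuple:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical sketch of getChildren: parse "type:start:end,..." into a type-ordered map. */
final class ChildrenParserSketch {

    record Span(int start, int end) {}

    /** Illustrative type order; an assumption, not the real contents of Storage.SO. */
    static final List<String> SO_ORDER = List.of("gene", "mRNA", "exon", "CDS");

    /** Known types by their position in SO_ORDER; unknown types last, ties broken alphabetically. */
    static final Comparator<String> TYPE_ORDER =
            Comparator.<String>comparingInt(t -> {
                int rank = SO_ORDER.indexOf(t);
                return rank < 0 ? Integer.MAX_VALUE : rank;
            }).thenComparing(Comparator.naturalOrder());

    static SortedMap<String, List<Span>> parseChildren(String attribute) {
        SortedMap<String, List<Span>> children = new TreeMap<>(TYPE_ORDER);
        if (attribute == null || attribute.isBlank()) {
            return children; // no "children" attribute: empty map
        }
        for (String entry : attribute.split(",")) {
            String[] fields = entry.split(":");
            if (fields.length != 3) {
                throw new IllegalArgumentException("Malformed child entry: " + entry);
            }
            children.computeIfAbsent(fields[0], k -> new ArrayList<>())
                    .add(new Span(Integer.parseInt(fields[1]), Integer.parseInt(fields[2])));
        }
        return children;
    }
}
```

      The alphabetical tie-break is a deliberate addition: a TreeMap treats keys that its comparator ranks as equal as the same key, so without it two different types missing from the order list would both rank Integer.MAX_VALUE and be merged into one entry.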
    • setChildren

      public void setChildren(SortedMap<String,List<htsjdk.samtools.util.Tuple<Integer,Integer>>> children)
      Stores the child features of this feature as a serialized string in the "children" attribute.

      This method serializes the child features into a string format where each child is represented as "type:start:end". Multiple child features are separated by commas. The serialized string is then stored in the "children" attribute of this feature.

      Parameters:
      children - A SortedMap where the keys are child feature types (e.g., "CDS"), and the values are lists of Tuple objects representing the start and end positions of the child features.
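      The serialization described above can be sketched as follows; the Span record is a hypothetical replacement for htsjdk's Tuple<Integer,Integer>, and ChildrenSerializerSketch is an illustrative name:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical sketch of setChildren: each child becomes "type:start:end", joined by commas. */
final class ChildrenSerializerSketch {

    record Span(int start, int end) {}

    static String serializeChildren(SortedMap<String, List<Span>> children) {
        List<String> entries = new ArrayList<>();
        // Iterating a SortedMap follows its comparator, so the output order is deterministic.
        for (Map.Entry<String, List<Span>> e : children.entrySet()) {
            for (Span s : e.getValue()) {
                entries.add(e.getKey() + ":" + s.start() + ":" + s.end());
            }
        }
        return String.join(",", entries);
    }
}
```

      The layout has no escaping, so it assumes that child types contain neither ':' nor ','.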
    • isCoding

      public boolean isCoding()
      Determines if this feature is a coding feature.

      This method checks whether the feature has child elements of type "CDS" (coding sequence) and ensures that the list of such child elements is not empty. A feature is considered coding if it contains at least one "CDS" child.

      Returns:
      true if the feature is a coding feature, false otherwise.
    • isReverse

      public boolean isReverse()
      Checks if this feature is on the reverse strand.

      This method determines whether the strand orientation of the feature is reverse by checking if the strand character is '-'.

      Returns:
      true if this feature is on the reverse strand, false otherwise.
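      Both predicates reduce to simple checks. In this sketch the children map is passed in explicitly, whereas Feature derives it from its "children" attribute; FeaturePredicatesSketch and Span are hypothetical names:

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-ins for the isCoding and isReverse checks. */
final class FeaturePredicatesSketch {

    record Span(int start, int end) {}

    /** Coding: a "CDS" entry exists among the children and its list is non-empty. */
    static boolean isCoding(Map<String, List<Span>> children) {
        List<Span> cds = children.get("CDS");
        return cds != null && !cds.isEmpty();
    }

    /** The reverse strand is encoded by the character '-'. */
    static boolean isReverse(char strand) {
        return strand == '-';
    }
}
```

      Any strand character other than '-', including '.' for unstranded features in GFF3, is treated as not reverse.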
    • toString

      public String toString()
      Converts this feature into a tab-delimited string representation.

      This method generates a string containing the contig, type, name, start position, end position, strand, and attributes of the feature. Certain attributes, such as "children", are excluded from the string representation.

      Overrides:
      toString in class Object
      Returns:
      A String representing the feature in a tab-delimited format.
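      A minimal sketch of the tab-delimited layout described above. The key=value;key=value encoding of attributes and their alphabetical order are assumptions made for illustration, and FeatureToStringSketch is a hypothetical name:

```java
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

/** Hypothetical sketch: contig, type, name, start, end, strand, attributes (tab-separated). */
final class FeatureToStringSketch {

    static String format(String contig, String type, String name, int start, int end,
                         char strand, Map<String, String> attributes) {
        StringJoiner attrs = new StringJoiner(";");
        // A TreeMap copy gives the attributes a stable, alphabetical order.
        for (Map.Entry<String, String> a : new TreeMap<>(attributes).entrySet()) {
            if (!a.getKey().equals("children")) { // internal attribute, excluded from the output
                attrs.add(a.getKey() + "=" + a.getValue());
            }
        }
        return String.join("\t", contig, type, name,
                String.valueOf(start), String.valueOf(end), String.valueOf(strand),
                attrs.toString());
    }
}
```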
    • toGffString

      public String toGffString()
      Converts this feature and its child features into a GFF3 format string.

      This method generates a GFF3 representation of the feature, including its attributes and child features. The main feature is represented with its type, location, strand, and attributes. Child features are appended with their respective types, locations, and parent-child relationships.

      Returns:
      A String containing the GFF3 representation of this feature and its children.
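      The following sketch shows one way to produce the GFF3 lines described above: nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes), with each child linked to the feature through a Parent attribute. Assumptions not stated in this documentation: the feature's ID is its name, the source and score columns are '.', and the first CDS segment starts in frame. Because GFF3 requires a phase for CDS rows, the sketch derives it from the CDS length already emitted in transcription order. GffExportSketch and Span are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;

/** Hypothetical sketch of a GFF3 export for one feature and its children. */
final class GffExportSketch {

    record Span(int start, int end) {}

    static String toGff(String contig, String type, String name, int start, int end, char strand,
                        SortedMap<String, List<Span>> children) {
        StringBuilder gff = new StringBuilder();
        appendLine(gff, contig, type, start, end, strand, ".", "ID=" + name);
        for (Map.Entry<String, List<Span>> child : children.entrySet()) {
            List<Span> spans = new ArrayList<>(child.getValue());
            // Order segments in transcription direction (descending start on the reverse strand).
            Comparator<Span> byStart = Comparator.comparingInt(Span::start);
            spans.sort(strand == '-' ? byStart.reversed() : byStart);
            int codingLength = 0; // CDS bases already emitted, used for the phase column
            for (Span s : spans) {
                String phase = ".";
                if (child.getKey().equals("CDS")) {
                    // GFF3 phase: bases to remove from this segment to reach the next codon start.
                    phase = String.valueOf((3 - codingLength % 3) % 3);
                    codingLength += s.end() - s.start() + 1;
                }
                appendLine(gff, contig, child.getKey(), s.start(), s.end(), strand, phase, "Parent=" + name);
            }
        }
        return gff.toString();
    }

    private static void appendLine(StringBuilder gff, String seqid, String type, int start, int end,
                                   char strand, String phase, String attributes) {
        gff.append(String.join("\t", seqid, ".", type, String.valueOf(start), String.valueOf(end),
                ".", String.valueOf(strand), phase, attributes)).append('\n');
    }
}
```

      A complete GFF3 file additionally starts with a "##gff-version 3" directive, which belongs to the file rather than to a single feature.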