Package op

Class AminoacidSequenceGenerator

java.lang.Object
op.NucleotideSequenceGenerator
op.AminoacidSequenceGenerator
All Implemented Interfaces:
SequenceGenerator

public class AminoacidSequenceGenerator extends NucleotideSequenceGenerator
The AminoacidSequenceGenerator class generates amino acid sequences from the genomic data stored in a Storage instance.

This class extends the NucleotideSequenceGenerator and provides functionality to generate amino acid sequences for specified samples. It integrates variants associated with alleles and proteoforms to produce the final sequences. The class ensures that the provided feature is coding and that the associated contig has a reference sequence.
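To illustrate what "integrating variants" means at the amino acid level, here is a minimal sketch that applies substitutions to a reference protein sequence. It assumes variants are plain single-residue substitutions keyed by 1-based position. The class and method names (VariantIntegrationSketch, applySubstitutions) are hypothetical and are not part of the documented API; insertions and deletions are not handled.

```java
import java.util.Map;

/** Hypothetical sketch: apply 1-based amino acid substitutions to a reference protein sequence. */
final class VariantIntegrationSketch {

    private VariantIntegrationSketch() {}

    static String applySubstitutions(String referenceProtein, Map<Integer, Character> substitutions) {
        StringBuilder sequence = new StringBuilder(referenceProtein);
        // Substitutions do not shift positions, so the order of application does not matter.
        for (Map.Entry<Integer, Character> substitution : substitutions.entrySet()) {
            int index = substitution.getKey() - 1; // convert 1-based position to 0-based index
            if (index < 0 || index >= sequence.length()) {
                throw new IllegalArgumentException("Position out of range: " + substitution.getKey());
            }
            sequence.setCharAt(index, substitution.getValue());
        }
        return sequence.toString();
    }
}
```

For example, applying `{2: 'V', 4: '*'}` to `MAWK` yields `MVW*`.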

  • Constructor Details

    • AminoacidSequenceGenerator

      public AminoacidSequenceGenerator(Storage storage, Contig contig, Feature feature, boolean conserved, boolean aligned, Set<String> sampleIdentifiers) throws IOException, MusialException
      Constructs a new instance of the AminoacidSequenceGenerator class.

      This constructor initializes the generator for amino acid sequence generation by passing the provided storage, contig, feature, conserved flag, aligned flag, and sample identifiers to the superclass constructor.

      In contrast to the NucleotideSequenceGenerator, this class always requires a Feature, and the feature's parent contig must have a reference sequence. The feature must be coding so that an amino acid sequence can be derived from it. In addition, NucleotideSequenceGenerator.interval always spans the full length of the amino acid sequence translated from the feature's nucleotide sequence.

      Parameters:
      storage - The storage object containing genomic data.
      contig - The contig associated with the sequence generation.
      feature - The feature associated with the sequence generation.
      conserved - A flag indicating whether the sequence generation includes conserved sites.
      aligned - A flag indicating whether the generated sequences are aligned.
      sampleIdentifiers - Optional sample identifiers to restrict the sequence generation.
      Throws:
      IOException - If an error occurs during sequence retrieval.
      MusialException - If an error occurs during initialization.
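      The interval rule above depends on translating the feature's coding nucleotide sequence. The following minimal sketch shows one way to do that with the standard genetic code (NCBI translation table 1) and then derive a 1-based interval covering the whole translation. The class CodingTranslationSketch and its methods are hypothetical; whether the real implementation keeps the stop codon, handles alternative codon tables, or treats ambiguous bases this way is an assumption.

```java
import java.util.Locale;

/**
 * Hypothetical sketch: translate a coding sequence with the standard genetic code
 * (NCBI table 1) and derive a 1-based interval spanning the full translation.
 */
final class CodingTranslationSketch {

    // Amino acids for all 64 codons, indexed by base order T=0, C=1, A=2, G=3
    // as 16 * first + 4 * second + third.
    private static final String CODON_TABLE =
            "FFLLSSSSYY**CC*W"
          + "LLLLPPPPHHQQRRRR"
          + "IIIMTTTTNNKKSSRR"
          + "VVVVAAAADDEEGGGG";

    private CodingTranslationSketch() {}

    private static int baseIndex(char base) {
        switch (base) {
            case 'T': case 'U': return 0;
            case 'C': return 1;
            case 'A': return 2;
            case 'G': return 3;
            default: return -1; // ambiguous base or gap character
        }
    }

    /** Translates complete codons only; '*' marks stop codons, 'X' marks codons with unknown bases. */
    static String translate(String codingSequence) {
        String cds = codingSequence.toUpperCase(Locale.ROOT);
        StringBuilder protein = new StringBuilder(cds.length() / 3);
        for (int i = 0; i + 3 <= cds.length(); i += 3) {
            int b1 = baseIndex(cds.charAt(i));
            int b2 = baseIndex(cds.charAt(i + 1));
            int b3 = baseIndex(cds.charAt(i + 2));
            if (b1 < 0 || b2 < 0 || b3 < 0) {
                protein.append('X');
            } else {
                protein.append(CODON_TABLE.charAt(16 * b1 + 4 * b2 + b3));
            }
        }
        return protein.toString();
    }

    /** Returns the 1-based, inclusive interval {start, end} covering the whole translation. */
    static int[] aminoAcidInterval(String codingSequence) {
        return new int[] {1, translate(codingSequence).length()};
    }
}
```

      For example, `ATGGCTTGGTAA` translates to `MAW*`, giving the interval [1, 4]; a trailing incomplete codon is ignored.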
  • Method Details

    • getSequence

      public String getSequence(String sampleIdentifier) throws MusialException
      Retrieves the amino acid sequence for a given sample identifier.

      This method generates the sequence for the specified sample by integrating variants associated with the sample's allele and proteoform. If the sequence for the proteoform is already cached, it is returned directly. Otherwise, the sequence is generated, cached, and returned.

      Specified by:
      getSequence in interface SequenceGenerator
      Overrides:
      getSequence in class NucleotideSequenceGenerator
      Parameters:
      sampleIdentifier - The identifier of the sample for which the sequence is generated.
      Returns:
      A String representing the amino acid sequence for the given sample.
      Throws:
      MusialException - If the sample identifier is invalid or an error occurs during sequence generation.
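      The caching behaviour described for getSequence can be sketched as a map from proteoform identifier to generated sequence, so that samples sharing a proteoform reuse one result. In this minimal sketch the class ProteoformSequenceCache, the sample-to-proteoform map, and the builder function are hypothetical stand-ins. It also throws IllegalArgumentException instead of MusialException so that it stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch: cache generated sequences per proteoform, not per sample. */
final class ProteoformSequenceCache {

    private final Map<String, String> proteoformOfSample;   // sample identifier -> proteoform identifier
    private final Function<String, String> sequenceBuilder;  // generates the sequence of one proteoform
    private final Map<String, String> cache = new HashMap<>();
    private int buildCount = 0;                              // number of cache misses, for illustration

    ProteoformSequenceCache(Map<String, String> proteoformOfSample,
                            Function<String, String> sequenceBuilder) {
        this.proteoformOfSample = proteoformOfSample;
        this.sequenceBuilder = sequenceBuilder;
    }

    String getSequence(String sampleIdentifier) {
        String proteoform = proteoformOfSample.get(sampleIdentifier);
        if (proteoform == null) {
            // The documented method throws MusialException here; this sketch uses a JDK exception.
            throw new IllegalArgumentException("Unknown sample: " + sampleIdentifier);
        }
        // Generate on the first request for this proteoform; later requests hit the cache.
        return cache.computeIfAbsent(proteoform, id -> {
            buildCount++;
            return sequenceBuilder.apply(id);
        });
    }

    int buildCount() {
        return buildCount;
    }
}
```

      Caching by proteoform rather than by sample means the number of generated sequences grows with the number of distinct proteoforms, which is typically much smaller than the number of samples.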
    • validateContig

      protected void validateContig(Storage storage, Contig contig, boolean conserved)
      Validates the provided contig to ensure it is suitable for amino acid sequence generation.

      This method first calls the superclass implementation to perform general contig validation. It then performs additional validation specific to amino acid sequence generation, ensuring that the contig has an associated reference sequence.

      Parameters:
      storage - The storage object containing genomic data.
      contig - The contig to validate.
      conserved - A flag indicating whether the sequence generation includes conserved sites.
      Throws:
      IllegalArgumentException - If the contig fails the superclass validation or does not have a reference sequence.
    • validateFeature

      protected void validateFeature(Storage storage, Feature feature)
      Validates the provided feature to ensure it is suitable for amino acid sequence generation.

      This method first calls the superclass implementation to perform general feature validation. It then performs additional validation specific to amino acid sequence generation, ensuring that the feature is a coding feature.

      Parameters:
      storage - The storage object containing genomic data.
      feature - The feature to validate.
      Throws:
      IllegalArgumentException - If the feature is not coding or fails the superclass validation.
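      Both validation methods follow the same pattern: call the superclass check first, then add the amino-acid-specific condition. The minimal sketch below shows that override chain using hypothetical stand-in types (ContigStub, FeatureStub). The superclass checks are reduced to null checks, and the Storage parameter is omitted. How the real Feature decides that it is coding is not documented here, so a boolean flag is assumed.

```java
/** Hypothetical minimal stand-ins; only the fields the checks need. */
record ContigStub(String name, String referenceSequence) {} // referenceSequence may be null
record FeatureStub(String name, boolean coding) {}

class NucleotideValidationSketch {
    // Stands in for the superclass validation; the real general checks are not shown here.
    protected void validateContig(ContigStub contig, boolean conserved) {
        if (contig == null) throw new IllegalArgumentException("Contig must not be null.");
    }

    protected void validateFeature(FeatureStub feature) {
        if (feature == null) throw new IllegalArgumentException("Feature must not be null.");
    }
}

class AminoacidValidationSketch extends NucleotideValidationSketch {
    @Override
    protected void validateContig(ContigStub contig, boolean conserved) {
        super.validateContig(contig, conserved); // general checks first
        String reference = contig.referenceSequence();
        if (reference == null || reference.isEmpty()) {
            throw new IllegalArgumentException("Contig " + contig.name() + " has no reference sequence.");
        }
    }

    @Override
    protected void validateFeature(FeatureStub feature) {
        super.validateFeature(feature); // general checks first
        if (!feature.coding()) {
            throw new IllegalArgumentException("Feature " + feature.name() + " is not coding.");
        }
    }
}
```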
    • generateContext

      protected void generateContext() throws IOException, MusialException
      Generates the amino acid context for the sequence generator.

      This method initializes the `context` map by processing variants related to the alleleIdentifiers inferred from the NucleotideSequenceGenerator.sampleIdentifiers and the associated NucleotideSequenceGenerator.feature. It collects variants from the proteoforms of the feature, filters them based on their relation to the allele identifiers, and processes each variant to update the context map. The context map is implemented using a BTreeMap for efficient storage and retrieval.

      Throws:
      IOException - If an error occurs during initialization or sequence retrieval.
      MusialException - If an error occurs during initialization or variant processing.
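      The collect, filter, and record steps of generateContext can be sketched as follows. The Variant and Proteoform records, and the choice of recording the set of observed contents per position, are hypothetical simplifications. java.util.TreeMap stands in for the BTreeMap named above, since both keep positions in sorted order.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical sketch: build a sorted position -> variant contents map for selected alleles. */
final class AminoacidContextSketch {

    /** Illustrative variant: 1-based position, variant content, and the alleles carrying it. */
    record Variant(int position, String content, Set<String> alleleIdentifiers) {}

    /** Illustrative proteoform: an identifier and its amino acid variants. */
    record Proteoform(String identifier, List<Variant> variants) {}

    private AminoacidContextSketch() {}

    static NavigableMap<Integer, Set<String>> buildContext(List<Proteoform> proteoforms,
                                                          Set<String> alleleIdentifiers) {
        NavigableMap<Integer, Set<String>> context = new TreeMap<>();
        for (Proteoform proteoform : proteoforms) {
            for (Variant variant : proteoform.variants()) {
                // Keep only variants carried by at least one of the requested alleles.
                boolean related = variant.alleleIdentifiers().stream().anyMatch(alleleIdentifiers::contains);
                if (related) {
                    context.computeIfAbsent(variant.position(), p -> new TreeSet<>()).add(variant.content());
                }
            }
        }
        return context;
    }
}
```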
    • generateConservedContext

      protected void generateConservedContext() throws IOException, MusialException
      Generates the conserved amino acid context for the sequence generator.

      This method initializes the `context` map by processing the reference sequence and calculating the maximal insertion lengths for positions related to the allele identifiers. The context map is populated with reference bases and their corresponding insertion lengths.

      Throws:
      IOException - If an error occurs during initialization or sequence retrieval.
      MusialException - If an error occurs during initialization or variant processing.
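      The conserved context pairs every reference position with its reference residue and the maximal insertion length observed after it for the selected alleles. The minimal sketch below assumes insertions are given as hypothetical (position, length, allele) records. It uses Map.merge with Integer::max to keep the largest length per position. ConservedContextSketch, Insertion, and Site are illustrative names, not the documented types.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: reference residues plus maximal insertion lengths per position. */
final class ConservedContextSketch {

    /** Illustrative insertion after a 1-based position, with its length and carrying allele. */
    record Insertion(int position, int length, String alleleIdentifier) {}

    /** Illustrative context entry: reference residue and maximal insertion length after it. */
    record Site(char reference, int maxInsertionLength) {}

    private ConservedContextSketch() {}

    static NavigableMap<Integer, Site> buildConservedContext(String reference,
                                                             List<Insertion> insertions,
                                                             Set<String> alleleIdentifiers) {
        // Maximal insertion length per position, considering only the requested alleles.
        TreeMap<Integer, Integer> maxInsertion = new TreeMap<>();
        for (Insertion insertion : insertions) {
            if (alleleIdentifiers.contains(insertion.alleleIdentifier())) {
                maxInsertion.merge(insertion.position(), insertion.length(), Integer::max);
            }
        }
        NavigableMap<Integer, Site> context = new TreeMap<>();
        for (int i = 0; i < reference.length(); i++) {
            int position = i + 1; // 1-based positions
            context.put(position, new Site(reference.charAt(i), maxInsertion.getOrDefault(position, 0)));
        }
        return context;
    }
}
```

      Recording the maximal insertion length per position is what allows aligned output: every sequence can be padded with gaps up to that length, so all sequences share the same columns.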