Class AminoacidSequenceGenerator
- All Implemented Interfaces:
SequenceGenerator
The AminoacidSequenceGenerator class generates amino acid sequences from the genomic data contained in a
Storage instance.
This class extends the NucleotideSequenceGenerator and provides functionality to generate amino acid sequences for specified
samples. It integrates variants associated with alleles and proteoforms to produce the final sequences. The class ensures that the
provided feature is coding and that the associated contig has a reference sequence.
-
Constructor Summary
Constructors -
Method Summary
Modifier and Type    Method    Description
protected void    generateConservedContext()    Generates the conserved nucleotide context for the sequence generator.
protected void    generateContext()    Generates the amino acid context for the sequence generator.
String    getSequence(String sampleIdentifier)    Retrieves the amino acid sequence for a given sample identifier.
protected void    validateContig(Storage storage, Contig contig, boolean conserved)    Validates the provided contig to ensure it is suitable for amino acid sequence generation.
protected void    validateFeature(Storage storage, Feature feature)    Validates the provided feature to ensure it is suitable for amino acid sequence generation.
Methods inherited from class op.NucleotideSequenceGenerator
getName
-
Constructor Details
-
AminoacidSequenceGenerator
public AminoacidSequenceGenerator(Storage storage, Contig contig, Feature feature, boolean conserved, boolean aligned, Set<String> sampleIdentifiers) throws IOException, MusialException
Constructs a new instance of the AminoacidSequenceGenerator class.
This constructor initializes the generator for amino acid sequence generation by calling the superclass constructor. It passes the provided storage, contig, feature, conserved flag, aligned flag, and sample identifiers to the superclass for initialization.
In contrast to the NucleotideSequenceGenerator, this class always defines a Feature and requires an associated parent contig with a reference sequence. The feature must be a coding feature to ensure that amino acid sequences can be generated. In addition, the NucleotideSequenceGenerator.interval is always defined as the full length of the amino acid sequence derived from the feature's nucleotide sequence.
- Parameters:
storage - The storage object containing genomic data.
contig - The contig associated with the sequence generation.
feature - The feature associated with the sequence generation.
conserved - A flag indicating whether the sequence generation includes conserved sites.
aligned - A flag indicating whether the generated sequences are aligned.
sampleIdentifiers - Optional sample identifiers to restrict the sequence generation.
- Throws:
IOException - If an error occurs during sequence retrieval.
MusialException - If an error occurs during initialization.
-
-
Method Details
-
getSequence
Retrieves the amino acid sequence for a given sample identifier.
This method generates the sequence for the specified sample by integrating variants associated with the sample's allele and proteoform. If the sequence for the proteoform is already cached, it is returned directly. Otherwise, the sequence is generated, cached, and returned.
- Specified by:
getSequence in interface SequenceGenerator
- Overrides:
getSequence in class NucleotideSequenceGenerator
- Parameters:
sampleIdentifier - The identifier of the sample for which the sequence is generated.
- Returns:
A String representing the amino acid sequence for the given sample.
- Throws:
MusialException - If the sample identifier is invalid or an error occurs during sequence generation.
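The cache-then-generate behaviour described for getSequence can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: the class ProteoformSequenceCacheSketch, the sample-to-proteoform map, the generator function, and the use of IllegalArgumentException in place of MusialException are all hypothetical stand-ins.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for the caching step; not the MUSIAL implementation.
class ProteoformSequenceCacheSketch {
    // Cache keyed by proteoform identifier, so samples sharing a proteoform reuse one sequence.
    private final Map<String, String> cache = new HashMap<>();
    // Assumed lookup from sample identifier to proteoform identifier.
    private final Map<String, String> sampleToProteoform;
    // Assumed function that builds the sequence for a proteoform identifier.
    private final Function<String, String> generator;
    // Counts how often the generator actually ran (for illustration only).
    int generatorCalls = 0;

    ProteoformSequenceCacheSketch(Map<String, String> sampleToProteoform,
                                  Function<String, String> generator) {
        this.sampleToProteoform = sampleToProteoform;
        this.generator = generator;
    }

    String getSequence(String sampleIdentifier) {
        String proteoform = sampleToProteoform.get(sampleIdentifier);
        if (proteoform == null) {
            // The documented method throws MusialException; a standard exception is used here.
            throw new IllegalArgumentException("Unknown sample: " + sampleIdentifier);
        }
        String cached = cache.get(proteoform);
        if (cached != null) {
            return cached; // Cache hit: return directly.
        }
        generatorCalls++;
        String sequence = generator.apply(proteoform); // Cache miss: generate once.
        cache.put(proteoform, sequence);
        return sequence;
    }

    public static void main(String[] args) {
        Map<String, String> samples = new HashMap<>();
        samples.put("sampleA", "proteoform1");
        samples.put("sampleB", "proteoform1");
        ProteoformSequenceCacheSketch sketch =
                new ProteoformSequenceCacheSketch(samples, id -> "MKVL");
        System.out.println(sketch.getSequence("sampleA"));
        System.out.println(sketch.getSequence("sampleB"));
        System.out.println("generator calls: " + sketch.generatorCalls);
    }
}
```

Keying the cache by proteoform rather than by sample means that samples carrying the same proteoform trigger only one generation.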
-
validateContig
Validates the provided contig to ensure it is suitable for amino acid sequence generation.
This method first calls the superclass implementation to perform general contig validation. It then performs additional validation specific to amino acid sequence generation, ensuring that the contig has an associated reference sequence.
- Parameters:
storage - The storage object containing genomic data.
contig - The contig to validate.
conserved - A flag indicating whether the sequence generation includes conserved sites.
- Throws:
IllegalArgumentException - If the contig fails the superclass validation or does not have a reference sequence.
-
validateFeature
Validates the provided feature to ensure it is suitable for amino acid sequence generation.
This method first calls the superclass implementation to perform general feature validation. It then performs additional validation specific to amino acid sequence generation, ensuring that the feature is a coding feature.
- Parameters:
storage - The storage object containing genomic data.
feature - The feature to validate.
- Throws:
IllegalArgumentException - If the feature is not coding or fails the superclass validation.
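Both validateContig and validateFeature follow the same pattern: run the superclass checks first, then add a stricter check. The sketch below illustrates that pattern for the coding-feature check only. FeatureSketch, NucleotideValidatorSketch, and AminoacidValidatorSketch are hypothetical names, the Storage parameter is omitted, and the null check in the base class is an assumed placeholder, since the documentation does not say what the superclass validates.

```java
// Hypothetical minimal feature type; the real Feature class is not shown in this documentation.
class FeatureSketch {
    final String name;
    final boolean coding;

    FeatureSketch(String name, boolean coding) {
        this.name = name;
        this.coding = coding;
    }
}

// Stand-in for the general validation done by the superclass.
class NucleotideValidatorSketch {
    protected void validateFeature(FeatureSketch feature) {
        // Assumed placeholder check; the actual superclass checks are not documented here.
        if (feature == null) {
            throw new IllegalArgumentException("Feature must not be null.");
        }
    }
}

// Stand-in for the amino acid specific validation layered on top.
class AminoacidValidatorSketch extends NucleotideValidatorSketch {
    @Override
    protected void validateFeature(FeatureSketch feature) {
        super.validateFeature(feature); // General validation first.
        if (!feature.coding) {
            // Additional requirement: only coding features can be translated.
            throw new IllegalArgumentException("Feature " + feature.name + " is not coding.");
        }
    }

    public static void main(String[] args) {
        AminoacidValidatorSketch validator = new AminoacidValidatorSketch();
        validator.validateFeature(new FeatureSketch("geneA", true));
        try {
            validator.validateFeature(new FeatureSketch("rrnA", false));
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```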
-
generateContext
Generates the amino acid context for the sequence generator.
This method initializes the `context` map by processing variants related to the alleleIdentifiers inferred from the NucleotideSequenceGenerator.sampleIdentifiers and the associated NucleotideSequenceGenerator.feature. It collects variants from the proteoforms of the feature, filters them based on their relation to the allele identifiers, and processes each variant to update the context map. The context map is implemented using a BTreeMap for efficient storage and retrieval.
- Throws:
IOException - If an error occurs during initialization or sequence retrieval.
MusialException - If an error occurs during initialization or variant processing.
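The collect, filter, and update steps described for generateContext might look roughly like the sketch below. Everything in it is an assumption made for illustration: VariantSketch and AminoacidContextSketch are hypothetical types, java.util.TreeMap stands in for the BTreeMap mentioned above, and storing the set of alternative contents per position is a guess, since the documentation does not state what the context map holds.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical variant record: a position, an alternative content, and the alleles it occurs in.
class VariantSketch {
    final int position;
    final String alternative;
    final Set<String> alleles;

    VariantSketch(int position, String alternative, String... alleles) {
        this.position = position;
        this.alternative = alternative;
        this.alleles = new HashSet<>(Arrays.asList(alleles));
    }
}

class AminoacidContextSketch {
    // Builds a position-sorted map from variants that relate to the selected alleles.
    static TreeMap<Integer, Set<String>> buildContext(List<VariantSketch> variants,
                                                      Set<String> alleleIdentifiers) {
        TreeMap<Integer, Set<String>> context = new TreeMap<>();
        for (VariantSketch variant : variants) {
            // Filter step: skip variants with no link to any selected allele.
            if (Collections.disjoint(variant.alleles, alleleIdentifiers)) {
                continue;
            }
            // Update step: record the alternative content at the variant position.
            context.computeIfAbsent(variant.position, k -> new TreeSet<>())
                   .add(variant.alternative);
        }
        return context;
    }

    public static void main(String[] args) {
        List<VariantSketch> variants = Arrays.asList(
                new VariantSketch(12, "K", "alleleA"),
                new VariantSketch(12, "R", "alleleB"),
                new VariantSketch(40, "W", "alleleC"));
        Set<String> selected = new HashSet<>(Arrays.asList("alleleA", "alleleB"));
        System.out.println(buildContext(variants, selected));
    }
}
```

A sorted map keeps the positions in sequence order, which is convenient when the context is later walked from the first to the last position.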
-
generateConservedContext
Generates the conserved nucleotide context for the sequence generator.
This method initializes the `context` map by processing the reference sequence and calculating the maximal insertion lengths for positions related to the allele identifiers. The context map is populated with reference bases and their corresponding insertion lengths.
- Throws:
IOException - If an error occurs during initialization or sequence retrieval.
MusialException - If an error occurs during initialization or variant processing.
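As an illustration of the conserved context, the sketch below fills a sorted map with one entry per reference position and then raises each entry's maximal insertion length where insertions occur. It is only a sketch under assumptions: InsertionSketch, ConservedSiteSketch, and ConservedContextSketch are hypothetical types, positions are assumed to be 1-based, and the insertions are assumed to be already restricted to the relevant allele identifiers.

```java
import java.util.Arrays;
import java.util.List;
import java.util.TreeMap;

// Hypothetical insertion record: the reference position it follows and its length.
class InsertionSketch {
    final int position;
    final int length;

    InsertionSketch(int position, int length) {
        this.position = position;
        this.length = length;
    }
}

// Hypothetical context entry: the reference symbol and the longest insertion seen at that position.
class ConservedSiteSketch {
    final char referenceSymbol;
    int maxInsertionLength = 0;

    ConservedSiteSketch(char referenceSymbol) {
        this.referenceSymbol = referenceSymbol;
    }
}

class ConservedContextSketch {
    static TreeMap<Integer, ConservedSiteSketch> buildConservedContext(
            String reference, List<InsertionSketch> insertions) {
        TreeMap<Integer, ConservedSiteSketch> context = new TreeMap<>();
        // One entry per reference position (1-based by assumption).
        for (int i = 0; i < reference.length(); i++) {
            context.put(i + 1, new ConservedSiteSketch(reference.charAt(i)));
        }
        // Keep the maximal insertion length per position; ignore positions outside the reference.
        for (InsertionSketch insertion : insertions) {
            ConservedSiteSketch site = context.get(insertion.position);
            if (site != null) {
                site.maxInsertionLength = Math.max(site.maxInsertionLength, insertion.length);
            }
        }
        return context;
    }

    public static void main(String[] args) {
        TreeMap<Integer, ConservedSiteSketch> context = buildConservedContext(
                "ACGT",
                Arrays.asList(new InsertionSketch(2, 1), new InsertionSketch(2, 3)));
        context.forEach((position, site) -> System.out.println(
                position + " " + site.referenceSymbol + " " + site.maxInsertionLength));
    }
}
```

Tracking the maximal insertion length per position gives an aligned output enough room to pad every sequence to the same length at that site.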
-