Class AminoacidSequenceGenerator
- All Implemented Interfaces:
SequenceGenerator
The AminoacidSequenceGenerator class is responsible for generating amino acid sequences based on genomic data contained in a Storage instance.
This class extends the NucleotideSequenceGenerator and provides functionality to generate amino acid sequences for specified samples. It integrates variants associated with alleles and proteoforms to produce the final sequences. The class ensures that the provided feature is coding and that the associated contig has a reference sequence.
-
Constructor Summary
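Constructors
AminoacidSequenceGenerator(Storage storage, Contig contig, Feature feature, boolean conserved, boolean aligned, Set<String> sampleIdentifiers): Constructs a new instance of the AminoacidSequenceGenerator class.
-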
Method Summary
Modifier and Type | Method | Description
protected void | generateConservedContext() | Generates the conserved nucleotide context for the sequence generator.
protected void | generateContext() | Generates the amino acid context for the sequence generator.
String | getSequence(String sampleIdentifier) | Retrieves the amino acid sequence for a given sample identifier.
protected void | validateContig(Storage storage, Contig contig, boolean conserved) | Validates the provided contig to ensure it is suitable for amino acid sequence generation.
protected void | validateFeature(Storage storage, Feature feature) | Validates the provided feature to ensure it is suitable for amino acid sequence generation.

Methods inherited from class op.NucleotideSequenceGenerator
getName
-
Constructor Details
-
AminoacidSequenceGenerator
public AminoacidSequenceGenerator(Storage storage, Contig contig, Feature feature, boolean conserved, boolean aligned, Set<String> sampleIdentifiers) throws IOException, MusialException
Constructs a new instance of the AminoacidSequenceGenerator class.
This constructor initializes the generator for amino acid sequence generation by calling the superclass constructor. It passes the provided storage, contig, feature, conserved flag, aligned flag, and sample identifiers to the superclass for initialization.
In contrast to the NucleotideSequenceGenerator, this class always defines a Feature and requires an associated parent contig with a reference sequence. The feature must be a coding feature to ensure that amino acid sequences can be generated. In addition, the NucleotideSequenceGenerator.interval is always defined as the full length of the amino acid sequence derived from the feature's nucleotide sequence.
- Parameters:
storage - The storage object containing genomic data.
contig - The contig associated with the sequence generation.
feature - The feature associated with the sequence generation.
conserved - A flag indicating whether the sequence generation includes conserved sites.
aligned - A flag indicating whether the generated sequences are aligned.
sampleIdentifiers - Optional sample identifiers to restrict the sequence generation.
- Throws:
IOException - If an error occurs during sequence retrieval.
MusialException - If an error occurs during initialization.
-
-
Method Details
-
getSequence
Retrieves the amino acid sequence for a given sample identifier.
This method generates the sequence for the specified sample by integrating variants associated with the sample's allele and proteoform. If the sequence for the proteoform is already cached, it is returned directly. Otherwise, the sequence is generated, cached, and returned.
- Specified by:
getSequence in interface SequenceGenerator
- Overrides:
getSequence in class NucleotideSequenceGenerator
- Parameters:
sampleIdentifier - The identifier of the sample for which the sequence is generated.
- Returns:
A String representing the amino acid sequence for the given sample.
- Throws:
MusialException - If the sample identifier is invalid or an error occurs during sequence generation.
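The caching behavior described for getSequence can be illustrated with a minimal sketch. It is not the library's implementation: ProteoformCacheSketch, the sample-to-proteoform lookup map, and the sequenceBuilder function are hypothetical stand-ins, and IllegalArgumentException takes the place of MusialException so that the example depends only on the JDK.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for illustration; not a class from the documented library.
final class ProteoformCacheSketch {
    // Assumed lookup: sample identifier -> proteoform identifier.
    private final Map<String, String> proteoformBySample;
    // Assumed generation step: proteoform identifier -> amino acid sequence.
    private final Function<String, String> sequenceBuilder;
    // Cache keyed by proteoform, so samples sharing a proteoform share one entry.
    private final Map<String, String> cache = new HashMap<>();
    private int buildCount = 0;

    ProteoformCacheSketch(Map<String, String> proteoformBySample,
                          Function<String, String> sequenceBuilder) {
        this.proteoformBySample = proteoformBySample;
        this.sequenceBuilder = sequenceBuilder;
    }

    String getSequence(String sampleIdentifier) {
        String proteoform = proteoformBySample.get(sampleIdentifier);
        if (proteoform == null) {
            // The documented method throws MusialException; a JDK exception stands in here.
            throw new IllegalArgumentException("Unknown sample identifier: " + sampleIdentifier);
        }
        String sequence = cache.get(proteoform);
        if (sequence == null) {
            // Cache miss: generate once, store, and count the build.
            sequence = sequenceBuilder.apply(proteoform);
            cache.put(proteoform, sequence);
            buildCount++;
        }
        return sequence;
    }

    // Number of sequences that were actually generated rather than served from the cache.
    int getBuildCount() {
        return buildCount;
    }
}
```

Keying the cache by proteoform rather than by sample means that two samples carrying the same proteoform trigger only one generation step, which is the benefit the description above implies.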
-
validateContig
Validates the provided contig to ensure it is suitable for amino acid sequence generation.
This method first calls the superclass implementation to perform general contig validation. It then performs additional validation specific to amino acid sequence generation, ensuring that the contig has an associated reference sequence.
- Parameters:
storage - The storage object containing genomic data.
contig - The contig to validate.
conserved - A flag indicating whether the sequence generation includes conserved sites.
- Throws:
IllegalArgumentException - If the contig fails the superclass validation or does not have a reference sequence.
-
validateFeature
Validates the provided feature to ensure it is suitable for amino acid sequence generation.
This method first calls the superclass implementation to perform general feature validation. It then performs additional validation specific to amino acid sequence generation, ensuring that the feature is a coding feature.
- Parameters:
storage - The storage object containing genomic data.
feature - The feature to validate.
- Throws:
IllegalArgumentException - If the feature is not coding or fails the superclass validation.
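Both validation methods follow the same pattern: run the general superclass check first, then add a check specific to amino acid generation. The sketch below shows that pattern for the feature case only. SketchFeature, BaseFeatureValidator, and CodingFeatureValidator are hypothetical types invented for this example, and the general check shown in the base class is an assumption, since the source does not say what the superclass validates.

```java
// Hypothetical stand-ins for illustration; not classes from the documented library.
final class SketchFeature {
    final String name;
    final boolean coding;

    SketchFeature(String name, boolean coding) {
        this.name = name;
        this.coding = coding;
    }
}

class BaseFeatureValidator {
    // Assumed general check; the real superclass validation is not described in the source.
    protected void validateFeature(SketchFeature feature) {
        if (feature == null || feature.name == null || feature.name.isEmpty()) {
            throw new IllegalArgumentException("Feature must be defined and named.");
        }
    }
}

class CodingFeatureValidator extends BaseFeatureValidator {
    @Override
    protected void validateFeature(SketchFeature feature) {
        // General validation first, so the subclass check can rely on a well-formed feature.
        super.validateFeature(feature);
        // Additional check: only coding features can yield amino acid sequences.
        if (!feature.coding) {
            throw new IllegalArgumentException("Feature '" + feature.name + "' is not coding.");
        }
    }

    // Convenience wrapper that reports the validation outcome as a boolean.
    static boolean accepts(SketchFeature feature) {
        try {
            new CodingFeatureValidator().validateFeature(feature);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Calling the superclass method before the specific check means a failure in either stage surfaces as the same exception type, which matches the single IllegalArgumentException documented for each method.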
-
generateContext
Generates the amino acid context for the sequence generator.
This method initializes the `context` map by processing variants related to the alleleIdentifiers inferred from the NucleotideSequenceGenerator.sampleIdentifiers and the associated NucleotideSequenceGenerator.feature. It collects variants from the proteoforms of the feature, filters them based on their relation to the allele identifiers, and processes each variant to update the context map. The context map is implemented using a BTreeMap for efficient storage and retrieval.
- Throws:
IOException - If an error occurs during initialization or sequence retrieval.
MusialException - If an error occurs during initialization or variant processing.
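The filter-then-collect step described for generateContext might look roughly like the following. This is a simplified sketch under stated assumptions: ContextSketch and its nested Variant type are hypothetical, a variant is reduced to a position, an alternative residue, and the set of alleles it relates to, and java.util.TreeMap is substituted as the sorted map because the source does not say which library provides the BTreeMap it names.

```java
import java.util.Collections;
import java.util.List;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for illustration; not a class from the documented library.
final class ContextSketch {

    // Simplified variant: position, alternative content, and related allele identifiers.
    static final class Variant {
        final int position;
        final String alternative;
        final Set<String> alleles;

        Variant(int position, String alternative, Set<String> alleles) {
            this.position = position;
            this.alternative = alternative;
            this.alleles = alleles;
        }
    }

    // Builds a position-sorted map of the alternatives observed for the requested alleles.
    static NavigableMap<Integer, Set<String>> buildContext(List<Variant> variants,
                                                           Set<String> alleleIdentifiers) {
        // TreeMap is a substitution for the BTreeMap mentioned in the description.
        NavigableMap<Integer, Set<String>> context = new TreeMap<>();
        for (Variant variant : variants) {
            // Skip variants that relate to none of the requested alleles.
            if (Collections.disjoint(variant.alleles, alleleIdentifiers)) {
                continue;
            }
            context.computeIfAbsent(variant.position, k -> new TreeSet<>())
                   .add(variant.alternative);
        }
        return context;
    }
}
```

A sorted map keeps the positions in sequence order, so a later pass that writes out the sequence can iterate the context directly without sorting.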
-
generateConservedContext
Generates the conserved nucleotide context for the sequence generator.
This method initializes the `context` map by processing the reference sequence and calculating the maximal insertion lengths for positions related to the allele identifiers. The context map is populated with reference bases and their corresponding insertion lengths.
- Throws:
IOException - If an error occurs during initialization or sequence retrieval.
MusialException - If an error occurs during initialization or variant processing.
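The idea of pairing each reference position with its maximal insertion length can be sketched as follows. The names ConservedContextSketch and Slot, the 1-based position numbering, and the representation of insertions as plain strings grouped by position are all assumptions made for this example; the source does not describe the actual data layout.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical stand-in for illustration; not a class from the documented library.
final class ConservedContextSketch {

    // One context entry: the reference character and the longest insertion seen after it.
    static final class Slot {
        final char referenceBase;
        final int maxInsertionLength;

        Slot(char referenceBase, int maxInsertionLength) {
            this.referenceBase = referenceBase;
            this.maxInsertionLength = maxInsertionLength;
        }
    }

    // Walks the reference and records, per position, the maximal insertion length.
    static NavigableMap<Integer, Slot> build(String reference,
                                             Map<Integer, List<String>> insertionsByPosition) {
        NavigableMap<Integer, Slot> context = new TreeMap<>();
        for (int i = 0; i < reference.length(); i++) {
            int position = i + 1; // 1-based positions are an assumption of this sketch
            int maxLength = 0;
            List<String> insertions =
                insertionsByPosition.getOrDefault(position, Collections.<String>emptyList());
            for (String insertion : insertions) {
                maxLength = Math.max(maxLength, insertion.length());
            }
            context.put(position, new Slot(reference.charAt(i), maxLength));
        }
        return context;
    }
}
```

Knowing the maximal insertion length at each position is what allows aligned output to reserve enough columns after that position, so that every sample's sequence ends up with the same length.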
-