Class SequenceOperations
This class provides static methods for sequence alignment, variant integration, sequence translation, and other related operations. It includes methods for handling nucleotide and protein sequences, as well as utilities for working with gaps and variants.
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static enum | SequenceOperations.MarginalGaps | Enum to store different modes to handle prefix gaps for global sequence alignment.
Constructor Summary
Constructors
SequenceOperations()
Method Summary
Modifier and Type | Method | Description
static ArrayList<org.apache.commons.lang3.tuple.Triple<Integer,String,String>> | getCanonicalVariants(String reference, String alternative) | Transforms two sequences into canonical VCF variants.
static htsjdk.samtools.util.Tuple<String,String> | globalNucleotideSequenceAlignment(String sequenceA, String sequenceB, int gapOpenPenalty, int gapExtendPenalty, SequenceOperations.MarginalGaps left, SequenceOperations.MarginalGaps right, Integer bandWidth) | Performs global nucleotide sequence alignment using a simple scoring matrix.
static htsjdk.samtools.util.Tuple<String,String> | globalProteinSequenceAlignment(String sequenceA, String sequenceB, int gapOpenPenalty, int gapExtendPenalty, SequenceOperations.MarginalGaps left, SequenceOperations.MarginalGaps right, Integer bandWidth) | Performs global protein sequence alignment using the BLOSUM80 scoring matrix.
static String | integrateVariants(Contig contig, Feature feature, NavigableMap<Integer, String> variants, boolean stripGaps) | Integrates variants into a reference sequence for a given feature.
static String | padGaps | Pads a string with gap characters to reach a specified length.
static String | stripGaps | Removes all gap characters from the input string.
static String | translateSequence(String sequence, boolean reverse) | Translates a DNA sequence into an amino-acid sequence.
-
Constructor Details
-
SequenceOperations
public SequenceOperations()
-
Method Details
-
globalNucleotideSequenceAlignment
public static htsjdk.samtools.util.Tuple<String,String> globalNucleotideSequenceAlignment(String sequenceA, String sequenceB, int gapOpenPenalty, int gapExtendPenalty, SequenceOperations.MarginalGaps left, SequenceOperations.MarginalGaps right, Integer bandWidth)
Performs global nucleotide sequence alignment using a simple scoring matrix.
This method aligns two nucleotide sequences using a gap-affine Needleman-Wunsch algorithm. It utilizes a predefined scoring matrix for nucleotide matches, mismatches, and gaps.
The scoring matrix is defined as follows:
- Match: +1
- Mismatch: -1
- Gap: -1
Parameters:
sequenceA - The first nucleotide sequence to align.
sequenceB - The second nucleotide sequence to align.
gapOpenPenalty - The penalty for opening a gap in the alignment.
gapExtendPenalty - The penalty for extending an existing gap in the alignment.
left - Specifies how to handle left-marginal gaps (FREE, PENALIZE, FORBID).
right - Specifies how to handle right-marginal gaps (FREE, PENALIZE, FORBID).
bandWidth - The band-width for banded alignment, or null for non-banded alignment.
Returns:
A Tuple containing the aligned sequences.
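The gap-affine recurrence named above can be illustrated with a minimal, score-only sketch. It is not the library's implementation: the class and method names are hypothetical, all marginal gaps are penalized, no banding is applied, and a gap of length k is assumed to cost gapOpen + (k - 1) * gapExtend. How the listed gap score of -1 interacts with the two penalty parameters is not stated in this documentation, so the sketch uses only the parameters.

```java
// Hypothetical sketch of a gap-affine (Gotoh-style) global alignment score.
class AffineGapScoreSketch {
    // Large negative value that stands in for "unreachable" without overflowing.
    static final int NEG_INF = Integer.MIN_VALUE / 2;

    static int score(String a, String b, int gapOpen, int gapExtend) {
        int n = a.length(), m = b.length();
        int[][] match = new int[n + 1][m + 1]; // alignment ends with a[i-1] paired to b[j-1]
        int[][] gapInB = new int[n + 1][m + 1]; // alignment ends with a[i-1] paired to a gap
        int[][] gapInA = new int[n + 1][m + 1]; // alignment ends with b[j-1] paired to a gap
        for (int i = 0; i <= n; i++) {
            for (int j = 0; j <= m; j++) {
                match[i][j] = gapInB[i][j] = gapInA[i][j] = NEG_INF;
            }
        }
        match[0][0] = 0;
        // Assumed cost model: the first gap position costs gapOpen, each further one gapExtend.
        for (int i = 1; i <= n; i++) gapInB[i][0] = -(gapOpen + (i - 1) * gapExtend);
        for (int j = 1; j <= m; j++) gapInA[0][j] = -(gapOpen + (j - 1) * gapExtend);

        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int s = a.charAt(i - 1) == b.charAt(j - 1) ? 1 : -1; // match +1, mismatch -1
                match[i][j] = max3(match[i - 1][j - 1], gapInB[i - 1][j - 1], gapInA[i - 1][j - 1]) + s;
                gapInB[i][j] = Math.max(
                        Math.max(match[i - 1][j], gapInA[i - 1][j]) - gapOpen,
                        gapInB[i - 1][j] - gapExtend);
                gapInA[i][j] = Math.max(
                        Math.max(match[i][j - 1], gapInB[i][j - 1]) - gapOpen,
                        gapInA[i][j - 1] - gapExtend);
            }
        }
        return max3(match[n][m], gapInB[n][m], gapInA[n][m]);
    }

    private static int max3(int x, int y, int z) {
        return Math.max(x, Math.max(y, z));
    }

    public static void main(String[] args) {
        System.out.println(score("ACGT", "ACGT", 2, 1)); // prints 4
        System.out.println(score("ACGT", "AGT", 2, 1));  // prints 1
    }
}
```

Keeping three matrices lets the recurrence distinguish opening a gap from extending one, which a single-matrix Needleman-Wunsch with a linear gap cost cannot do.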
-
globalProteinSequenceAlignment
public static htsjdk.samtools.util.Tuple<String,String> globalProteinSequenceAlignment(String sequenceA, String sequenceB, int gapOpenPenalty, int gapExtendPenalty, SequenceOperations.MarginalGaps left, SequenceOperations.MarginalGaps right, Integer bandWidth)
Performs global protein sequence alignment using the BLOSUM80 scoring matrix.
This method aligns two protein sequences using a gap-affine Needleman-Wunsch algorithm. It utilizes the BLOSUM80 scoring matrix for amino acid matches, mismatches, and gaps.
The scoring matrix is defined as follows:
- Match: Based on BLOSUM80 values.
- Mismatch: Based on BLOSUM80 values.
- Gap penalties: Defined by the gap open and gap extend penalties.
Parameters:
sequenceA - The first protein sequence to align.
sequenceB - The second protein sequence to align.
gapOpenPenalty - The penalty for opening a gap in the alignment.
gapExtendPenalty - The penalty for extending an existing gap in the alignment.
left - Specifies how to handle left-marginal gaps (FREE, PENALIZE, FORBID).
right - Specifies how to handle right-marginal gaps (FREE, PENALIZE, FORBID).
bandWidth - The band-width for banded alignment, or null for non-banded alignment.
Returns:
A Tuple containing the aligned sequences.
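The left, right, and bandWidth parameters are described only briefly. The sketch below shows one common way such options enter a Needleman-Wunsch recurrence: marginal gap modes change the boundary scores, and a band restricts which cells are computed. The enum MarginalGapMode and both helper methods are hypothetical stand-ins, and the interpretation of FREE, PENALIZE, and FORBID is an assumption rather than documented behavior.

```java
// Hypothetical sketch: how marginal-gap modes and a band width may affect the DP matrix.
class BoundarySketch {
    // Stand-in for SequenceOperations.MarginalGaps; the semantics below are assumed.
    enum MarginalGapMode { FREE, PENALIZE, FORBID }

    // Large negative value that marks a cell as unreachable.
    static final int NEG_INF = Integer.MIN_VALUE / 2;

    /** Boundary score for a marginal gap of the given length under the chosen mode. */
    static int marginalGapScore(MarginalGapMode mode, int length, int gapOpen, int gapExtend) {
        if (length == 0) {
            return 0; // no gap, no cost
        }
        switch (mode) {
            case FREE:
                return 0; // marginal gaps cost nothing
            case PENALIZE:
                return -(gapOpen + (length - 1) * gapExtend); // same cost as an inner gap
            default:
                return NEG_INF; // FORBID: the alignment may not start or end with a gap
        }
    }

    /** A cell (i, j) is evaluated only if it lies within bandWidth of the main diagonal. */
    static boolean inBand(int i, int j, Integer bandWidth) {
        return bandWidth == null || Math.abs(i - j) <= bandWidth;
    }

    public static void main(String[] args) {
        System.out.println(marginalGapScore(MarginalGapMode.PENALIZE, 3, 2, 1)); // prints -4
        System.out.println(inBand(5, 2, 2)); // prints false
    }
}
```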
-
padGaps
Pads a string with gap characters to reach a specified length.
This method appends gap characters (defined by Constants.gapString) to the input string until it reaches the desired length. If the input string is already equal to or longer than the specified length, no padding is added.
Parameters:
s - The input string to be padded.
length - The desired length of the resulting string.
Returns:
The padded string, or the original string if no padding is needed.
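The padding behavior described above can be sketched as follows. This is an illustration, not the library source: the class name is hypothetical, the parameter types are inferred from the description, and the gap symbol "-" is an assumption because the value of Constants.gapString is not shown here.

```java
// Hypothetical sketch of the documented padding behavior.
class GapPaddingSketch {
    static final String GAP = "-"; // assumed value of Constants.gapString

    static String padGaps(String s, int length) {
        if (s.length() >= length) {
            return s; // already long enough, nothing to append
        }
        return s + GAP.repeat(length - s.length());
    }

    public static void main(String[] args) {
        System.out.println(padGaps("ACG", 5)); // prints ACG--
    }
}
```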
-
stripGaps
Removes all gap characters from the input string.
This method replaces all occurrences of the gap character (defined by Constants.gapString) in the input string with an empty string (defined by Constants.EMPTY).
Parameters:
s - The input string from which gaps should be removed.
Returns:
A new string with all gap characters removed.
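A minimal sketch of the gap removal described above, again assuming that the gap symbol is "-" (the value of Constants.gapString is not shown in this documentation) and using a hypothetical class name:

```java
// Hypothetical sketch of the documented gap-stripping behavior.
class GapStrippingSketch {
    static final String GAP = "-"; // assumed value of Constants.gapString

    static String stripGaps(String s) {
        // String.replace substitutes every literal occurrence, not only the first.
        return s.replace(GAP, "");
    }

    public static void main(String[] args) {
        System.out.println(stripGaps("A-C--G")); // prints ACG
    }
}
```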
-
translateSequence
Translates a DNA sequence into an amino-acid sequence.
Translation is always performed in reading frame 1 and uses the BioJava library.
Parameters:
sequence - The DNA sequence to translate.
reverse - Whether to translate the reverse complement of the sequence.
Returns:
The translated amino-acid sequence.
Throws:
MusialException - If an error occurs during translation.
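To make the effect of the reverse flag concrete, the following standalone sketch translates in frame 1 with a hand-written standard codon table. It deliberately does not use BioJava, so its handling of edge cases is an assumption and may differ from the method documented here: input is expected as uppercase A, C, G, T, a trailing incomplete codon is dropped, stop codons become '*', and codons with other symbols become 'X'. All names are hypothetical.

```java
// Hypothetical sketch of frame-1 translation with an optional reverse complement.
class TranslationSketch {
    // Standard genetic code, codons ordered by base index in "TCAG" (first, second, third base).
    private static final String BASES = "TCAG";
    private static final String AMINO_ACIDS =
            "FFLLSSSSYY**CC*W" + "LLLLPPPPHHQQRRRR" + "IIIMTTTTNNKKSSRR" + "VVVVAAAADDEEGGGG";

    static String reverseComplement(String dna) {
        StringBuilder sb = new StringBuilder(dna.length());
        for (int i = dna.length() - 1; i >= 0; i--) {
            switch (dna.charAt(i)) {
                case 'A': sb.append('T'); break;
                case 'T': sb.append('A'); break;
                case 'C': sb.append('G'); break;
                case 'G': sb.append('C'); break;
                default: sb.append('N'); // unknown symbols are kept as N
            }
        }
        return sb.toString();
    }

    static String translate(String dna, boolean reverse) {
        String s = reverse ? reverseComplement(dna) : dna;
        StringBuilder protein = new StringBuilder();
        // Frame 1: codons start at the first base; an incomplete trailing codon is ignored.
        for (int i = 0; i + 3 <= s.length(); i += 3) {
            int b1 = BASES.indexOf(s.charAt(i));
            int b2 = BASES.indexOf(s.charAt(i + 1));
            int b3 = BASES.indexOf(s.charAt(i + 2));
            boolean known = b1 >= 0 && b2 >= 0 && b3 >= 0;
            protein.append(known ? AMINO_ACIDS.charAt(16 * b1 + 4 * b2 + b3) : 'X');
        }
        return protein.toString();
    }

    public static void main(String[] args) {
        System.out.println(translate("ATGGCTTAA", false)); // prints MA*
        System.out.println(translate("TTAAGCCAT", true));  // prints MA*
    }
}
```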
-
getCanonicalVariants
public static ArrayList<org.apache.commons.lang3.tuple.Triple<Integer,String,String>> getCanonicalVariants(String reference, String alternative)
Transforms two sequences into canonical VCF variants.
The specified reference and alternative are expected to be aligned sequences. Variants are formatted as triples of relative position, reference content, and variant content. The relative position is the 0-based position of the variant in the reference sequence without gaps.
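One possible reading of the description above is sketched below. The exact representation the method produces is not specified here, so the following are assumptions: the gap symbol is '-', substitutions are reported per column, and each run of gap columns is reported as a single indel anchored on the preceding reference base, as VCF does. The class name is hypothetical, and variants are rendered as "position:REF>ALT" strings instead of Triple objects to keep the example dependency-free.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: derive VCF-style variants from two aligned sequences.
class CanonicalVariantSketch {
    static final char GAP = '-'; // assumed gap symbol

    static List<String> variants(String ref, String alt) {
        if (ref.length() != alt.length()) {
            throw new IllegalArgumentException("aligned sequences must have equal length");
        }
        List<String> out = new ArrayList<>();
        int refPos = -1; // 0-based index of the last reference base consumed, gaps excluded
        int i = 0;
        while (i < ref.length()) {
            char r = ref.charAt(i);
            char a = alt.charAt(i);
            if (r != GAP && a != GAP) {
                refPos++;
                if (r != a) {
                    out.add(refPos + ":" + r + ">" + a); // substitution
                }
                i++;
                continue;
            }
            // Collect one run of consecutive columns that contain a gap.
            int start = i;
            StringBuilder refPart = new StringBuilder();
            StringBuilder altPart = new StringBuilder();
            while (i < ref.length() && (ref.charAt(i) == GAP || alt.charAt(i) == GAP)) {
                if (ref.charAt(i) != GAP) refPart.append(ref.charAt(i));
                if (alt.charAt(i) != GAP) altPart.append(alt.charAt(i));
                i++;
            }
            // Anchor on the preceding reference base; a leading run has no anchor in this sketch.
            String anchor = start > 0 ? String.valueOf(ref.charAt(start - 1)) : "";
            if (refPart.length() > 0 || altPart.length() > 0) {
                out.add(Math.max(refPos, 0) + ":" + anchor + refPart + ">" + anchor + altPart);
            }
            refPos += refPart.length();
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(variants("AC-GT", "ACTGA")); // prints [1:C>CT, 3:T>A]
        System.out.println(variants("ACGT", "A--T"));   // prints [0:ACG>A]
    }
}
```

Counting only non-gap reference characters keeps the reported position relative to the ungapped reference, which matches the 0-based convention stated in the description.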
-