Class SequenceType
- Direct Known Subclasses:
Feature.Allele, Feature.Proteoform
This class extends Attributable to manage metadata. It is extended by the Feature.Allele and Feature.Proteoform classes.
-
Field Summary
Fields (modifier and type, field name: description):
- final String _uid: The unique identifier of this entity.
- protected String name: Optional name to describe this sequence type.
- Set<String> occurrence: A set representing the occurrences of this sequence type.
- protected final NavigableMap<Integer, String> variants: Variants defining this entity.
-
Constructor Summary
Constructors (constructor: description):
- SequenceType(String uid, List<htsjdk.samtools.util.Tuple<Integer, String>> variants): Constructs a new SequenceType instance with the specified unique identifier and variants.
-
Method Summary
Methods (modifier and type, method: description):
- void addOccurrence(String identifier): Adds an occurrence to this sequence type.
- static int computeLengthVariation(List<htsjdk.samtools.util.Tuple<Integer, String>> variants): Computes the net shift in sequence length caused by variants.
- String getFastaHeader(String featureName, String sequenceIdentifier): Generates a FASTA header for the sequence type.
- String getName(): Retrieves the name of this sequence type.
- Set<String> getOccurrence(): Retrieves the entities (by their unique identifiers) associated with this sequence type.
- String getVariant(int position): Retrieves the variant at the specified position associated with this sequence type.
- NavigableMap<Integer, String> getVariants(): Retrieves the variants associated with this sequence type.
- boolean hasOccurrence(String identifier): Checks if this sequence type is associated with an entity by its identifier.
- boolean hasVariant(int position): Checks if this sequence type has a variant at the specified position.
- boolean hasVariant(int position, String content): Checks if this sequence type has a specific variant at the specified position.
- String occurrenceAsString(): Converts the occurrences of this sequence type to a comma-separated string.
- protected void setName(String name): Sets the name of this sequence type.
- String toString(): Returns a string representation of this sequence type in the format identifier attributes variants.
- String variantsAsString(): Converts the variants of this sequence type to a string representation.
- static String variantsAsString(List<htsjdk.samtools.util.Tuple<Integer, String>> variants): Converts a list of variants to a string representation.
- static String variantsAsString(Map<Integer, String> variants): Converts a map of variants to a string representation.
Methods inherited from class datastructure.Attributable:
addAttributeIfAbsent, addAttributesIfAbsent, attributesAsString, attributesAsString, clearAttributes, extendAttribute, extendAttributes, getAttribute, getAttributeAsCollection, getAttributes, hasAttribute, hasAttributes, removeAttribute, setAttribute, setAttributes
-
Field Details
-
name
Optional name to describe this sequence type. This field stores a human-readable name for the sequence type. It is optional and can be set to provide additional context or description. If it is not set, the sequence type is identified solely by its unique identifier (_uid).
-
_uid
The unique identifier of this entity. This final, immutable field is assigned when the SequenceType instance is constructed and cannot be modified afterward. It distinguishes this sequence type from all other sequence types.
-
variants
Variants defining this entity. This field stores an ordered, navigable map of the variants associated with this sequence type: the keys are the variant positions and the values are the alternate alleles. All variants must be represented in their canonical form.
-
occurrence
A set representing the occurrences of this sequence type. This field stores the unique identifiers of the entities in which this sequence type occurs, and is used to track its presence across different contexts.
-
Constructor Details
-
SequenceType
Constructs a new SequenceType instance with the specified unique identifier and variants. This constructor initializes a SequenceType object with a unique identifier and a list of variants. The variants are provided as a list of Tuple objects, where each tuple contains:
- The position of the variant (field a of the tuple).
- The alternate allele of the variant (field b of the tuple).
The constructor populates the variants field, which is a TreeMap, by iterating through the provided list of tuples. The position and alternate allele are extracted from each tuple and added to the map, so the variants are stored sorted by position.
- Parameters:
  uid - The unique identifier for this sequence type.
  variants - A list of Tuple objects representing the variants associated with this sequence type.
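The population step described above can be sketched as follows. This is a minimal, hypothetical reconstruction, not the actual implementation: the class name SequenceTypeSketch is invented, and a small stand-in Tuple class with the same public final fields a and b as htsjdk.samtools.util.Tuple is used so that the example compiles without htsjdk on the classpath.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

// Stand-in for htsjdk.samtools.util.Tuple: two public final fields named a and b.
final class Tuple<A, B> {
    public final A a;
    public final B b;

    Tuple(A a, B b) {
        this.a = a;
        this.b = b;
    }
}

// Hypothetical, stripped-down SequenceType showing only the constructor logic.
class SequenceTypeSketch {
    final String _uid;
    protected final NavigableMap<Integer, String> variants = new TreeMap<>();

    SequenceTypeSketch(String uid, List<Tuple<Integer, String>> variants) {
        this._uid = uid;
        for (Tuple<Integer, String> variant : variants) {
            // a = position, b = alternate allele; the TreeMap keeps keys sorted by position.
            // If a position appears twice, TreeMap.put keeps the allele seen last.
            this.variants.put(variant.a, variant.b);
        }
    }
}
```

Because the backing map is a TreeMap, the input list can be in any order; iteration over the map is always by ascending position.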
-
Method Details
-
setName
Sets the name of this sequence type.
- Parameters:
  name - The name to set for this sequence type.
-
getName
Retrieves the name of this sequence type.
- Returns:
  The name of this sequence type, or null if it has not been set.
-
addOccurrence
Adds an occurrence to this sequence type.
- Parameters:
  identifier - The unique identifier of the entity to add as an occurrence.
-
getOccurrence
Retrieves the entities (by their unique identifiers) associated with this sequence type.
- Returns:
  A set of unique identifiers associated with this sequence type.
-
hasOccurrence
Checks if this sequence type is associated with an entity by its identifier.
- Parameters:
  identifier - Unique identifier to check for.
- Returns:
  true if the entity is associated with this sequence type, false otherwise.
-
occurrenceAsString
Converts the occurrences of this sequence type to a comma-separated string. This method joins all unique identifiers stored in the occurrence set into a single string, using the delimiter defined in Constants.COMMA.
- Returns:
  A String representation of the occurrences, separated by commas.
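The occurrence bookkeeping (addOccurrence, hasOccurrence, occurrenceAsString) can be sketched with a plain Java set and String.join. This is a hypothetical sketch, not the class's code: it assumes that Constants.COMMA is "," and uses a LinkedHashSet so that the joined output is deterministic; the real field may use a different Set implementation and therefore a different ordering.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of occurrence tracking; not the actual SequenceType code.
class OccurrenceSketch {
    // Assumed value of Constants.COMMA.
    static final String COMMA = ",";

    // LinkedHashSet keeps insertion order, so occurrenceAsString() is deterministic here.
    private final Set<String> occurrence = new LinkedHashSet<>();

    void addOccurrence(String identifier) {
        occurrence.add(identifier); // adding an identifier twice has no effect
    }

    boolean hasOccurrence(String identifier) {
        return occurrence.contains(identifier);
    }

    String occurrenceAsString() {
        return String.join(COMMA, occurrence);
    }
}
```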
-
getVariant
Retrieves the variant at the specified position associated with this sequence type. This method looks up the given position in the variants map and returns the corresponding alternate allele, or null if no variant is stored at that position.
- Parameters:
  position - The position to retrieve the variant for.
- Returns:
  The alternate allele at the specified position, or null if no variant is present.
-
getVariants
Retrieves the variants associated with this sequence type. This method returns the map of variants that define this sequence type. The map is navigable, with variant positions as keys and alternate base sequences as values. The returned map is immutable and reflects the canonical form of the variants.
- Returns:
  A NavigableMap of variants, where the keys are positions and the values are the alternate base sequences.
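One standard way to hand out an immutable NavigableMap, as described above, is Collections.unmodifiableNavigableMap. The sketch below assumes that approach (the class name VariantsViewSketch is hypothetical); the source does not say whether the real method returns a read-only view or a copy.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch: exposing the variants map without letting callers modify it.
class VariantsViewSketch {
    private final NavigableMap<Integer, String> variants;

    VariantsViewSketch(NavigableMap<Integer, String> variants) {
        this.variants = new TreeMap<>(variants); // defensive copy of the input
    }

    NavigableMap<Integer, String> getVariants() {
        // Read-only view: navigation methods work, mutators throw UnsupportedOperationException.
        return Collections.unmodifiableNavigableMap(variants);
    }
}
```

With a navigable view, callers can still run range queries such as ceilingKey or subMap over positions without being able to alter the sequence type.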
-
hasVariant
public boolean hasVariant(int position)
Checks if this sequence type has a variant at the specified position.
- Parameters:
  position - The position to check for a variant.
- Returns:
  true if a variant exists at the specified position, false otherwise.
-
hasVariant
Checks if this sequence type has a specific variant at the specified position.
- Parameters:
  position - The position to check for a variant.
  content - The content of the variant to check for.
- Returns:
  true if the specified variant exists at the position, false otherwise.
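The lookup methods getVariant, hasVariant(int) and hasVariant(int, String) map directly onto NavigableMap operations. The following is a hypothetical sketch (the class name VariantLookupSketch is invented); it assumes the content comparison is plain string equality on the stored alternate allele.

```java
import java.util.NavigableMap;
import java.util.Objects;
import java.util.TreeMap;

// Hypothetical sketch of the variant lookup methods; the real class may differ.
class VariantLookupSketch {
    private final NavigableMap<Integer, String> variants;

    VariantLookupSketch(NavigableMap<Integer, String> variants) {
        this.variants = new TreeMap<>(variants);
    }

    String getVariant(int position) {
        return variants.get(position); // null when no variant is stored at this position
    }

    boolean hasVariant(int position) {
        return variants.containsKey(position);
    }

    boolean hasVariant(int position, String content) {
        // True only if a variant exists at the position and its allele equals content.
        return variants.containsKey(position) && Objects.equals(variants.get(position), content);
    }
}
```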
-
variantsAsString
Converts the variants of this sequence type to a string representation. This method delegates to variantsAsString(Map) to convert the variants map into a string in the format (POS0)(ALT0).(POS1)(ALT1)...
- Returns:
  A String representation of the variants in the format (POS0)(ALT0).(POS1)(ALT1)...
-
toString
Returns a string representation of this sequence type in the format identifier attributes variants. The string representation includes:
- The identifier, which is either the name (if set) or the unique identifier _uid.
- The attributes of this sequence type, formatted using Attributable.attributesAsString().
- The variants associated with this sequence type, formatted using variantsAsString().
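The assembly of these three parts might look like the following sketch. The separator between the parts is not specified in the source, so a tab is assumed here; the helper name formatSequenceType is hypothetical, and the already formatted attribute and variant strings are passed in directly rather than obtained from Attributable.attributesAsString() and variantsAsString().

```java
// Hypothetical helper mirroring the toString() description; not the actual implementation.
final class ToStringSketch {
    // Assumed separator; the source does not state which delimiter is used.
    static final String SEPARATOR = "\t";

    static String formatSequenceType(String name, String uid, String attributes, String variants) {
        // Prefer the optional human-readable name; fall back to the unique identifier.
        String identifier = (name != null) ? name : uid;
        return String.join(SEPARATOR, identifier, attributes, variants);
    }
}
```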
-
variantsAsString
Converts a map of variants to a string representation. This method takes a map of variants, where the keys are positions and the values are alternate alleles, and converts it into a string in the format (POS0)(ALT0).(POS1)(ALT1)...
- Parameters:
  variants - A map of variants, where the keys are positions and the values are alternate alleles.
- Returns:
  A String representation of the variants in the format (POS0)(ALT0).(POS1)(ALT1)...
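A minimal sketch of the map overload, assuming that (POSn)(ALTn) denotes placeholders, so each variant is written as its position immediately followed by its alternate allele and consecutive variants are joined with "."; if the parentheses are meant literally, the formatting line changes accordingly. The class name VariantsFormatSketch is hypothetical. The sketch iterates in the map's own order, which is ascending by position for the TreeMap held in the variants field.

```java
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch of variantsAsString(Map); assumes "(POS)(ALT)" means POS followed directly by ALT.
final class VariantsFormatSketch {
    static String variantsAsString(Map<Integer, String> variants) {
        StringJoiner joiner = new StringJoiner(".");
        for (Map.Entry<Integer, String> entry : variants.entrySet()) {
            // Integer + String concatenates, e.g. 12 and "A" become "12A".
            joiner.add(entry.getKey() + entry.getValue());
        }
        return joiner.toString(); // "" for an empty map
    }
}
```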
-
variantsAsString
Converts a list of variants to a string representation. This method takes a list of Tuple objects, each containing a position and an alternate allele, and converts it into a string in the format (POS0)(ALT0).(POS1)(ALT1)...
- Parameters:
  variants - A list of Tuple objects representing the variants.
- Returns:
  A String representation of the variants in the format (POS0)(ALT0).(POS1)(ALT1)...
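For the list overload, a hypothetical sketch under the assumption that each (POSn)(ALTn) placeholder is the position written directly before the alternate allele, with variants joined by "." in list order (the source does not say whether the list is sorted first). A stand-in Tuple class with the same public fields a and b as htsjdk.samtools.util.Tuple keeps the example self-contained; the class name VariantListFormatSketch is invented.

```java
import java.util.List;
import java.util.StringJoiner;

// Stand-in for htsjdk.samtools.util.Tuple (public final fields a and b).
final class Tuple<A, B> {
    public final A a;
    public final B b;

    Tuple(A a, B b) {
        this.a = a;
        this.b = b;
    }
}

// Hypothetical sketch of variantsAsString(List); keeps the list's own order.
final class VariantListFormatSketch {
    static String variantsAsString(List<Tuple<Integer, String>> variants) {
        StringJoiner joiner = new StringJoiner(".");
        for (Tuple<Integer, String> variant : variants) {
            joiner.add(variant.a + variant.b); // position followed by alternate allele
        }
        return joiner.toString();
    }
}
```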
-
computeLengthVariation
Computes the net shift in sequence length caused by variants. This method calculates the cumulative effect of insertions and deletions on the sequence length. Each variant is analyzed to determine whether it represents an insertion or a deletion:
- If the variant is an insertion, its length (number of bases minus one) is added to the net shift.
- If the variant is a deletion, its length (number of bases minus one) is subtracted from the net shift.
- Other types of variants do not affect the net shift.
- Parameters:
  variants - A list of Tuple objects, where each tuple contains:
    a: The position of the variant (not used in this method).
    b: The alternate allele of the variant.
- Returns:
  The net shift in sequence length as an int.
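The source does not say how an insertion or deletion is recognized from the alternate allele alone, so the sketch below uses an entirely hypothetical encoding: an insertion is written as "+" followed by the inserted bases and a deletion as "-" followed by the deleted bases, so the allele's length minus one is the number of affected bases. The class name LengthVariationSketch and the stand-in Tuple class (mirroring the a/b fields of htsjdk.samtools.util.Tuple) are also assumptions.

```java
import java.util.List;

// Stand-in for htsjdk.samtools.util.Tuple (public final fields a and b).
final class Tuple<A, B> {
    public final A a;
    public final B b;

    Tuple(A a, B b) {
        this.a = a;
        this.b = b;
    }
}

final class LengthVariationSketch {
    // HYPOTHETICAL allele encoding: "+ACG" inserts ACG, "-TT" deletes TT.
    static int computeLengthVariation(List<Tuple<Integer, String>> variants) {
        int netShift = 0;
        for (Tuple<Integer, String> variant : variants) {
            String alt = variant.b; // variant.a (the position) is not needed here
            if (alt.startsWith("+")) {
                netShift += alt.length() - 1; // count of inserted bases
            } else if (alt.startsWith("-")) {
                netShift -= alt.length() - 1; // count of deleted bases
            }
            // Any other allele (e.g. a substitution) leaves the length unchanged.
        }
        return netShift;
    }
}
```

For example, an insertion of three bases, a deletion of two bases and one substitution give a net shift of 3 - 2 + 0 = 1.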
-
getFastaHeader
Generates a FASTA header for the sequence type. This method constructs a FASTA header string from the provided feature name and sequence identifier. The header includes an identifier in the format lcl|<featureName>_<sequenceIdentifier>. In addition, the following optional properties are appended if they are present as attributes:
- allelic_frequency: The allelic frequency of the sequence type.
- so_effects: Sequence Ontology effects associated with the sequence type.
- Parameters:
  featureName - The name of the feature to include in the FASTA header.
  sequenceIdentifier - The identifier of the sequence to include in the FASTA header.
- Returns:
  A String representing the FASTA header, including the identifier and optional properties.
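A hypothetical sketch of the header construction. Several details are assumptions not stated in the source: the leading ">" is included in the returned string, each optional property is appended as " key=value", and the attributes are passed in as a plain Map<String, String> instead of being read through the Attributable API. The class name FastaHeaderSketch is invented.

```java
import java.util.Map;

// Hypothetical sketch of getFastaHeader; the leading '>' and the " key=value"
// layout of the optional properties are assumptions.
final class FastaHeaderSketch {
    static String getFastaHeader(String featureName, String sequenceIdentifier, Map<String, String> attributes) {
        StringBuilder header = new StringBuilder(">lcl|")
                .append(featureName).append('_').append(sequenceIdentifier);
        // Append each optional property only if it is present among the attributes.
        for (String key : new String[] {"allelic_frequency", "so_effects"}) {
            String value = attributes.get(key);
            if (value != null) {
                header.append(' ').append(key).append('=').append(value);
            }
        }
        return header.toString();
    }
}
```

The lcl| prefix marks the identifier as a local one in the NCBI FASTA identifier syntax, which keeps tools such as BLAST from trying to parse it as a database accession.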
-