Package datastructure

Class SequenceType

java.lang.Object
  datastructure.Attributable
    datastructure.SequenceType
Direct Known Subclasses:
Feature.Allele, Feature.Proteoform

public class SequenceType extends Attributable
Represents a sequence type with associated variants, occurrences, and attributes.

This class extends Attributable to manage metadata.

This class is extended by the Feature.Allele and Feature.Proteoform classes.

  • Field Details

    • name

      protected String name
      Optional name to describe this sequence type.

      This field stores a human-readable name for the sequence type, which can be set to provide additional context. If it is not set, the sequence type is identified solely by its unique identifier (_uid).

    • _uid

      public final String _uid
      The unique identifier of this entity.

      This field holds the immutable unique identifier of the sequence type. It is assigned when the SequenceType instance is constructed, cannot be modified afterward, and distinguishes this entity from all other sequence types.

    • variants

      protected final NavigableMap<Integer,String> variants
      Variants defining this entity.

      This field stores a map of variants associated with this sequence type. The map is ordered and navigable, with the keys representing the positions of the variants and the values representing the alternate alleles. All variants must be represented in their canonical form.

    • occurrence

      protected final HashSet<String> occurrence
      A set representing the occurrences of this sequence type.

      This field stores the unique identifiers of the entities in which this sequence type occurs. It is used to track which entities contain this sequence type.

  • Constructor Details

    • SequenceType

      public SequenceType(String uid, List<htsjdk.samtools.util.Tuple<Integer,String>> variants)
      Constructs a new SequenceType instance with the specified unique identifier and variants.

      This constructor initializes a SequenceType object with a unique identifier and a list of variants. The variants are provided as a list of Tuple objects, where each tuple contains:

      • The position of the variant (field a of the tuple).
      • The alternate allele of the variant (field b of the tuple).

      The constructor populates the variants field, a TreeMap, by iterating over the provided list and inserting each tuple's position and alternate allele. Because a TreeMap keeps its keys sorted, the variants are stored in ascending order of position regardless of the order of the input list.

      Parameters:
      uid - The unique identifier for this sequence type.
      variants - A list of Tuple objects representing the variants associated with this sequence type.
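      The population step described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the class's actual code: Tuple is a hypothetical stand-in for htsjdk.samtools.util.Tuple (which exposes public fields a and b) so the example compiles without htsjdk, and buildVariants is a hypothetical helper name.

```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

public class SequenceTypeConstructorSketch {

    // Hypothetical stand-in for htsjdk.samtools.util.Tuple (public final fields a and b)
    public static final class Tuple<A, B> {
        public final A a;
        public final B b;

        public Tuple(A a, B b) {
            this.a = a;
            this.b = b;
        }
    }

    // Hypothetical helper mirroring the documented behavior: position (a) -> alternate allele (b)
    public static NavigableMap<Integer, String> buildVariants(List<Tuple<Integer, String>> variants) {
        NavigableMap<Integer, String> map = new TreeMap<>();
        for (Tuple<Integer, String> t : variants) {
            map.put(t.a, t.b); // TreeMap keeps the keys sorted by position
        }
        return map;
    }

    public static void main(String[] args) {
        List<Tuple<Integer, String>> input = List.of(
                new Tuple<>(45, "TG"),
                new Tuple<>(12, "A"));
        System.out.println(buildVariants(input)); // {12=A, 45=TG}
    }
}
```

      In this sketch, if two tuples share a position, TreeMap.put keeps the later allele; the documentation does not say how the real constructor handles duplicate positions.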
  • Method Details

    • setName

      protected void setName(String name)
      Sets the name of this sequence type.
      Parameters:
      name - The name to set for this sequence type.
    • getName

      public String getName()
      Retrieves the name of this sequence type.
      Returns:
      The name of this sequence type, or null if it has not been set.
    • addOccurrence

      public void addOccurrence(String identifier)
      Adds an occurrence to this sequence type.
      Parameters:
      identifier - The unique identifier of the entity to add as an occurrence.
    • getOccurrence

      public HashSet<String> getOccurrence()
      Retrieves the entities (by their unique identifiers) associated with this sequence type.
      Returns:
      A set of unique identifiers associated with this sequence type.
    • hasOccurrence

      public boolean hasOccurrence(String identifier)
      Checks whether this sequence type occurs in the entity with the given identifier.
      Parameters:
      identifier - The unique identifier of the entity to check for.
      Returns:
      true if the entity is associated with this sequence type, false otherwise.
    • occurrenceAsString

      public String occurrenceAsString()
      Converts the occurrences of this sequence type to a comma-separated string.

      This method joins all unique identifiers stored in the occurrence set into a single string, using the delimiter defined in Constants.COMMA.

      Returns:
      A String representation of the occurrences, separated by commas.
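      The occurrence methods can be sketched with a HashSet and String.join. This is a minimal sketch, not the class's implementation; it assumes Constants.COMMA is "," and hardcodes that value as a hypothetical COMMA constant.

```java
import java.util.HashSet;

public class OccurrenceSketch {

    // Assumption: Constants.COMMA is ","; hardcoded here as a hypothetical stand-in
    static final String COMMA = ",";

    private final HashSet<String> occurrence = new HashSet<>();

    public void addOccurrence(String identifier) {
        occurrence.add(identifier); // adding an existing identifier has no effect
    }

    public boolean hasOccurrence(String identifier) {
        return occurrence.contains(identifier);
    }

    public String occurrenceAsString() {
        // HashSet iteration order is unspecified, so the identifier order is not guaranteed
        return String.join(COMMA, occurrence);
    }

    public static void main(String[] args) {
        OccurrenceSketch s = new OccurrenceSketch();
        s.addOccurrence("sample1");
        s.addOccurrence("sample2");
        s.addOccurrence("sample1");
        System.out.println(s.hasOccurrence("sample2")); // true
        System.out.println(s.occurrenceAsString());      // both identifiers, in unspecified order
    }
}
```

      Because the backing set is a HashSet, callers that need a stable string (for example, for comparing outputs) would have to sort the identifiers themselves in this sketch.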
    • getVariant

      public String getVariant(int position)
      Retrieves the variant at the specified position associated with this sequence type.

      This method looks up the given position in the variants map and returns the corresponding alternate allele, or null if no variant is present at that position.

      Parameters:
      position - The position to retrieve the variant for.
      Returns:
      The alternate allele at the specified position, or null if no variant is present.
    • getVariants

      public NavigableMap<Integer,String> getVariants()
      Retrieves the variants associated with this sequence type.

      This method returns the map of variants that define this sequence type. The map is navigable, with the keys representing the positions of the variants and the values representing the alternate alleles (alternate base sequences). The returned map is immutable and reflects the canonical form of the variants.

      Returns:
      A NavigableMap of variants, where the keys are positions and the values are the alternate base sequences.
    • hasVariant

      public boolean hasVariant(int position)
      Checks if this sequence type has a variant at the specified position.
      Parameters:
      position - The position to check for a variant.
      Returns:
      true if a variant exists at the specified position, false otherwise.
    • hasVariant

      public boolean hasVariant(int position, String content)
      Checks if this sequence type has a specific variant at the specified position.
      Parameters:
      position - The position to check for a variant.
      content - The alternate allele expected at the position.
      Returns:
      true if the specified variant exists at the position, false otherwise.
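      The lookup methods above map directly onto NavigableMap operations. The following is a minimal sketch, not the class's actual code; in particular, it assumes the documented immutability of getVariants() is provided by a read-only view from Collections.unmodifiableNavigableMap.

```java
import java.util.Collections;
import java.util.NavigableMap;
import java.util.TreeMap;

public class VariantLookupSketch {

    // Positions map to alternate alleles; TreeMap keeps positions sorted
    private final NavigableMap<Integer, String> variants = new TreeMap<>();

    public VariantLookupSketch(NavigableMap<Integer, String> initial) {
        variants.putAll(initial);
    }

    // Returns the alternate allele at the position, or null if none is stored
    public String getVariant(int position) {
        return variants.get(position);
    }

    // Assumption: immutability is modeled as a read-only view of the internal map
    public NavigableMap<Integer, String> getVariants() {
        return Collections.unmodifiableNavigableMap(variants);
    }

    public boolean hasVariant(int position) {
        return variants.containsKey(position);
    }

    // True only if a variant exists at the position and its allele equals content
    public boolean hasVariant(int position, String content) {
        String alt = variants.get(position);
        return alt != null && alt.equals(content);
    }

    public static void main(String[] args) {
        NavigableMap<Integer, String> init = new TreeMap<>();
        init.put(12, "A");
        init.put(45, "TG");
        VariantLookupSketch st = new VariantLookupSketch(init);
        System.out.println(st.getVariant(12));       // A
        System.out.println(st.getVariant(13));       // null
        System.out.println(st.hasVariant(45, "TG")); // true
        System.out.println(st.hasVariant(45, "T"));  // false
    }
}
```

      Checking alt != null before calling equals keeps hasVariant(int, String) from reporting a match at a position with no variant.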
    • variantsAsString

      public String variantsAsString()
      Converts the variants of this sequence type to a string representation.

      This method uses variantsAsString(Map) to convert the variants map into a string representation in the format (POS0)(ALT0).(POS1)(ALT1)....

      Returns:
      A String representation of the variants in the format (POS0)(ALT0).(POS1)(ALT1)....
    • toString

      public String toString()
      Returns a string representation of this sequence type in the format identifier attributes variants.

      The string representation includes:

      • The identifier, which is either the name (if set) or the unique identifier _uid.
      • The attributes of this sequence type, formatted using Attributable.attributesAsString().
      • The variants associated with this sequence type, formatted using variantsAsString().
      Overrides:
      toString in class Object
      Returns:
      A String representing this sequence type in the format identifier attributes variants.
    • variantsAsString

      public static String variantsAsString(Map<Integer,String> variants)
      Converts a map of variants to a string representation.

      This method takes a map of variants, where the keys are positions and the values are alternate alleles. It converts the map into a string representation in the format (POS0)(ALT0).(POS1)(ALT1)....

      Parameters:
      variants - A map of variants, where the keys are positions and the values are alternate alleles.
      Returns:
      A String representation of the variants in the format (POS0)(ALT0).(POS1)(ALT1)....
    • variantsAsString

      public static String variantsAsString(List<htsjdk.samtools.util.Tuple<Integer,String>> variants)
      Converts a list of variants to a string representation.

      This method takes a list of Tuple objects, where each tuple contains a position and an alternate allele. It converts the list into a string representation in the format (POS0)(ALT0).(POS1)(ALT1)....

      Parameters:
      variants - A list of Tuple objects representing the variants.
      Returns:
      A String representation of the variants in the format (POS0)(ALT0).(POS1)(ALT1)....
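      The documented (POS0)(ALT0).(POS1)(ALT1)... format can be sketched as position and allele concatenated, with entries joined by ".". This is a minimal sketch that follows the format string literally; Tuple is a hypothetical stand-in for htsjdk.samtools.util.Tuple, and the list overload is assumed to keep the list's order rather than re-sorting it.

```java
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;
import java.util.TreeMap;

public class VariantsFormatSketch {

    // Hypothetical stand-in for htsjdk.samtools.util.Tuple (public final fields a and b)
    public static final class Tuple<A, B> {
        public final A a;
        public final B b;

        public Tuple(A a, B b) {
            this.a = a;
            this.b = b;
        }
    }

    // (POS)(ALT) pieces joined by '.', in the map's iteration order
    public static String variantsAsString(Map<Integer, String> variants) {
        StringJoiner joiner = new StringJoiner(".");
        for (Map.Entry<Integer, String> e : variants.entrySet()) {
            joiner.add(e.getKey() + e.getValue()); // Integer + String concatenates
        }
        return joiner.toString();
    }

    // Assumption: tuples are emitted in list order, not sorted by position
    public static String variantsAsString(List<Tuple<Integer, String>> variants) {
        StringJoiner joiner = new StringJoiner(".");
        for (Tuple<Integer, String> t : variants) {
            joiner.add(t.a + t.b);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> map = new TreeMap<>();
        map.put(45, "TG");
        map.put(12, "A");
        System.out.println(variantsAsString(map)); // 12A.45TG

        List<Tuple<Integer, String>> list = List.of(new Tuple<>(45, "TG"), new Tuple<>(12, "A"));
        System.out.println(variantsAsString(list)); // 45TG.12A
    }
}
```

      Passing a NavigableMap, as the instance method variantsAsString() does with the variants field, yields positions in ascending order; a plain Map would yield whatever order its implementation iterates in.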
    • computeLengthVariation

      public static int computeLengthVariation(List<htsjdk.samtools.util.Tuple<Integer,String>> variants)
      Computes the net shift in sequence length caused by variants.

      This method calculates the cumulative effect of insertions and deletions on the sequence length. Each variant is analyzed to determine whether it represents an insertion or a deletion:

      • If the variant is an insertion, its length (number of bases minus one) is added to the net shift.
      • If the variant is a deletion, its length (number of bases minus one) is subtracted from the net shift.
      • Other types of variants do not affect the net shift.
      Parameters:
      variants - A list of Tuple objects, where each tuple contains:
      • a: The position of the variant (not used in this method).
      • b: The alternate allele of the variant.
      Returns:
      The net shift in sequence length as an int.
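      The net-shift computation can be sketched as below. The documentation does not say how an allele string is classified as an insertion or a deletion, so this sketch assumes a hypothetical encoding in which insertions start with "+" and deletions with "-" (so the string length minus one equals the number of affected bases). The real canonical form may differ, and Tuple is a hypothetical stand-in for htsjdk.samtools.util.Tuple.

```java
import java.util.List;

public class LengthVariationSketch {

    // Hypothetical stand-in for htsjdk.samtools.util.Tuple (public final fields a and b)
    public static final class Tuple<A, B> {
        public final A a;
        public final B b;

        public Tuple(A a, B b) {
            this.a = a;
            this.b = b;
        }
    }

    // HYPOTHETICAL encoding: "+AG" inserts 2 bases, "-T" deletes 1 base,
    // and any other allele is treated as length-neutral
    public static int computeLengthVariation(List<Tuple<Integer, String>> variants) {
        int shift = 0;
        for (Tuple<Integer, String> v : variants) {
            String alt = v.b; // the position (v.a) is not needed
            if (alt.startsWith("+")) {
                shift += alt.length() - 1;
            } else if (alt.startsWith("-")) {
                shift -= alt.length() - 1;
            }
        }
        return shift;
    }

    public static void main(String[] args) {
        List<Tuple<Integer, String>> variants = List.of(
                new Tuple<>(10, "+AGT"), // +3
                new Tuple<>(20, "-CC"),  // -2
                new Tuple<>(30, "G"));   // 0
        System.out.println(computeLengthVariation(variants)); // 1
    }
}
```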
    • getFastaHeader

      public String getFastaHeader(String featureName, String sequenceIdentifier)
      Generates a FASTA header for the sequence type.

      This method constructs a FASTA header string for the sequence type using the provided feature name and sequence identifier. The header includes an identifier in the format lcl|<featureName>_<sequenceIdentifier>. Additionally, it appends optional properties to the header if they are present as attributes:

      • allelic_frequency: The allelic frequency of the sequence type.
      • so_effects: Sequence ontology effects associated with the sequence type.
      Parameters:
      featureName - The name of the feature to include in the FASTA header.
      sequenceIdentifier - The identifier of the sequence to include in the FASTA header.
      Returns:
      A String representing the FASTA header, including the identifier and optional properties.
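      The header construction can be sketched as follows. This is a minimal sketch under several assumptions not stated in the documentation: the attributes managed by Attributable are modeled as a plain Map with a hypothetical setAttribute method, the header is assumed to begin with ">", and optional properties are assumed to be appended as " key=value".

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FastaHeaderSketch {

    // Hypothetical stand-in for the attribute storage provided by Attributable
    private final Map<String, String> attributes = new LinkedHashMap<>();

    // The two optional properties named in the documentation, in a fixed order
    private static final String[] OPTIONAL_KEYS = {"allelic_frequency", "so_effects"};

    // Hypothetical setter used only to populate the example
    public void setAttribute(String key, String value) {
        attributes.put(key, value);
    }

    public String getFastaHeader(String featureName, String sequenceIdentifier) {
        // Assumption: leading '>' and " key=value" formatting for optional properties
        StringBuilder header = new StringBuilder(">lcl|")
                .append(featureName)
                .append('_')
                .append(sequenceIdentifier);
        for (String key : OPTIONAL_KEYS) {
            String value = attributes.get(key);
            if (value != null) {
                header.append(' ').append(key).append('=').append(value);
            }
        }
        return header.toString();
    }

    public static void main(String[] args) {
        FastaHeaderSketch st = new FastaHeaderSketch();
        System.out.println(st.getFastaHeader("FEAT1", "s1")); // >lcl|FEAT1_s1
        st.setAttribute("allelic_frequency", "0.25");
        System.out.println(st.getFastaHeader("FEAT1", "s1")); // >lcl|FEAT1_s1 allelic_frequency=0.25
    }
}
```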