Package datastructure

Class Sample


public class Sample extends Attributable
Represents a single biological sample together with its variant calls and allele assignments.

This class extends Attributable to inherit functionality for managing attributes. It provides fields and methods to store and manipulate variant calls, alleles, and other sample-specific data. Each instance of this class is uniquely identified by its name.

  • Field Details

    • name

      public final String name
      The name or internal identifier of this sample.

      This field uniquely identifies the sample within the context of the application. It is declared final, so it cannot be reassigned after the Sample instance has been constructed.

    • variantCalls

      protected final HashMap<String,TreeMap<Integer,String>> variantCalls
      Hierarchical map structure to store variant calls.

      This map organizes variant calls in two levels:

      • Outer map: The key is the name of the contig (Contig.name).
      • Inner map: The key is the position of the variant on the contig, and the value is a string encoding the variant call, formatted as CALL_INDEX;DP;GQ;REF_0:.:AD_0:PL_0,REF_1:ALT_1:AD_1:PL_1,...
        • CALL_INDEX: The numeric index of the alternative allele, with an optional prefix character of either f (low frequency) or x (low coverage).
        • DP: The read depth at the variant site.
        • GQ: The genotype quality score.
        • REF_0:.:AD_0:PL_0: The reference allele, a placeholder (`.`), the allele depth, and the phred-scaled likelihood.
        • REF_1:ALT_1:AD_1:PL_1,...: One or more alternate alleles, each with its reference allele, alternate allele, allele depth, and phred-scaled likelihood.
        The meaning of DP, GQ, AD, and PL follows the VCFv4.2 specification; the semicolon-, comma-, and colon-delimited string itself is not a VCF record.
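      To make the encoding concrete, the following minimal sketch splits one call string into its parts. It assumes the layout described above holds exactly (four semicolon-separated fields, then comma-separated allele entries of four colon-separated components each). The classes ParsedCall and VariantCallParser are hypothetical illustrations and are not part of this package.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container for one decoded call string (names are illustrative only).
final class ParsedCall {
    final char flag;            // 'f' (low frequency), 'x' (low coverage), or '\0' if absent
    final int callIndex;        // numeric allele index after the optional prefix
    final int depth;            // DP
    final int genotypeQuality;  // GQ
    final List<String[]> alleles = new ArrayList<>(); // each entry is {REF, ALT, AD, PL}

    ParsedCall(char flag, int callIndex, int depth, int genotypeQuality) {
        this.flag = flag;
        this.callIndex = callIndex;
        this.depth = depth;
        this.genotypeQuality = genotypeQuality;
    }
}

// Hypothetical parser written from the format description, not the library's own code.
final class VariantCallParser {
    static ParsedCall parse(String call) {
        String[] fields = call.split(";");
        if (fields.length != 4) {
            throw new IllegalArgumentException("expected CALL_INDEX;DP;GQ;ALLELES but got: " + call);
        }
        String index = fields[0];
        char flag = '\0';
        if (!index.isEmpty() && (index.charAt(0) == 'f' || index.charAt(0) == 'x')) {
            flag = index.charAt(0);      // strip the optional prefix before parsing the number
            index = index.substring(1);
        }
        ParsedCall parsed = new ParsedCall(flag, Integer.parseInt(index),
                Integer.parseInt(fields[1]), Integer.parseInt(fields[2]));
        for (String entry : fields[3].split(",")) {
            String[] parts = entry.split(":");  // REF:ALT:AD:PL; ALT is "." for the reference entry
            if (parts.length != 4) {
                throw new IllegalArgumentException("malformed allele entry: " + entry);
            }
            parsed.alleles.add(parts);
        }
        return parsed;
    }
}
```

      Parsing 1;13;99;TTC:.:0:585,TTC:T--:13:0 with this sketch yields call index 1, DP 13, GQ 99, and two allele entries, the second with ALT T--.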
    • alleles

      protected final Map<String,String> alleles
      A map that assigns features to their corresponding alleles.

      This Map stores the relationship between feature names and their associated allele identifiers. The keys represent the names of the features, and the values represent the unique identifiers of the alleles. This structure is used to track which allele is associated with each feature in the sample.

    • variantCallPattern

      public static final Pattern variantCallPattern
      Regular expression pattern to match variant call strings.

      This pattern is designed to parse variant call strings in the format stored in variantCalls, whose field semantics follow the VCF specification. Fields are separated by semicolons (`;`), allele entries by commas (`,`), and the components of each allele entry by colons (`:`), with the following structure:

      • CALL_INDEX: The numeric index of the alternative allele, with an optional prefix character of either `f` (low frequency) or `x` (low coverage).
      • DP: The read depth at the variant site.
      • GQ: The genotype quality score.
      • REF_0:.:AD_0:PL_0: The reference allele, a placeholder (`.`), the allele depth, and the phred-scaled likelihood.
      • REF_1:ALT_1:AD_1:PL_1,...: One or more alternate alleles, each with its reference allele, alternate allele, allele depth, and phred-scaled likelihood.
       Example match: 1;13;99;TTC:.:0:585,TTC:T--:13:0
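      The actual regular expression behind variantCallPattern is not reproduced in this documentation. The sketch below is a hypothetical pattern written from the field description above, meant only to show how such a pattern can capture each field. It assumes AD and PL are non-negative integers and that allele strings contain no `:`, `;`, or `,`. It is not guaranteed to accept exactly the same strings as variantCallPattern.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical pattern built from the field description; not the real variantCallPattern.
final class VariantCallRegex {
    static final Pattern CALL = Pattern.compile(
            "([fx]?)(\\d+);(\\d+);(\\d+);"            // groups 1-4: prefix, CALL_INDEX, DP, GQ
            + "([^:;,]+):\\.:(\\d+):(\\d+)"           // groups 5-7: REF_0, AD_0, PL_0 (ALT_0 is '.')
            + "((?:,[^:;,]+:[^:;,]+:\\d+:\\d+)*)");   // group 8: any ,REF_i:ALT_i:AD_i:PL_i entries

    static boolean isValid(String call) {
        return CALL.matcher(call).matches(); // matches() requires the whole string to match
    }

    static int depth(String call) {
        Matcher m = CALL.matcher(call);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a variant call string: " + call);
        }
        return Integer.parseInt(m.group(3)); // DP is the third capturing group
    }
}
```

      With this sketch, the example match above is accepted and its DP group evaluates to 13, while a string whose first allele entry has an ALT other than `.` is rejected.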
       
  • Constructor Details

    • Sample

      protected Sample(String name, int capacity)
      Constructs a new Sample instance with the specified name and initial capacity for the alleles map.

      This constructor initializes a Sample object with the given name and allocates a HashMap for the alleles field with the specified initial capacity. The name field is set to the provided name, and the superclass constructor is invoked to initialize inherited properties.

      Parameters:
      name - The name of the sample, used as its unique identifier.
      capacity - The expected initial capacity of the alleles map.
  • Method Details

    • setAllele

      protected void setAllele(String featureName, String alleleUid)
      Associates a specific allele with a feature in this sample.

      This method updates the alleles map by setting the sequence type (allele) for the specified feature, replacing any allele previously set for that feature. The feature is identified by its name, and the allele is identified by its unique identifier.

      Parameters:
      featureName - The name of the feature (Feature.name) to associate with the allele.
      alleleUid - The unique identifier of the allele (SequenceType.name) to set for the feature.
    • getAlleles

      public Collection<Map.Entry<String,String>> getAlleles()
      Retrieves the entries of the alleles map for this sample.

      This method returns a collection view of the mappings contained in the alleles map. Each entry in the collection represents a feature name and its associated allele identifier. Because the collection is a view rather than a copy, changes made through it (such as removing entries or calling Map.Entry.setValue) are reflected in the underlying map.

      Returns:
      A Collection of Map.Entry objects representing the entries in the alleles map.
    • getAlleleCount

      public int getAlleleCount()
      Retrieves the number of alleles in this sample.

      This method returns the size of the alleles map, that is, the number of features that have an allele assigned in this sample. Since each feature holds at most one allele, this equals the number of alleles recorded for the sample, which corresponds to the number of non-reference alleles present in the sample.

      Returns:
      The number of alleles in this sample.
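      The three allele methods above behave like ordinary Map operations. The following minimal sketch assumes that setAllele delegates to Map.put, getAlleles returns alleles.entrySet(), and getAlleleCount returns alleles.size(); AlleleView is a hypothetical stand-in, not the real Sample class.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for Sample's allele handling (assumed delegation to a HashMap).
final class AlleleView {
    private final Map<String, String> alleles = new HashMap<>();

    void setAllele(String featureName, String alleleUid) {
        alleles.put(featureName, alleleUid); // replaces any previous allele for this feature
    }

    Collection<Map.Entry<String, String>> getAlleles() {
        return alleles.entrySet();           // live view backed by the map, not a copy
    }

    int getAlleleCount() {
        return alleles.size();               // one entry per feature with an assigned allele
    }
}
```

      Under these assumptions, setting an allele twice for the same feature leaves the count unchanged, and removing an entry from the returned view removes it from the sample. Adding to the view is not supported: HashMap's entry set throws UnsupportedOperationException on add.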
    • getVariantCalls

      public TreeMap<Integer,String> getVariantCalls(String contig)
      Retrieves the variant calls for the specified contig in this sample.

      This method returns a TreeMap containing the variant calls for the given contig. The keys in the map represent the positions of the variants on the contig, and the values are the corresponding variant call strings. If no variant calls exist for the specified contig, an empty TreeMap is returned.

      Parameters:
      contig - The name of the contig to retrieve the variant calls for.
      Returns:
      A TreeMap where the keys are variant positions and the values are variant call strings.
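      A minimal sketch of the contig, position, and call-string layout and of the lookup described above follows. The methods addCall and callsInRange are hypothetical helpers added for illustration; only getVariantCalls mirrors a documented method, and its use of getOrDefault is an assumption about how the empty result is produced.

```java
import java.util.HashMap;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of the variantCalls layout: contig -> (position -> call string).
final class VariantCallStore {
    private final HashMap<String, TreeMap<Integer, String>> variantCalls = new HashMap<>();

    // Hypothetical helper: create the per-contig TreeMap on first use.
    void addCall(String contig, int position, String call) {
        variantCalls.computeIfAbsent(contig, k -> new TreeMap<>()).put(position, call);
    }

    TreeMap<Integer, String> getVariantCalls(String contig) {
        // Unknown contig: return a fresh empty map that is not stored back into variantCalls.
        return variantCalls.getOrDefault(contig, new TreeMap<>());
    }

    // Hypothetical helper: calls with positions in [from, to], in ascending position order.
    NavigableMap<Integer, String> callsInRange(String contig, int from, int to) {
        return getVariantCalls(contig).subMap(from, true, to, true);
    }
}
```

      Using a TreeMap for the inner level keeps positions sorted, so iterating calls along a contig or extracting a window with subMap needs no extra sorting step.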
    • getReferenceOfCall

      public static String getReferenceOfCall(String call)
      Extracts the reference base character from the starting position of a variant call string.

      This method processes a variant call string in the format described for variantCalls and retrieves the reference base character from the starting position. The call string is expected to follow the structure defined in variantCallPattern, where fields are separated by semicolons, allele entries by commas, and allele components by colons.

       Example call string: 1;13;99;TTC:.:0:585,TTC:T--:13:0
       
      Parameters:
      call - The variant call string to process.
      Returns:
      The reference base character from the starting position of the specified call.
      Throws:
      ArrayIndexOutOfBoundsException - If the call string does not conform to the expected format.
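      One way to perform this extraction with plain String.split is sketched below. It assumes that the "reference base" is the first character of REF_0, the first colon-separated component of the fourth semicolon-separated field; the class name ReferenceOfCall is hypothetical and the real implementation may differ.

```java
// Hypothetical sketch: assumes the reference base is the first character of REF_0.
final class ReferenceOfCall {
    static String getReferenceOfCall(String call) {
        String alleleBlock = call.split(";")[3];     // "REF_0:.:AD_0:PL_0,REF_1:ALT_1:..."
        String refEntry = alleleBlock.split(",")[0]; // "REF_0:.:AD_0:PL_0"
        String ref = refEntry.split(":")[0];         // "REF_0", e.g. "TTC"
        return ref.substring(0, 1);                  // first reference base, e.g. "T"
    }
}
```

      In this sketch, a string with fewer than four semicolon-separated fields fails at the array index with ArrayIndexOutOfBoundsException, as documented; an empty REF_0 would instead fail in substring with StringIndexOutOfBoundsException.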
    • toString

      public String toString()
      Converts this sample to its string representation.

      This method generates a string representation of the sample, including its name and attributes. Each attribute is formatted as a key-value pair joined by an equals sign (`=`), and pairs are delimited by semicolons (`;`). A trailing semicolon, if present, is removed from the result.

      Overrides:
      toString in class Object
      Returns:
      A String representing the sample, including its name and attributes.
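      The attribute formatting described above can be sketched as follows. How the sample name is combined with the attributes is not specified here, so this hypothetical helper formats only the attribute part; AttributeFormatter and formatAttributes are illustrative names, not part of Attributable.

```java
import java.util.Map;

// Hypothetical sketch of the key=value;key=value attribute formatting described for toString().
final class AttributeFormatter {
    static String formatAttributes(Map<String, ?> attributes) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, ?> e : attributes.entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append(';');
        }
        // Drop the trailing delimiter, mirroring the "remove last semicolon" step.
        if (sb.length() > 0 && sb.charAt(sb.length() - 1) == ';') {
            sb.setLength(sb.length() - 1);
        }
        return sb.toString();
    }
}
```

      Appending a delimiter after every pair and trimming the last one keeps the loop simple; java.util.StringJoiner would avoid the trimming step altogether.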