Package op

Class StorageUpdater

java.lang.Object
op.StorageUpdater

public class StorageUpdater extends Object
Encapsulates complex update logic for genomic data storage, keeping it separate from the Storage class itself.

This class provides methods to update sample attributes, process variant calls from Sample instances into Variants in stored Contigs, calculate sequence types, and generate statistics.

Each update method should be called only after the data it depends on has been loaded into the Storage instance. For example, calling updateSequenceTypes() before any VCF files have been processed has no effect.

  • Constructor Details

    • StorageUpdater

      public StorageUpdater(Storage storage)
      Constructs a new StorageUpdater instance.

      This constructor initializes the StorageUpdater with the provided storage object, which is used to manage genomic data and perform updates on samples, features, and contigs.

      Parameters:
      storage - The storage object that contains genomic data to be updated.
  • Method Details

    • updateSampleAttributes

      public void updateSampleAttributes(Map<String,Map<String,String>> attributes)
      Updates the attributes of samples in the storage.

      This method iterates through the provided map of attributes, where each entry consists of a sample identifier and a map of attributes. If the sample exists in the storage, it adds the attributes to the sample only if they are not already present.

      This method should only be called after VCFProcessor.processFiles() has been called in the context of the build or expand tasks.

      Parameters:
      attributes - A map where the key is the sample identifier, and the value is another map containing attribute key-value pairs to be added to the sample.
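
      The add-only-if-absent behavior described above can be illustrated with plain Java maps. This is a minimal sketch, not the library's implementation: the class SampleAttributeSketch, the method addIfAbsent, and the sample and attribute names are all hypothetical, and a nested map stands in for the samples held by Storage.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of the op package.
class SampleAttributeSketch {

    // Merges incoming attributes into known samples only.
    // Values that are already present are kept; unknown samples are skipped.
    static void addIfAbsent(Map<String, Map<String, String>> samples,
                            Map<String, Map<String, String>> attributes) {
        for (Map.Entry<String, Map<String, String>> entry : attributes.entrySet()) {
            Map<String, String> existing = samples.get(entry.getKey());
            if (existing == null) {
                continue; // sample not in the store: nothing to update
            }
            entry.getValue().forEach(existing::putIfAbsent);
        }
    }

    public static void main(String[] args) {
        // Stand-in for the samples in storage (assumption: one attribute map per sample).
        Map<String, Map<String, String>> samples = new TreeMap<>();
        Map<String, String> s1 = new TreeMap<>();
        s1.put("country", "DE");
        samples.put("S1", s1);

        // Same shape as the attributes parameter: sample identifier -> attribute key-value pairs.
        Map<String, Map<String, String>> incoming = new TreeMap<>();
        incoming.put("S1", Map.of("country", "FR", "year", "2020"));
        incoming.put("S2", Map.of("year", "2021"));

        addIfAbsent(samples, incoming);
        System.out.println(samples);
    }
}
```

      In this sketch the existing "country" value of S1 is kept, "year" is added, and S2 is ignored because it is not a known sample.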
    • updateSequenceTypes

      public void updateSequenceTypes() throws IOException, MusialException
      Updates the sequence types for all features and samples in the storage.

      This method performs the following tasks:

      • Iterates through all features and samples in the storage.
      • Updates alleles for each feature and sample based on variants.
      • Calculates sequence effects such as frameshifts and updates allele attributes.
      • If the feature is coding and the contig has a sequence, updates proteoforms.
      • Handles proteoform sequence alignment, variant extraction, and effect annotation.

      This method should only be called after VCFProcessor.processFiles() and a respective annotation method implemented in VariantAnnotator have been called in the context of the build or expand tasks.

      Throws:
      IOException - If an I/O error occurs during processing.
      MusialException - If a specific error related to the Musial library occurs.
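
      The documentation above does not say how frameshifts are detected. As a rough sketch of the general idea only, a common rule is that an insertion or deletion shifts the reading frame when the length difference between the reference and alternate allele is not a multiple of three. The class FrameshiftSketch and the method isFrameshift are hypothetical, and VCF-style reference and alternate strings are assumed.

```java
// Hypothetical illustration class; not part of the op package.
class FrameshiftSketch {

    // Returns true when the indel length is not a multiple of three,
    // i.e. the downstream codons would be read in a different frame.
    // Assumption: ref and alt are plain nucleotide strings as in a VCF record.
    static boolean isFrameshift(String ref, String alt) {
        int lengthDifference = Math.abs(ref.length() - alt.length());
        return lengthDifference % 3 != 0;
    }

    public static void main(String[] args) {
        System.out.println(isFrameshift("A", "AT"));   // one inserted base
        System.out.println(isFrameshift("A", "ATGC")); // three inserted bases
        System.out.println(isFrameshift("A", "G"));    // substitution, no length change
    }
}
```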
    • updateStatistics

      public void updateStatistics()
      Updates statistical attributes for samples, contigs, and features in the storage.

      This method calculates and updates various statistics, including:

      • Number of calls, filtered calls, mean coverage, and mean entropy for each sample.
      • Frequency of reference alleles and disrupted coding features for each sample.
      • Variant frequencies for each contig and sample-specific variant counts.
      • Allelic frequencies, proteoform frequencies, and disrupted proteoform frequencies for each feature.

      This method should only be called as the last step before serializing a Storage instance.
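
      To make the notion of an allelic frequency concrete, the sketch below computes, for one feature, the share of samples carrying each allele. It is an assumption-laden illustration rather than the method's actual logic: the class AlleleFrequencySketch, the method frequencies, and the allele and sample names are hypothetical, and whether the library reports fractions or percentages is not stated in this documentation.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of the op package.
class AlleleFrequencySketch {

    // Maps each allele name to the fraction of samples that carry it.
    // Input: sample identifier -> allele name for a single feature (assumed shape).
    static Map<String, Double> frequencies(Map<String, String> alleleBySample) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String allele : alleleBySample.values()) {
            counts.merge(allele, 1, Integer::sum);
        }
        Map<String, Double> result = new TreeMap<>();
        int total = alleleBySample.size();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            result.put(entry.getKey(), entry.getValue() / (double) total);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> alleleBySample = new TreeMap<>();
        alleleBySample.put("S1", "reference");
        alleleBySample.put("S2", "reference");
        alleleBySample.put("S3", "reference");
        alleleBySample.put("S4", "A1");
        System.out.println(frequencies(alleleBySample));
    }
}
```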