Package op

Class VCFProcessor

java.lang.Object
op.VCFProcessor
All Implemented Interfaces:
Closeable, AutoCloseable

public class VCFProcessor extends Object implements Closeable
The VCFProcessor class is responsible for processing Variant Call Format (VCF) files.

It extracts variant data from VCF files, optionally imputes contigs, and integrates the processed data into the storage system. It also tracks statistics such as the number of processed, realigned, ignored, and filtered variant calls.

  • Constructor Details

    • VCFProcessor

      public VCFProcessor(List<Path> paths, Storage storage, boolean imputeContigs)
      Constructs a new VCFProcessor instance for processing VCF files.

      This constructor initializes the processor with the list of VCF file paths to process, a storage object for managing genomic data, and the imputeContigs flag. When imputeContigs is true, contigs are inferred from the VCF data and added to the storage.

      Parameters:
      paths - A List of Path objects representing the file paths to the VCF files to be processed.
      storage - The Storage object used to store and manage processed genomic data.
      imputeContigs - A boolean flag indicating whether to infer and add contigs from the VCF files to the storage.
  • Method Details

    • close

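      public void close()
      Closes this processor. Because the class implements Closeable, an instance can be managed with a try-with-resources statement. Unlike Closeable.close(), this method declares no checked exception.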
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
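
      The sketch below illustrates the try-with-resources pattern that a Closeable type enables. StandInProcessor is a hypothetical stand-in written for this example, not the real VCFProcessor; only the shape of close() (no checked exception) is taken from the signature above.

      ```java
      import java.io.Closeable;

      // Hypothetical stand-in for a Closeable resource such as VCFProcessor.
      class StandInProcessor implements Closeable {
          boolean closed = false;

          void work() {
              // Placeholder for real processing; fails if used after close().
              if (closed) {
                  throw new IllegalStateException("already closed");
              }
          }

          // Narrows Closeable.close() by declaring no checked exception.
          @Override
          public void close() {
              closed = true;
          }

          // Uses the resource in try-with-resources and reports whether it was closed.
          static boolean closedAfterUse() {
              StandInProcessor handle = new StandInProcessor();
              try (StandInProcessor p = handle) {
                  p.work();
              }
              return handle.closed;
          }

          public static void main(String[] args) {
              System.out.println("closed after use: " + closedAfterUse());
          }
      }
      ```

      Since close() declares no checked exception, a try-with-resources block around such a type needs no catch clause for the close step itself.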
    • processFiles

      public void processFiles() throws IOException
      Analyzes the provided VCF files and processes their variant data.

      This method iterates through the list of VCF file paths, creates temporary indexed VCF files, and processes their variant contexts. If the `imputeContigs` flag is set, it infers contigs from the VCF files and adds them to the storage. The method processes variant contexts either for specific features in the storage or for all variants if no features are defined.

      Throws:
      IOException - If an I/O error occurs during file operations or VCF processing.
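
      The choice between feature-restricted and unrestricted processing described above can be pictured as a simple scope check. This is a minimal sketch under stated assumptions: the Feature class, its 1-based inclusive coordinates, and the inScope method are all hypothetical and are not part of the documented API.

      ```java
      import java.util.Arrays;
      import java.util.Collections;
      import java.util.List;

      class FeatureScopeSketch {

          // Hypothetical feature: a named contig with a 1-based inclusive interval.
          static final class Feature {
              final String contig;
              final int start;
              final int end;

              Feature(String contig, int start, int end) {
                  this.contig = contig;
                  this.start = start;
                  this.end = end;
              }
          }

          // With no features defined, every position is in scope;
          // otherwise a position must fall inside at least one feature.
          static boolean inScope(List<Feature> features, String contig, int pos) {
              if (features.isEmpty()) {
                  return true;
              }
              for (Feature f : features) {
                  if (f.contig.equals(contig) && pos >= f.start && pos <= f.end) {
                      return true;
                  }
              }
              return false;
          }

          public static void main(String[] args) {
              List<Feature> none = Collections.emptyList();
              List<Feature> one = Arrays.asList(new Feature("chr1", 100, 200));
              System.out.println(inScope(none, "chr2", 5));
              System.out.println(inScope(one, "chr1", 150));
              System.out.println(inScope(one, "chr1", 250));
          }
      }
      ```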
    • getProcessedCallsCount

      public long getProcessedCallsCount()
      Retrieves the total number of processed variant calls.
      Returns:
      The total number of processed variant calls as a long.
    • getRealignedCallsCount

      public long getRealignedCallsCount()
      Retrieves the total number of realigned variant calls.
      Returns:
      The total number of realigned variant calls as a long.
    • getIgnoredCallsCount

      public long getIgnoredCallsCount()
      Retrieves the total number of ignored variant calls.

      This method returns the count of variant calls that were ignored during processing. A variant call may be ignored for reasons such as missing data, being classified as a reference call, or lacking sufficient information for analysis.

      Returns:
      The total number of ignored variant calls as a long.
    • getFilteredCallsCount

      public long getFilteredCallsCount()
      Retrieves the total number of filtered variant calls.

      This method returns the count of variant calls that were filtered out during processing. Filtering may occur due to criteria such as low coverage, low frequency, or other conditions defined in the program.

      Returns:
      The total number of filtered variant calls as a long.
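
      To make the distinction between ignored, filtered, and processed calls concrete, here is a hedged sketch of how such counters could be tallied. The thresholds, the classify method, and the assumption that the three counts are disjoint are all illustrative; the actual criteria are defined inside the program and are not documented here.

      ```java
      class CallTallySketch {

          enum Outcome { PROCESSED, IGNORED, FILTERED }

          // Hypothetical thresholds, chosen only for illustration.
          static final int MIN_COVERAGE = 10;
          static final double MIN_FREQUENCY = 0.8;

          long processed;
          long ignored;
          long filtered;

          // Classifies one call and increments exactly one counter.
          Outcome classify(String alt, int coverage, double frequency) {
              // In VCF, "." in the ALT column means no alternate allele (a reference call).
              if (alt == null || alt.isEmpty() || alt.equals(".")) {
                  ignored++;
                  return Outcome.IGNORED;
              }
              if (coverage < MIN_COVERAGE || frequency < MIN_FREQUENCY) {
                  filtered++;
                  return Outcome.FILTERED;
              }
              processed++;
              return Outcome.PROCESSED;
          }

          public static void main(String[] args) {
              CallTallySketch tally = new CallTallySketch();
              tally.classify(".", 50, 1.0);
              tally.classify("T", 3, 1.0);
              tally.classify("T", 50, 0.95);
              System.out.println(tally.processed + " processed, "
                      + tally.ignored + " ignored, " + tally.filtered + " filtered");
          }
      }
      ```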
    • getSamples

      public Collection<String> getSamples()
      Retrieves an unmodifiable collection of sample identifiers.

      This method returns a collection of all sample identifiers currently indexed in the variant call cache. The returned collection is unmodifiable, so callers cannot alter the underlying data through it.

      Returns:
      An unmodifiable Collection of String objects representing the sample identifiers.
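
      The behavior of an unmodifiable collection can be demonstrated with the JDK alone. The sketch below assumes a wrapper such as Collections.unmodifiableCollection; whether VCFProcessor returns a live view like this or a snapshot copy is not stated in this documentation. SampleViewSketch and its methods are hypothetical names.

      ```java
      import java.util.Collection;
      import java.util.Collections;
      import java.util.TreeSet;

      class SampleViewSketch {
          private final TreeSet<String> samples = new TreeSet<>();

          void addSample(String id) {
              samples.add(id);
          }

          // Returns a read-only view; mutator calls on it throw UnsupportedOperationException.
          Collection<String> getSamples() {
              return Collections.unmodifiableCollection(samples);
          }

          // Returns true if the given collection rejects an attempted add.
          static boolean rejectsMutation(Collection<String> view) {
              try {
                  view.add("intruder");
                  return false;
              } catch (UnsupportedOperationException e) {
                  return true;
              }
          }

          public static void main(String[] args) {
              SampleViewSketch sketch = new SampleViewSketch();
              sketch.addSample("sampleA");
              Collection<String> view = sketch.getSamples();
              System.out.println("rejects mutation: " + rejectsMutation(view));
              System.out.println("size: " + view.size());
          }
      }
      ```

      With this particular wrapper the returned collection is a view, so samples added to the owner later become visible through it. Callers that need a stable snapshot would copy it first.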
    • updateVariants

      public void updateVariants()
      Updates the variants in the storage by processing variant calls from the cache.

      This method iterates through all samples and contigs in the cache, processes their variant calls, and adds the resolved variants to the storage. It handles complex cases such as deletions, insertions, and mixed InDels, ensuring that the variants are stored in a canonical format.
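
      One common way to bring REF/ALT allele pairs into a canonical form is to trim bases shared at both ends while keeping a single anchor base, as in VCF-style InDel notation. The following sketch shows that generic technique only; it is not confirmed to be what updateVariants does, and AlleleTrimSketch, Trimmed, and the type labels are hypothetical. It assumes ref and alt are non-empty and differ.

      ```java
      class AlleleTrimSketch {

          // Trimmed allele pair with its adjusted 1-based position.
          static final class Trimmed {
              final int pos;
              final String ref;
              final String alt;

              Trimmed(int pos, String ref, String alt) {
                  this.pos = pos;
                  this.ref = ref;
                  this.alt = alt;
              }

              // Coarse type label derived from the trimmed alleles.
              String type() {
                  if (ref.length() == 1 && alt.length() == 1) return "SNV";
                  if (ref.length() == 1 && alt.startsWith(ref)) return "INSERTION";
                  if (alt.length() == 1 && ref.startsWith(alt)) return "DELETION";
                  return "MIXED";
              }
          }

          static Trimmed trim(int pos, String ref, String alt) {
              // Drop shared trailing bases, keeping at least one base per allele.
              int suffix = 0;
              while (suffix < ref.length() - 1 && suffix < alt.length() - 1
                      && ref.charAt(ref.length() - 1 - suffix)
                         == alt.charAt(alt.length() - 1 - suffix)) {
                  suffix++;
              }
              ref = ref.substring(0, ref.length() - suffix);
              alt = alt.substring(0, alt.length() - suffix);

              // Drop shared leading bases, again keeping one base per allele,
              // and shift the position by the number of bases removed.
              int prefix = 0;
              while (prefix < ref.length() - 1 && prefix < alt.length() - 1
                      && ref.charAt(prefix) == alt.charAt(prefix)) {
                  prefix++;
              }
              return new Trimmed(pos + prefix, ref.substring(prefix), alt.substring(prefix));
          }

          public static void main(String[] args) {
              Trimmed t = trim(100, "ATCG", "ATG");
              System.out.println(t.pos + " " + t.ref + ">" + t.alt + " " + t.type());
          }
      }
      ```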

    • loadVariantCallsFromStorage

      public int loadVariantCallsFromStorage()
      Loads variant calls from the storage and processes them.

      This method iterates through all contigs and samples in the variant call cache (`vcCache`), retrieves the associated variants, and processes their variant calls. Each variant call string is parsed into a `VariantCall` object and passed to the `processVariantCall` method for further processing.

      The method ensures that only valid contigs and samples present in the storage are processed. It handles the relationship between contigs, samples, and variants, and updates the storage with the processed variant calls.

      Returns:
      The total number of loaded variant calls as an int.