Package op

Class FeatureLoader

java.lang.Object
op.FeatureLoader

public class FeatureLoader extends Object
The FeatureLoader class is responsible for loading and validating genomic features.

This class provides methods to load features from a GFF3 annotation file into the storage system, validate the features against Sequence Ontology (SO) hierarchy rules, and adjust or impute features as necessary. It interacts with the Storage object to manage contigs, features, samples, and variant calls.

The FeatureLoader is initialized with a Storage object, a FeatureList containing parsed features, and a map of user-defined feature specifications. It ensures that the genomic data is processed and stored in a consistent and hierarchical manner.

  • Constructor Details

    • FeatureLoader

      public FeatureLoader(Storage storage, org.biojava.nbio.genome.parsers.gff.FeatureList featureList, Map<String,Map<String,String>> features)
      Constructs a new instance of the FeatureLoader class.

      The loader is initialized with the provided storage, feature list, and feature specifications. No features are loaded or validated until loadFeatures() and validateFeatures() are called.

      Parameters:
      storage - The Storage object used to manage genomic data, including contigs, features, samples, and variant calls.
      featureList - The FeatureList containing features parsed from the GFF3 annotation file.
      features - A map of feature specifications provided by the user, where each key is a feature identifier and the value is a map of attributes defining the feature's properties and matching criteria.
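As a rough illustration of the shape of the features argument, the sketch below builds a Map<String,Map<String,String>> using only java.util. The feature identifier "geneA" and the attribute keys "Name" and "note" are hypothetical placeholders; the attribute keys that FeatureLoader actually recognizes are not listed on this page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FeatureSpecExample {

    // Builds a specification map in the shape expected by the constructor:
    // feature identifier -> (attribute key -> attribute value).
    // All identifiers and attribute keys below are hypothetical placeholders.
    public static Map<String, Map<String, String>> buildSpecs() {
        Map<String, String> geneA = new LinkedHashMap<>();
        geneA.put("Name", "geneA");          // assumed matching criterion
        geneA.put("note", "example target"); // assumed additional property

        Map<String, Map<String, String>> specs = new LinkedHashMap<>();
        specs.put("geneA", geneA);
        return specs;
    }

    public static void main(String[] args) {
        // Prints the feature identifiers contained in the specification map.
        System.out.println(buildSpecs().keySet());
    }
}
```

A map built this way could be passed as the third constructor argument, alongside a Storage instance and a parsed FeatureList.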
  • Method Details

    • loadFeatures

      public void loadFeatures() throws MusialException
      Loads features into the storage based on the provided annotations and specifications.

      This method first validates that a reference sequence and annotations are specified. It then loads either the features specified on the command line or, if none are specified, all features annotated in the provided GFF3 file.

      The method performs the following steps:

      • Validates the presence of a reference sequence and annotations.
      • Processes attributes to ensure they are in the correct format.
      • Matches and adds specified features to the storage.
      • Loads all annotated features if no specific features are provided.
      Throws:
      MusialException - If the input conditions are invalid or if a feature is missing required specifications.
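The steps above can be sketched as a small selection routine. This is a simplified stand-in and not the actual implementation: the class LoadFlowSketch, the method select, the representation of annotated features as a map of attribute maps, and the rule that a specification matches when all of its attribute values are present on an annotated feature are all assumptions made for illustration. IllegalArgumentException stands in for MusialException.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class LoadFlowSketch {

    // Hypothetical stand-in for the selection logic: returns the identifiers of
    // the annotated features that would be loaded into storage.
    public static List<String> select(
            boolean hasReference,
            Map<String, Map<String, String>> annotated,
            Map<String, Map<String, String>> specs) {
        // Step 1: a reference sequence and annotations must be present.
        if (!hasReference || annotated == null || annotated.isEmpty()) {
            throw new IllegalArgumentException("reference and annotations are required");
        }
        // Step 4: without specifications, every annotated feature is loaded.
        if (specs == null || specs.isEmpty()) {
            return new ArrayList<>(annotated.keySet());
        }
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> spec : specs.entrySet()) {
            // Step 2: each specification needs at least one attribute to match on.
            if (spec.getValue() == null || spec.getValue().isEmpty()) {
                throw new IllegalArgumentException("no attributes for " + spec.getKey());
            }
            // Step 3 (assumed rule): select annotated features that carry
            // every attribute key/value pair of the specification.
            for (Map.Entry<String, Map<String, String>> feature : annotated.entrySet()) {
                boolean matches = feature.getValue().entrySet()
                        .containsAll(spec.getValue().entrySet());
                if (matches && !selected.contains(feature.getKey())) {
                    selected.add(feature.getKey());
                }
            }
        }
        return selected;
    }
}
```

The early return for an empty specification map mirrors the last step in the list: specifying nothing is treated as a request for all annotated features rather than as an error.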
    • validateFeatures

      public void validateFeatures() throws MusialException
      Validates the features stored in the storage to ensure compliance with Sequence Ontology (SO) hierarchy rules.

      This method iterates through all features in the storage and performs the following validations and adjustments:

      • Removes the children of features whose SO term type is level 0, because such features are not allowed to have children.
      • Ensures that only one level 1 SO term exists for a feature or its children.
      • Ensures that only one level 2 SO term exists for a feature or its children.
      • Adjusts features with SO levels greater than 1 to type "gene" to maintain a consistent hierarchy.
      • Imputes missing children based on the location ranges of existing sub-features.

      Features that violate the rules are either adjusted or removed from the storage, and appropriate warnings are logged.

      Throws:
      MusialException - If an error occurs during the adjustment of features.
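To make the first three rules more concrete, the sketch below models a feature as a minimal node with a type and child nodes. It is an illustration under stated assumptions, not the library's code: the class SoRuleSketch, its Node type, and both methods are hypothetical, and the mapping from SO term type to level is supplied by the caller because this page does not list which terms belong to which level. The check also considers direct children only, which is an assumption.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class SoRuleSketch {

    // Hypothetical minimal feature node: an SO term type plus child features.
    public static class Node {
        public final String type;
        public final List<Node> children = new ArrayList<>();

        public Node(String type) {
            this.type = type;
        }
    }

    // Rule 1: a feature whose type has level 0 may not keep children.
    // Returns true if children were removed, so a caller could log a warning.
    public static boolean pruneLevelZero(Node node, Map<String, Integer> levels) {
        Integer level = levels.get(node.type);
        if (level != null && level == 0 && !node.children.isEmpty()) {
            node.children.clear();
            return true;
        }
        return false;
    }

    // Rules 2 and 3: counts the distinct term types of the given level among
    // the feature and its direct children. A result above 1 marks a violation.
    public static int distinctTypesAtLevel(Node node, Map<String, Integer> levels, int level) {
        Set<String> types = new HashSet<>();
        List<Node> scope = new ArrayList<>(node.children);
        scope.add(node);
        for (Node n : scope) {
            Integer l = levels.get(n.type);
            if (l != null && l == level) {
                types.add(n.type);
            }
        }
        return types.size();
    }
}
```

Keeping the level table outside the rule methods makes the checks independent of any particular reading of the SO hierarchy; the same functions work for level 1 and level 2 by changing the level argument.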
    • getLoadedFeatureCount

      public long getLoadedFeatureCount()
      Retrieves the total number of features successfully processed, including sub-features.
      Returns:
      The number of successfully processed features, including sub-features.