Class FeatureLoader
This class provides methods to load features from a GFF3 annotation file into the storage system, validate the features against Sequence Ontology (SO) hierarchy rules, and adjust or impute features as necessary. It interacts with the Storage object to manage contigs, features, samples, and variant calls.
The FeatureLoader is initialized with a Storage object, a FeatureList containing parsed features, and a map of user-defined feature specifications. It ensures that the genomic data is processed and stored in a consistent, hierarchical manner.
Constructor Summary
FeatureLoader(Storage storage, org.biojava.nbio.genome.parsers.gff.FeatureList featureList, Map<String, Map<String, String>> features)
    Constructs a new instance of the FeatureLoader class.
Method Summary
long getLoadedFeatureCount()
    Retrieves the total number of features successfully processed, including sub-features.
void loadFeatures()
    Loads features into the storage based on the provided annotations and specifications.
void validateFeatures()
    Validates the features stored in the storage to ensure compliance with Sequence Ontology (SO) hierarchy rules.
Constructor Details
FeatureLoader
public FeatureLoader(Storage storage, org.biojava.nbio.genome.parsers.gff.FeatureList featureList, Map<String, Map<String, String>> features)
Constructs a new instance of the FeatureLoader class.
This constructor initializes the FeatureLoader with the provided storage, feature list, and feature specifications. The FeatureLoader is responsible for loading and validating genomic features based on the given data.
Parameters:
storage - The Storage object used to manage genomic data, including contigs, features, samples, and variant calls.
featureList - The FeatureList containing features parsed from the GFF3 annotation file.
features - A map of feature specifications provided by the user, where each key is a feature identifier and the value is a map of attributes defining the feature's properties and matching criteria.
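The features argument is a nested map from a feature identifier to that feature's attribute map. A minimal sketch of building one follows; the attribute keys (match_Name, match_locus_tag) and the locus tag value are hypothetical placeholders, not keys confirmed by this documentation.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: building the user-defined feature specification map passed as the
// third constructor argument. The attribute keys used here are HYPOTHETICAL;
// consult the MUSIAL documentation for the keys FeatureLoader recognizes.
class FeatureSpecExample {

    static Map<String, Map<String, String>> buildSpecifications() {
        // Outer key: feature identifier; inner map: attributes describing the feature.
        Map<String, Map<String, String>> features = new LinkedHashMap<>();

        Map<String, String> geneA = new LinkedHashMap<>();
        geneA.put("match_Name", "geneA");       // hypothetical matching criterion
        features.put("geneA", geneA);

        Map<String, String> geneB = new LinkedHashMap<>();
        geneB.put("match_locus_tag", "TP0001"); // hypothetical matching criterion
        features.put("geneB", geneB);

        return features;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> features = buildSpecifications();
        features.forEach((id, attrs) -> System.out.println(id + " -> " + attrs));
        // The map would then be passed as:
        // new FeatureLoader(storage, featureList, features);
    }
}
```

A LinkedHashMap keeps the features in the order they were specified, which makes logged output easier to compare against the input.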
Method Details
loadFeatures
Loads features into the storage based on the provided annotations and specifications. This method validates the input conditions to ensure that the reference sequence and annotations are properly specified. It then either processes the features specified via the CLI or, if none are specified, loads all annotated features from the provided GFF3 file.
The method performs the following steps:
- Validates the presence of a reference sequence and annotations.
- Processes attributes to ensure they are in the correct format.
- Matches and adds specified features to the storage.
- Loads all annotated features if no specific features are provided.
Throws:
MusialException - If the input conditions are invalid or if a feature is missing required specifications.
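The steps above can be sketched under simplifying assumptions: the hypothetical Annotation record stands in for entries of the BioJava FeatureList, matching criteria are assumed to be attributes prefixed with match_, IllegalStateException stands in for MusialException, and a plain list replaces Storage. It illustrates the branching logic only, not MUSIAL's actual matching rules (records require Java 16 or later).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified sketch of the control flow described for loadFeatures().
// Annotation, the "match_" key prefix, and IllegalStateException (standing in
// for MusialException) are HYPOTHETICAL; this is not the MUSIAL implementation.
class LoadFeaturesSketch {

    // Hypothetical minimal stand-in for a parsed GFF3 feature.
    record Annotation(String type, Map<String, String> attributes) {}

    static List<Annotation> loadFeatures(String reference,
                                         List<Annotation> annotations,
                                         Map<String, Map<String, String>> specs) {
        // Validate that a reference sequence and annotations are present.
        if (reference == null || reference.isEmpty()) {
            throw new IllegalStateException("No reference sequence specified.");
        }
        if (annotations == null || annotations.isEmpty()) {
            throw new IllegalStateException("No annotations specified.");
        }
        List<Annotation> stored = new ArrayList<>();
        // No user specifications: load every annotated feature.
        if (specs.isEmpty()) {
            stored.addAll(annotations);
            return stored;
        }
        for (Map.Entry<String, Map<String, String>> spec : specs.entrySet()) {
            // Collect matching criteria (keys assumed to start with "match_"),
            // trimming whitespace as a stand-in for attribute normalization.
            Map<String, String> criteria = new HashMap<>();
            spec.getValue().forEach((key, value) -> {
                if (key.startsWith("match_")) {
                    criteria.put(key.substring("match_".length()).trim(), value.trim());
                }
            });
            if (criteria.isEmpty()) {
                throw new IllegalStateException(
                        "Feature " + spec.getKey() + " is missing matching criteria.");
            }
            // Add the first annotation whose attributes satisfy every criterion.
            annotations.stream()
                    .filter(a -> criteria.entrySet().stream()
                            .allMatch(c -> c.getValue().equals(a.attributes().get(c.getKey()))))
                    .findFirst()
                    .ifPresent(stored::add);
        }
        return stored;
    }

    public static void main(String[] args) {
        List<Annotation> annotations = List.of(
                new Annotation("gene", Map.of("Name", "geneA")),
                new Annotation("gene", Map.of("Name", "geneB")));
        Map<String, Map<String, String>> specs = Map.of("A", Map.of("match_Name", "geneA"));
        System.out.println(loadFeatures("ACGT", annotations, specs).size());    // 1
        System.out.println(loadFeatures("ACGT", annotations, Map.of()).size()); // 2
    }
}
```

Failing fast on a specification without matching criteria mirrors the documented behavior of throwing when a feature is missing required specifications, rather than silently skipping it.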
validateFeatures
Validates the features stored in the storage to ensure compliance with Sequence Ontology (SO) hierarchy rules. This method iterates through all features in the storage and performs the following validations and adjustments:
- Removes the children of features whose SO term type is level 0, as such features are not allowed to have children.
- Ensures that at most one level 1 SO term occurs among a feature and its children.
- Ensures that at most one level 2 SO term occurs among a feature and its children.
- Adjusts features with SO levels greater than 1 to type "gene" to maintain a consistent hierarchy.
- Imputes missing children based on the location ranges of existing sub-features.
Features that violate the rules are either adjusted or removed from the storage, and appropriate warnings are logged.
Throws:
MusialException - If an error occurs during the adjustment of features.
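The first three validation rules can be illustrated on a small in-memory tree. In this sketch the Node class, the type-to-level table, and the choice to drop a violating feature entirely are hypothetical; the adjustment of types to "gene" and the imputation of missing children are omitted.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch of the first three SO hierarchy checks described for validateFeatures().
// Node, the level table, and removal of violating features are HYPOTHETICAL;
// type adjustment and child imputation are not shown.
class SoHierarchyCheck {

    static final class Node {
        final String id;
        final String type;
        final List<Node> children = new ArrayList<>();
        Node(String id, String type) { this.id = id; this.type = type; }
    }

    // Returns the features that pass the checks; violations are recorded as warnings.
    static List<Node> validate(List<Node> features, Map<String, Integer> soLevel, List<String> warnings) {
        List<Node> kept = new ArrayList<>();
        for (Node f : features) {
            int level = soLevel.getOrDefault(f.type, -1);
            // Rule 1: level-0 features may not have children.
            if (level == 0 && !f.children.isEmpty()) {
                warnings.add("Removed children of level-0 feature " + f.id);
                f.children.clear();
            }
            // Rules 2 and 3: at most one level-1 and one level-2 term per feature and its children.
            if (countLevel(f, 1, soLevel) > 1 || countLevel(f, 2, soLevel) > 1) {
                warnings.add("Removed feature " + f.id + ": duplicate level-1 or level-2 SO term");
                continue;
            }
            kept.add(f);
        }
        return kept;
    }

    // Counts nodes of the given SO level among the feature and its direct children.
    static int countLevel(Node f, int level, Map<String, Integer> soLevel) {
        int n = soLevel.getOrDefault(f.type, -1) == level ? 1 : 0;
        for (Node c : f.children) {
            if (soLevel.getOrDefault(c.type, -1) == level) n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // HYPOTHETICAL level assignments chosen only to exercise the rules.
        Map<String, Integer> soLevel = Map.of("region", 0, "gene", 1, "mRNA", 2, "CDS", 3);
        Node region = new Node("r1", "region");
        region.children.add(new Node("g0", "gene"));
        Node gene = new Node("g1", "gene");
        gene.children.add(new Node("t1", "mRNA"));
        gene.children.add(new Node("t2", "mRNA"));
        List<String> warnings = new ArrayList<>();
        List<Node> kept = validate(List.of(region, gene), soLevel, warnings);
        System.out.println(kept.size()); // 1 (r1 is kept without children; g1 is removed)
        warnings.forEach(System.out::println);
    }
}
```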
getLoadedFeatureCount
public long getLoadedFeatureCount()
Retrieves the total number of features successfully processed, including sub-features.
Returns:
The count of processed features.
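One way to count features "including sub-features" is a depth-first walk that counts each feature and all of its descendants. The Feature record below is a hypothetical stand-in, and the actual getLoadedFeatureCount() may instead return a counter maintained during loading.

```java
import java.util.List;

// Sketch of counting features together with their sub-features by walking a
// feature tree. Feature is a HYPOTHETICAL stand-in for stored features.
class FeatureCountSketch {

    record Feature(String id, List<Feature> children) {}

    // Counts the feature itself plus all of its descendants, depth-first.
    static long count(Feature f) {
        long n = 1;
        for (Feature child : f.children()) {
            n += count(child);
        }
        return n;
    }

    // Sums the counts over all top-level features.
    static long countAll(List<Feature> roots) {
        return roots.stream().mapToLong(FeatureCountSketch::count).sum();
    }

    public static void main(String[] args) {
        Feature cds = new Feature("cds1", List.of());
        Feature mrna = new Feature("mrna1", List.of(cds));
        Feature gene = new Feature("gene1", List.of(mrna));
        Feature other = new Feature("gene2", List.of());
        System.out.println(countAll(List.of(gene, other))); // 4
    }
}
```

Returning a long matches the documented return type and avoids overflow concerns for very large annotation sets.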