Class StorageIO
The StorageIO class provides utility methods for serializing and deserializing genomic data.
It includes methods that convert Storage objects into file formats such as JSON, GFF3, FASTA, and VCF. File content is generated from the data held in the Storage object and follows the respective format specification. The class also provides helper methods for processing features and contigs.
-
Method Summary
Modifier and Type | Method | Description
static String | toFASTA | Generates the content of a FASTA file from the given Storage object.
static String | toGFF3 | Generates the content of a GFF (General Feature Format) file from the given Storage object.
static void | toJSON | Serializes the given Storage object to a JSON file at the specified path.
static String | toVCF | Generates the content of a VCF (Variant Call Format) file from the given Storage object.
-
Method Details
-
toJSON
Serializes the given Storage object to a JSON file at the specified path.
This method converts the Storage object into a JSON string using the Gson library. If the specified path does not end with ".json" or ".json.gz", the default output extension defined in Musial is appended to the path. The JSON data is then written to the file.
If the path ends with ".gz", the JSON data is compressed using GZIP before being written. Otherwise, it is written as plain text.
- Parameters:
storage - The Storage object to be serialized.
path - The Path where the JSON file will be written.
- Throws:
IOException - If an I/O error occurs during file writing.
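The path handling and optional compression described for toJSON can be sketched with the standard library alone. This is an illustrative sketch, not the StorageIO implementation: the class JsonWriteSketch, its method names, and the DEFAULT_EXTENSION value are assumptions, and the Gson serialization step is left out, so the JSON text is passed in as a plain String.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.zip.GZIPOutputStream;

final class JsonWriteSketch {
    // Hypothetical stand-in for the default output extension defined in Musial.
    static final String DEFAULT_EXTENSION = ".json";

    // Appends the default extension unless the path already ends with ".json" or ".json.gz".
    static Path resolveOutputPath(Path path) {
        String name = path.toString();
        if (name.endsWith(".json") || name.endsWith(".json.gz")) {
            return path;
        }
        return Paths.get(name + DEFAULT_EXTENSION);
    }

    // Writes the JSON text to the resolved path; GZIP is applied when that path ends with ".gz".
    static Path write(String json, Path path) throws IOException {
        Path target = resolveOutputPath(path);
        byte[] bytes = json.getBytes(StandardCharsets.UTF_8);
        if (target.toString().endsWith(".gz")) {
            try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
                out.write(bytes);
            }
        } else {
            Files.write(target, bytes);
        }
        return target;
    }
}
```

With these assumptions, a path such as "out" resolves to "out.json", while "out.json.gz" is kept as given and its content is compressed.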
-
toGFF3
Generates the content of a GFF (General Feature Format) file from the given Storage object.
This method constructs the GFF file content as a String by iterating over the features in the provided Storage object. The GFF content includes the version, processor information, and the feature data. Each feature is converted to its GFF string representation using the featureToGFF3String(Feature) method.
The generated GFF content follows the GFF3 specification and includes the following:
- ##gff-version: Specifies the GFF version.
- ##processor: Includes the software id and version used to generate the file.
- Feature data: Each feature is represented in GFF format.
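The layout listed above (version line, processor line, then one line per feature) could be assembled roughly as follows. This is a sketch under stated assumptions: the Feature class here is a hypothetical minimal model, featureLine is an illustrative stand-in for featureToGFF3String(Feature), whose actual output is not shown in this documentation, and the software id and version are passed in as plain strings.

```java
import java.util.List;

final class Gff3Sketch {
    // Hypothetical minimal feature model; the real Feature class is not documented here.
    static final class Feature {
        final String contig;
        final String type;
        final int start;
        final int end;
        final char strand;
        final String id;

        Feature(String contig, String type, int start, int end, char strand, String id) {
            this.contig = contig;
            this.type = type;
            this.start = start;
            this.end = end;
            this.strand = strand;
            this.id = id;
        }
    }

    // Builds the nine tab-separated GFF3 columns; source, score, and phase are left undefined (".").
    static String featureLine(Feature f) {
        return String.join("\t",
                f.contig, ".", f.type,
                String.valueOf(f.start), String.valueOf(f.end),
                ".", String.valueOf(f.strand), ".",
                "ID=" + f.id);
    }

    // Emits the version pragma, the processor line, and one line per feature.
    static String toGff3(List<Feature> features, String softwareId, String softwareVersion) {
        StringBuilder sb = new StringBuilder();
        sb.append("##gff-version 3\n");
        sb.append("##processor ").append(softwareId).append(' ').append(softwareVersion).append('\n');
        for (Feature f : features) {
            sb.append(featureLine(f)).append('\n');
        }
        return sb.toString();
    }
}
```

The exact text of the processor line and the attribute column in the real output may differ; only the overall ordering is taken from the description above.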
-
toFASTA
Generates the content of a FASTA file from the given Storage object.
This method constructs the FASTA file content as a String by iterating over the contigs in the provided Storage object. Each contig's ID is used as the header (prefixed with '>'), and its sequence is split into lines of 80 characters for proper FASTA formatting. The method ensures that all contigs in the storage have sequence data before proceeding.
- Parameters:
storage - The Storage object containing the contigs and their sequences.
- Returns:
A String representing the content of the reference FASTA file.
- Throws:
IOException - If an I/O error occurs during the generation of the FASTA content.
IllegalArgumentException - If no reference sequence information is stored in the Storage object.
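The 80-character line wrapping and the up-front sequence check can be illustrated with a small sketch. It assumes the contigs are available as a map from contig ID to sequence, which is a simplification for illustration; the class FastaSketch and its method are hypothetical and do not use the Storage type.

```java
import java.util.Map;

final class FastaSketch {
    // Line width used for sequence lines, as stated in the description above.
    static final int LINE_WIDTH = 80;

    // Builds FASTA text from a map of contig ID to sequence (an assumed, simplified input).
    static String toFasta(Map<String, String> contigs) {
        // Check every contig first so that no partial output is built for incomplete data.
        for (Map.Entry<String, String> entry : contigs.entrySet()) {
            String sequence = entry.getValue();
            if (sequence == null || sequence.isEmpty()) {
                throw new IllegalArgumentException("No sequence stored for contig " + entry.getKey());
            }
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : contigs.entrySet()) {
            String sequence = entry.getValue();
            sb.append('>').append(entry.getKey()).append('\n');
            // Append the sequence in chunks of at most LINE_WIDTH characters.
            for (int i = 0; i < sequence.length(); i += LINE_WIDTH) {
                int end = Math.min(i + LINE_WIDTH, sequence.length());
                sb.append(sequence, i, end).append('\n');
            }
        }
        return sb.toString();
    }
}
```

For a 170-character sequence this yields one header line followed by sequence lines of 80, 80, and 10 characters.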
-
toVCF
Generates the content of a VCF (Variant Call Format) file from the given Storage object.
This method constructs the VCF file content as a String by iterating over the contigs in the provided Storage object. The VCF content includes the file format, source, and a header line, followed by the variant data. Each variant is represented by its chromosome, position, reference base, and alternate base.
The generated VCF content follows the VCFv4.3 specification and includes the following fields:
- CHROM: Chromosome identifier.
- POS: Position of the variant on the chromosome.
- ID: Variant identifier (set to ".").
- REF: Reference base(s) (gaps are stripped).
- ALT: Alternate base(s) (gaps are stripped).
- QUAL: Quality score (set to "100").
- FILTER: Filter status (set to ".").
- INFO: Additional information (see parameters).
Variants can be filtered based on their novelty and ambiguity:
- If onlyNovel is true, only active variants are included.
- If excludeAmbiguous is true, variants with ambiguous alternate bases are excluded.
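The field layout and the two filters can be sketched as below. Several points are assumptions made for illustration: the Variant class and its active flag are hypothetical, an allele is treated as ambiguous when it contains any symbol other than A, C, G, or T, and INFO is written as "." because its actual content is not specified in this section.

```java
import java.util.List;

final class VcfSketch {
    // Hypothetical variant model; field names are illustrative only.
    static final class Variant {
        final String chrom;
        final int pos;
        final String ref;
        final String alt;
        final boolean active;

        Variant(String chrom, int pos, String ref, String alt, boolean active) {
            this.chrom = chrom;
            this.pos = pos;
            this.ref = ref;
            this.alt = alt;
            this.active = active;
        }
    }

    // Removes gap symbols ('-') from an allele string.
    static String stripGaps(String bases) {
        return bases.replace("-", "");
    }

    // Assumption: any symbol outside A, C, G, T marks the allele as ambiguous.
    static boolean isAmbiguous(String bases) {
        return !bases.matches("[ACGT]*");
    }

    // Writes the meta lines, the column header, and one record per variant that passes the filters.
    static String toVcf(List<Variant> variants, String source, boolean onlyNovel, boolean excludeAmbiguous) {
        StringBuilder sb = new StringBuilder();
        sb.append("##fileformat=VCFv4.3\n");
        sb.append("##source=").append(source).append('\n');
        sb.append("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n");
        for (Variant v : variants) {
            String ref = stripGaps(v.ref);
            String alt = stripGaps(v.alt);
            if (onlyNovel && !v.active) {
                continue; // keep only active variants
            }
            if (excludeAmbiguous && isAmbiguous(alt)) {
                continue; // drop variants with ambiguous alternate bases
            }
            // ID and FILTER are ".", QUAL is "100"; INFO is a "." placeholder in this sketch.
            sb.append(String.join("\t", v.chrom, String.valueOf(v.pos), ".", ref, alt, "100", ".", "."))
              .append('\n');
        }
        return sb.toString();
    }
}
```

With both flags set to false every variant is written, so the flags only ever narrow the output.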
-