Package op

Class StorageIO

java.lang.Object
op.StorageIO

public class StorageIO extends Object
The StorageIO class provides utility methods for serializing and deserializing genomic data.

This class provides methods to convert Storage objects into file formats including JSON, GFF3, FASTA, and VCF. toJSON writes directly to a file, while toGFF3, toFASTA, and toVCF return the file content as a String. The content is generated from the data held in the Storage object and follows the respective file format specifications. The class also provides helper methods for processing features and contigs.

  • Method Details

    • toJSON

      public static void toJSON(Storage storage, Path path) throws IOException
      Serializes the given Storage object to a JSON file at the specified path.

      This method converts the Storage object into a JSON string using the Gson library. If the specified path does not end with ".json" or ".json.gz", the default output extension defined in Musial is appended to the path. The JSON data is then written to the file.

      If the path ends with ".gz", the JSON data is compressed using GZIP before being written. Otherwise, it is written as plain text.

      Parameters:
      storage - The Storage object to be serialized.
      path - The Path where the JSON file will be written.
      Throws:
      IOException - If an I/O error occurs during file writing.
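      The extension handling and optional GZIP compression described above can be sketched with the JDK alone. This is a minimal sketch, not Musial's implementation. It assumes the JSON string has already been produced (for example by Gson). The class JsonWriteSketch and the constant DEFAULT_EXTENSION are hypothetical, and the value ".json.gz" is only a placeholder, not the extension actually defined in Musial.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical sketch of the path handling and compression described for StorageIO.toJSON.
 * The JSON text is passed in ready-made; serialization itself (Gson) is out of scope here.
 */
final class JsonWriteSketch {

    // Placeholder for Musial's default output extension (assumption, not the real value).
    static final String DEFAULT_EXTENSION = ".json.gz";

    /** Appends the default extension unless the path already ends with ".json" or ".json.gz". */
    static Path resolveOutputPath(Path path) {
        String name = path.toString();
        if (name.endsWith(".json") || name.endsWith(".json.gz")) {
            return path;
        }
        return Path.of(name + DEFAULT_EXTENSION);
    }

    /** Writes the JSON text, GZIP-compressed if the resolved path ends with ".gz". */
    static Path write(String json, Path path) throws IOException {
        Path target = resolveOutputPath(path);
        byte[] bytes = json.getBytes(StandardCharsets.UTF_8);
        if (target.toString().endsWith(".gz")) {
            // Closing the GZIP stream finishes the compressed data and closes the file.
            try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(target))) {
                out.write(bytes);
            }
        } else {
            Files.write(target, bytes);
        }
        return target;
    }
}
```

      The sketch returns the resolved path so a caller can see where the file actually landed after the extension was appended.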
    • toGFF3

      public static String toGFF3(Storage storage)
      Generates the content of a GFF3 (Generic Feature Format Version 3) file from the given Storage object.

      This method builds the GFF content as a String by iterating over the features in the provided Storage object. The output begins with the GFF version and processor information, followed by the feature data. Each feature is converted to its GFF3 line using the featureToGFF3String(Feature) method.

      The generated GFF content follows the GFF3 specification and includes the following:

      • ##gff-version: Specifies the GFF version.
      • ##processor: The ID and version of the software used to generate the file.
      • Feature data: One GFF3 line per feature.
      Parameters:
      storage - The Storage object containing the features to include in the GFF file.
      Returns:
      A String representing the GFF file content.
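      A minimal sketch of how the header directives and feature lines can be assembled. Gff3Sketch, FeatureRow, and toGff3Line are hypothetical stand-ins for Storage, Feature, and featureToGFF3String, and the exact layout of the ##processor line is an assumption. Only the "##gff-version 3" directive and the nine tab-separated columns come from the GFF3 specification.

```java
import java.util.List;

/**
 * Hypothetical sketch of GFF3 assembly: header directives, then one tab-separated
 * line per feature. Not Musial's Feature class or featureToGFF3String.
 */
final class Gff3Sketch {

    // Stand-in for a feature: the nine GFF3 columns (assumption).
    record FeatureRow(String seqId, String source, String type, int start, int end,
                      String score, char strand, String phase, String attributes) {

        /** Joins the nine columns with tabs, as GFF3 requires. */
        String toGff3Line() {
            return String.join("\t", seqId, source, type,
                    Integer.toString(start), Integer.toString(end),
                    score, Character.toString(strand), phase, attributes);
        }
    }

    static String toGff3(List<FeatureRow> features, String softwareId, String version) {
        StringBuilder sb = new StringBuilder();
        sb.append("##gff-version 3\n");
        // The exact format of the ##processor line is an assumption.
        sb.append("##processor ").append(softwareId).append(' ').append(version).append('\n');
        for (FeatureRow f : features) {
            sb.append(f.toGff3Line()).append('\n');
        }
        return sb.toString();
    }
}
```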
    • toFASTA

      public static String toFASTA(Storage storage) throws IOException
      Generates the content of a FASTA file from the given Storage object.

      This method builds the FASTA content as a String by iterating over the contigs in the provided Storage object. Each contig's ID is used as the header line (prefixed with '>'), and its sequence is wrapped into lines of at most 80 characters. Before generating any output, the method checks that all contigs in the storage have sequence data.

      Parameters:
      storage - The Storage object containing the contigs and their sequences.
      Returns:
      A String representing the content of the reference FASTA file.
      Throws:
      IOException - If an I/O error occurs during the generation of the FASTA content.
      IllegalArgumentException - If no reference sequence information is stored in the Storage object.
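      The 80-character wrapping and the sequence check can be illustrated as follows. This sketch assumes a plain map from contig ID to sequence in place of the Storage object; FastaSketch and toFasta are hypothetical names, and throwing on an empty or missing sequence approximates the documented IllegalArgumentException.

```java
import java.util.Map;

/**
 * Hypothetical sketch of the FASTA layout: a ">" header per contig followed by its
 * sequence wrapped at 80 characters. A contig-ID-to-sequence map replaces Storage.
 */
final class FastaSketch {

    static final int LINE_WIDTH = 80;

    static String toFasta(Map<String, String> contigs) {
        // Refuse to write anything if any contig lacks sequence data.
        for (Map.Entry<String, String> e : contigs.entrySet()) {
            if (e.getValue() == null || e.getValue().isEmpty()) {
                throw new IllegalArgumentException("No sequence stored for contig " + e.getKey());
            }
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : contigs.entrySet()) {
            sb.append('>').append(e.getKey()).append('\n');
            String seq = e.getValue();
            // Emit the sequence in chunks of LINE_WIDTH; the last chunk may be shorter.
            for (int i = 0; i < seq.length(); i += LINE_WIDTH) {
                sb.append(seq, i, Math.min(i + LINE_WIDTH, seq.length())).append('\n');
            }
        }
        return sb.toString();
    }
}
```

      Passing a LinkedHashMap keeps the contigs in insertion order in the output.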
    • toVCF

      public static String toVCF(Storage storage, boolean onlyNovel, boolean excludeAmbiguous)
      Generates the content of a VCF (Variant Call Format) file from the given Storage object.

      This method builds the VCF content as a String by iterating over the contigs in the provided Storage object. The output begins with the file format and source meta-information lines and the column header line, followed by the variant records. Each variant is represented by its chromosome, position, reference allele, and alternate allele.

      The generated VCF content follows the VCFv4.3 specification and includes the following fields:

      • CHROM: Chromosome identifier.
      • POS: Position of the variant on the chromosome.
      • ID: Variant identifier (set to ".").
      • REF: Reference base(s) (gaps are stripped).
      • ALT: Alternate base(s) (gaps are stripped).
      • QUAL: Quality score (set to "100").
      • FILTER: Filter status (set to ".").
      • INFO: Additional information (see parameters).

      Variants can be filtered based on their novelty and ambiguity:

      • If onlyNovel is true, only active variants are included.
      • If excludeAmbiguous is true, variants with ambiguous alternate bases are excluded.
      Parameters:
      storage - The Storage object containing the contigs and variants.
      onlyNovel - If true, only active variants are included in the VCF content.
      excludeAmbiguous - If true, variants with ambiguous alternate bases are excluded.
      Returns:
      A String representing the VCF file content.
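      A minimal sketch of the record layout and the two filters. VcfSketch and VariantRow are hypothetical. The active flag stands in for whatever onlyNovel actually checks, "ambiguous" is approximated as any ALT character outside A, C, G, and T after gap removal, and INFO is written as "." only as a placeholder, since its actual content is not described here.

```java
import java.util.List;

/**
 * Hypothetical sketch of VCF output: meta-information, header line, then one
 * tab-separated record per variant that passes the filters. Not Musial's data model.
 */
final class VcfSketch {

    // Stand-in for a stored variant (assumption).
    record VariantRow(String chrom, int pos, String ref, String alt, boolean active) {}

    static String toVcf(List<VariantRow> variants, String source,
                        boolean onlyNovel, boolean excludeAmbiguous) {
        StringBuilder sb = new StringBuilder();
        sb.append("##fileformat=VCFv4.3\n");
        sb.append("##source=").append(source).append('\n');
        sb.append("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n");
        for (VariantRow v : variants) {
            if (onlyNovel && !v.active()) {
                continue; // keep only active variants
            }
            // Strip alignment gaps from both alleles.
            String ref = v.ref().replace("-", "");
            String alt = v.alt().replace("-", "");
            if (excludeAmbiguous && !alt.matches("[ACGT]*")) {
                continue; // drop ALT alleles with non-ACGT (e.g. IUPAC ambiguity) codes
            }
            // ID ".", QUAL "100", FILTER "."; INFO "." is a placeholder.
            sb.append(String.join("\t", v.chrom(), Integer.toString(v.pos()), ".",
                    ref, alt, "100", ".", ".")).append('\n');
        }
        return sb.toString();
    }
}
```

      Stripping gaps can leave an empty REF or ALT allele, which is not valid VCF; the sketch does not handle that case.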