Package utility

Class IO

java.lang.Object
utility.IO

public final class IO extends Object
Utility class for input/output operations.

This final class provides static methods for various file and data handling operations, such as reading, writing, compressing, and hashing files and strings. It also includes methods for generating specific file formats like VCF, FASTA, and GFF.

The class serves as a collection of static utility methods and is not meant to be instantiated, although a public default constructor is exposed.

  • Constructor Details

    • IO

      public IO()
  • Method Details

    • readFile

      public static ArrayList<String> readFile(File file) throws IOException
      Reads the content of a file line by line and returns a list of non-empty, trimmed lines.

      This method uses a Scanner to read the file with UTF-8 encoding. Each line is trimmed to remove leading and trailing whitespace, and empty lines are excluded from the result.

      Parameters:
      file - The File object representing the file to read.
      Returns:
      An ArrayList containing the non-empty, trimmed lines from the file.
      Throws:
      IOException - If an I/O error occurs while reading the file.
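
      The behavior described above can be sketched as follows. This is a minimal illustration, not the library's source: the class name ReadFileSketch and the method name readTrimmedLines are hypothetical, and only the documented steps (Scanner, UTF-8, trim, skip empty lines) are reproduced.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.Scanner;

final class ReadFileSketch {

    // Hypothetical stand-in for IO.readFile: keeps trimmed, non-empty lines only.
    static ArrayList<String> readTrimmedLines(File file) throws IOException {
        ArrayList<String> lines = new ArrayList<>();
        // Scanner(File, String) throws FileNotFoundException, a subtype of IOException.
        try (Scanner scanner = new Scanner(file, StandardCharsets.UTF_8.name())) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine().trim();
                if (!line.isEmpty()) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("read-sketch", ".txt");
        tmp.deleteOnExit();
        Files.write(tmp.toPath(), "  first \n\n\tsecond\n   \n".getBytes(StandardCharsets.UTF_8));
        System.out.println(readTrimmedLines(tmp)); // prints [first, second]
    }
}
```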
    • writeFile

      public static void writeFile(Path path, String content) throws IOException
      Writes the specified content to a file at the given path.

      This method ensures that the parent directories of the target file are created if they do not exist. It then writes the provided content to the file using UTF-8 encoding. If the file already exists, its content is overwritten.

      Parameters:
      path - The Path where the file will be written.
      content - The String content to write to the file.
      Throws:
      IOException - If an I/O error occurs during directory creation or file writing.
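
      A minimal sketch of the same sequence of steps using java.nio.file, assuming nothing beyond what the description states. The names WriteFileSketch and writeUtf8 are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class WriteFileSketch {

    // Hypothetical stand-in for IO.writeFile.
    static void writeUtf8(Path path, String content) throws IOException {
        Path parent = path.getParent();
        if (parent != null) {
            // No-op if the directories already exist.
            Files.createDirectories(parent);
        }
        // Default options are CREATE, TRUNCATE_EXISTING, WRITE: existing content is replaced.
        Files.write(path, content.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("write-sketch");
        Path target = dir.resolve("nested").resolve("out.txt");
        writeUtf8(target, "first version");
        writeUtf8(target, "second version");
        System.out.println(new String(Files.readAllBytes(target), StandardCharsets.UTF_8));
        // prints second version
    }
}
```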
    • readTabularFileAsNestedMap

      public static HashMap<String,HashMap<String,String>> readTabularFileAsNestedMap(File file) throws IOException
      Reads a tabular file and converts its content into a nested map structure.

      This method reads a tabular file where the first row contains headers and each subsequent row contains data. The first column is treated as the key for the outer map, and the remaining columns are stored in an inner map with their corresponding headers as keys. The file can use tab or comma as delimiters.

      Parameters:
      file - The File object representing the tabular file to read.
      Returns:
      A HashMap where the outer map's key is the first column's value, and the value is another HashMap containing the remaining columns as key-value pairs.
      Throws:
      IOException - If an I/O error occurs or the file format is invalid.
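
      To make the nested structure concrete, the sketch below builds such a map from lines that have already been read. It is an illustration under assumptions: the names TabularSketch and toNestedMap are hypothetical, the separator is passed in explicitly, and a row whose cell count differs from the header is treated as an invalid format.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.regex.Pattern;

final class TabularSketch {

    // Hypothetical helper: first column -> (header -> cell) for the remaining columns.
    static HashMap<String, HashMap<String, String>> toNestedMap(List<String> lines, String separator)
            throws IOException {
        HashMap<String, HashMap<String, String>> table = new HashMap<>();
        if (lines.isEmpty()) {
            return table;
        }
        String splitter = Pattern.quote(separator);
        // Limit -1 keeps trailing empty cells so column counts stay comparable.
        String[] headers = lines.get(0).split(splitter, -1);
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(splitter, -1);
            if (cells.length != headers.length) {
                // Assumption: a ragged row counts as an invalid file format.
                throw new IOException("Expected " + headers.length + " columns: " + line);
            }
            HashMap<String, String> row = new HashMap<>();
            for (int i = 1; i < headers.length; i++) {
                row.put(headers[i], cells[i]);
            }
            table.put(cells[0], row);
        }
        return table;
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Arrays.asList("name\tlength\ttype", "geneA\t120\tCDS");
        System.out.println(toNestedMap(lines, "\t").get("geneA").get("length")); // prints 120
    }
}
```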
    • detectSeparator

      public static String detectSeparator(List<String> content)
      Detects the separator used in a list of strings.

      This method analyzes the provided list of strings to determine the separator used in the content. It skips lines that start with a specific sign (defined by Constants.SIGN) and checks the first non-skipped line for the presence of either a tab character or a comma. If a tab is found, it returns the tab separator; if a comma is found, it returns the comma separator. If neither is found, it returns an empty string.

      Parameters:
      content - A List of String objects representing the content to analyze.
      Returns:
      A String representing the detected separator: either a tab, a comma, or an empty string if no separator is found.
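
      The detection logic can be sketched as below. The value of Constants.SIGN is not given in this documentation, so the skip prefix is passed as a parameter; the "#" used in the example is an assumption, as are the names SeparatorSketch and detect.

```java
import java.util.Arrays;
import java.util.List;

final class SeparatorSketch {

    // Hypothetical stand-in for IO.detectSeparator; skipPrefix plays the role of Constants.SIGN.
    static String detect(List<String> content, String skipPrefix) {
        for (String line : content) {
            if (line.startsWith(skipPrefix)) {
                continue; // skipped lines do not take part in detection
            }
            // Only the first non-skipped line is inspected; tab takes precedence over comma.
            if (line.contains("\t")) {
                return "\t";
            }
            if (line.contains(",")) {
                return ",";
            }
            return "";
        }
        return "";
    }

    public static void main(String[] args) {
        List<String> content = Arrays.asList("# a comment, with a comma", "id\tvalue", "x,1");
        System.out.println(detect(content, "#").equals("\t")); // prints true
    }
}
```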
    • generateVcfContent

      public static String generateVcfContent(ArrayList<htsjdk.samtools.util.Tuple<org.apache.commons.lang3.tuple.Triple<String,Integer,String>,VariantInformation>> variants)
      Generates the content of a plain VCF (Variant Call Format) file.

      This method builds the VCF content as a String from a list of variants. The content consists of the file format line, the source line, and the column header line, followed by one record per variant. Each record gives the variant's chromosome, position, reference base, and alternate base.

      The generated VCF content follows the VCFv4.3 specification and includes the following fields:

      • CHROM: Chromosome name
      • POS: Position of the variant
      • ID: Variant identifier (set to ".")
      • REF: Reference base(s)
      • ALT: Alternate base(s) (gaps are stripped)
      • QUAL: Quality score (set to "100")
      • FILTER: Filter status (set to ".")
      • INFO: Additional information (empty)
      Parameters:
      variants - A list of Tuple objects, where each tuple contains:
      • A Triple with the chromosome name, position, and alternate base.
      • A VariantInformation object containing the reference base.
      Returns:
      A String representing the VCF content.
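
      The record layout listed above can be illustrated with plain Java types. This is a sketch under several assumptions: the Variant class replaces the htsjdk Tuple and commons-lang3 Triple of the real signature, the source name "sketch" is a placeholder, the gap character is taken to be "-", and the INFO column is written as "." (the VCF missing-value placeholder) where the description only says "empty".

```java
import java.util.Arrays;
import java.util.List;

final class VcfSketch {

    // Hypothetical value type standing in for Tuple<Triple<String,Integer,String>, VariantInformation>.
    static final class Variant {
        final String contig;
        final int position;
        final String ref;
        final String alt;

        Variant(String contig, int position, String ref, String alt) {
            this.contig = contig;
            this.position = position;
            this.ref = ref;
            this.alt = alt;
        }
    }

    static String toVcf(List<Variant> variants) {
        StringBuilder sb = new StringBuilder();
        sb.append("##fileformat=VCFv4.3\n");
        sb.append("##source=sketch\n"); // placeholder source name
        sb.append("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n");
        for (Variant v : variants) {
            sb.append(v.contig).append('\t')
              .append(v.position).append('\t')
              .append('.').append('\t')                      // ID
              .append(v.ref).append('\t')
              .append(v.alt.replace("-", "")).append('\t')   // assumed gap symbol "-"
              .append("100").append('\t')                    // QUAL
              .append('.').append('\t')                      // FILTER
              .append('.').append('\n');                     // INFO, assumed "."
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(toVcf(Arrays.asList(new Variant("chr1", 42, "A", "T-"))));
    }
}
```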
    • generateReferenceFastaContent

      public static String generateReferenceFastaContent(Storage storage) throws IOException
      Generates the content of a reference FASTA file from the given Storage object.

      This method builds the FASTA content as a String by iterating over the contigs in the provided Storage object. Each contig's name is used as the header line (prefixed with '>'), and its sequence is wrapped into lines of at most 80 characters.

      Parameters:
      storage - The Storage object containing the contigs and their sequences.
      Returns:
      A String representing the content of the reference FASTA file.
      Throws:
      IOException - If an I/O error occurs during the generation of the FASTA content.
      IllegalArgumentException - If no reference sequence information is stored in the Storage object.
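
      The 80-character wrapping can be sketched without the Storage class. In the hypothetical helper below, a Map from contig name to sequence stands in for the contigs held by Storage; the names FastaSketch and toFasta are not part of the library.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class FastaSketch {

    static final int LINE_WIDTH = 80;

    // Hypothetical helper: contig name -> sequence, in insertion order.
    static String toFasta(Map<String, String> contigs) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> contig : contigs.entrySet()) {
            sb.append('>').append(contig.getKey()).append('\n');
            String sequence = contig.getValue();
            for (int i = 0; i < sequence.length(); i += LINE_WIDTH) {
                // The last line of a contig may be shorter than LINE_WIDTH.
                sb.append(sequence, i, Math.min(i + LINE_WIDTH, sequence.length())).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> contigs = new LinkedHashMap<>();
        contigs.put("contig1", "ACGTACGT");
        System.out.print(toFasta(contigs));
        // prints:
        // >contig1
        // ACGTACGT
    }
}
```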
    • generateGffContent

      public static String generateGffContent(Storage storage)
      Generates the content of a GFF (General Feature Format) file from the given Storage object.

      This method builds the GFF content as a String by iterating over the features in the provided Storage object. The content consists of the version line, the processor line, and the feature data. Each feature is converted to its GFF string representation using the Feature.toGffString() method.

      The generated GFF content follows the GFF3 specification and includes the following:

      • ##gff-version: Specifies the GFF version.
      • ##processor: Includes the software name and version used to generate the file.
      • Feature data: Each feature is represented in GFF format.
      Parameters:
      storage - The Storage object containing the features to include in the GFF file.
      Returns:
      A String representing the GFF file content.
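
      The overall layout can be sketched as follows. Everything specific here is an assumption: the Feature class is a simplified stand-in for the library's own type, the exact wording of the ##processor line is not given in this documentation, and the source column value "sketch" is a placeholder. Only the nine tab-separated GFF3 columns are standard.

```java
import java.util.Arrays;
import java.util.List;

final class GffSketch {

    // Simplified, hypothetical stand-in for the library's Feature type.
    static final class Feature {
        final String seqId;
        final String type;
        final int start;
        final int end;
        final char strand;
        final String attributes;

        Feature(String seqId, String type, int start, int end, char strand, String attributes) {
            this.seqId = seqId;
            this.type = type;
            this.start = start;
            this.end = end;
            this.strand = strand;
            this.attributes = attributes;
        }

        // GFF3 columns: seqid, source, type, start, end, score, strand, phase, attributes.
        String toGffString() {
            return String.join("\t", seqId, "sketch", type, String.valueOf(start),
                    String.valueOf(end), ".", String.valueOf(strand), ".", attributes);
        }
    }

    static String toGff(List<Feature> features, String software, String version) {
        StringBuilder sb = new StringBuilder();
        sb.append("##gff-version 3\n");
        // Assumed layout of the processor line: name and version separated by a space.
        sb.append("##processor ").append(software).append(' ').append(version).append('\n');
        for (Feature feature : features) {
            sb.append(feature.toGffString()).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Feature gene = new Feature("contig1", "gene", 1, 90, '+', "ID=geneA");
        System.out.print(toGff(Arrays.asList(gene), "example-tool", "0.0"));
    }
}
```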
    • initializeVCFFileReader

      public static htsjdk.variant.vcf.VCFFileReader initializeVCFFileReader(File file) throws IOException
      Initializes a VCFFileReader instance for the given VCF file. To do so, a temporary indexed VCF file is created.
      Parameters:
      file - A File object pointing to a .vcf file.
      Returns:
      A VCFFileReader instance for the passed .vcf file.
      Throws:
      IOException - If an error occurs while initializing the VCFFileReader.
    • copyResourceToFile

      public static void copyResourceToFile(String resourceName, Path targetPath) throws MusialException
      Copies a resource from the application's classpath to a specified target Path.

      This method retrieves a resource as an InputStream from the application's classpath using the specified resource path. The resource is then copied to the target file path, overwriting any existing file at the target location.

      Parameters:
      resourceName - The path to the resource within the application's classpath.
      targetPath - The file path where the resource should be copied.
      Throws:
      MusialException - If the resource cannot be found or an I/O error occurs during the copy operation.
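
      A minimal sketch of the described copy, using only the standard library. MusialException is project-specific, so the hypothetical method below reports both failure cases as IOException instead; the names ResourceCopySketch and copyResource are likewise assumptions.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class ResourceCopySketch {

    // Hypothetical stand-in for IO.copyResourceToFile.
    static void copyResource(String resourceName, Path targetPath) throws IOException {
        // getResourceAsStream returns null when the resource is not on the classpath.
        try (InputStream in = ResourceCopySketch.class.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Resource not found: " + resourceName);
            }
            // REPLACE_EXISTING overwrites a file that is already present at the target.
            Files.copy(in, targetPath, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempFile("resource-sketch", ".bin");
        try {
            copyResource("/no-such-resource.bin", target);
        } catch (IOException e) {
            System.out.println(e.getMessage()); // prints Resource not found: /no-such-resource.bin
        }
    }
}
```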
    • gzipCompress

      public static String gzipCompress(String content) throws IOException
      Compresses a string using GZIP compression and encodes the result in Base64.

      This method compresses the input string using the GZIP algorithm and then encodes the compressed byte array into a Base64 string. The method ensures proper resource management by using a try-with-resources block for the output streams.

      Parameters:
      content - The String to be compressed.
      Returns:
      A Base64-encoded String representing the GZIP-compressed content.
      Throws:
      IOException - If an I/O error occurs during compression.
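
      The compress-then-encode sequence can be sketched with java.util.zip and java.util.Base64. The names GzipCompressSketch and gzipToBase64 are hypothetical, and UTF-8 is assumed as the string encoding, which the description does not state.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPOutputStream;

final class GzipCompressSketch {

    // Hypothetical stand-in for IO.gzipCompress; UTF-8 is an assumption.
    static String gzipToBase64(String content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(content.getBytes(StandardCharsets.UTF_8));
        }
        // The GZIP trailer is written on close, so the bytes are read only after the try block.
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        String encoded = gzipToBase64("ACGTACGTACGTACGT");
        byte[] raw = Base64.getDecoder().decode(encoded);
        // Every GZIP stream starts with the magic bytes 0x1f 0x8b.
        System.out.println((raw[0] & 0xff) == 0x1f && (raw[1] & 0xff) == 0x8b); // prints true
    }
}
```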
    • gzipDecompress

      public static String gzipDecompress(String content) throws IOException
      Decompresses a Base64-encoded GZIP-compressed string.

      This method decodes the input string from Base64, decompresses the resulting GZIP-compressed data, and returns the decompressed content as a string. It uses a buffer to read the decompressed data in chunks and appends it to a StringBuilder.

      Parameters:
      content - The Base64-encoded GZIP-compressed string to decompress.
      Returns:
      A String containing the decompressed content.
      Throws:
      IOException - If an I/O error occurs during decompression.
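
      The decode-then-decompress sequence can be sketched as below. The names GzipDecompressSketch and gunzipFromBase64 are hypothetical and UTF-8 is assumed. Unlike the description, this sketch collects the chunks in a byte buffer rather than a StringBuilder, so that a multi-byte character split across two chunks is still decoded correctly.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class GzipDecompressSketch {

    // Hypothetical stand-in for IO.gzipDecompress; UTF-8 is an assumption.
    static String gunzipFromBase64(String content) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(content);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Build a Base64-encoded GZIP payload to feed the sketch.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write("round trip".getBytes(StandardCharsets.UTF_8));
        }
        String encoded = Base64.getEncoder().encodeToString(bytes.toByteArray());
        System.out.println(gunzipFromBase64(encoded)); // prints round trip
    }
}
```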
    • md5Hash

      public static String md5Hash(String content)
      Generates the MD5 hash of the given string.

      This method computes the MD5 hash of the input string and returns it as a hexadecimal string. It uses the DigestUtils.md5Hex(String) method from the Apache Commons Codec library to perform the hashing.

      Parameters:
      content - The String to hash.
      Returns:
      A String representing the MD5 hash of the input content in hexadecimal format.
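
      For readers without Apache Commons Codec on the classpath, an equivalent result can be sketched with the JDK's MessageDigest. The names Md5Sketch and md5Hex are hypothetical; the sketch hashes the UTF-8 bytes of the input and emits lowercase hexadecimal, matching what DigestUtils.md5Hex(String) produces.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class Md5Sketch {

    // JDK-only sketch; not the library's implementation, which delegates to DigestUtils.
    static String md5Hex(String content) {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            byte[] digest = md5.digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                // Mask to an unsigned value so negative bytes format as two hex digits.
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this is not expected.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("hello")); // prints 5d41402abc4b2a76b9719d911017c592
    }
}
```

      MD5 is suitable for checksums and content fingerprints, not for security-sensitive hashing.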