Class IO
This final class provides static methods for various file and data handling operations, such as reading, writing, compressing, and hashing files and strings. It also includes methods for generating specific file formats like VCF, FASTA, and GFF.
The class is designed to be non-instantiable and serves as a collection of utility methods.
-
Constructor Summary
Constructors
IO()
Method Summary
Modifier and Type / Method / Description

static void copyResourceToFile(String resourceName, Path targetPath)
    Copies a resource from the application's classpath to a specified targetPath.
static String detectSeparator(List<String> content)
    Detects the separator used in a list of strings.
static String generateGffContent(Storage storage)
    Generates the content of a GFF (General Feature Format) file from the given Storage object.
static String generateReferenceFastaContent(Storage storage)
    Generates the content of a reference FASTA file from the given Storage object.
static String generateVcfContent(ArrayList<htsjdk.samtools.util.Tuple<org.apache.commons.lang3.tuple.Triple<String, Integer, String>, VariantInformation>> variants)
    Generates the content of a plain VCF (Variant Call Format) file.
static String gzipCompress(String content)
    Compresses a string using GZIP compression and encodes the result in Base64.
static String gzipDecompress(String content)
    Decompresses a Base64-encoded GZIP-compressed string.
static htsjdk.variant.vcf.VCFFileReader initializeVCFFileReader(File file)
    Initializes a VCFFileReader instance for the passed VCF file.
static String md5Hash
    Generates the MD5 hash of the given string.
readFile(File file)
    Reads the content of a file line by line and returns a list of non-empty, trimmed lines.
static HashMap<String,HashMap<String,String>> readTabularFileAsNestedMap(File file)
    Reads a tabular file and converts its content into a nested map structure.
static void writeFile(Path path, String content)
    Writes the specified content to a file at the given path.
-
Constructor Details
-
IO
public IO()
-
-
Method Details
-
readFile
Reads the content of a file line by line and returns a list of non-empty, trimmed lines.
This method uses a Scanner to read the file with UTF-8 encoding. Each line is trimmed to remove leading and trailing whitespace, and empty lines are excluded from the result.
Parameters:
file - The File object representing the file to read.
Returns:
An ArrayList containing the non-empty, trimmed lines from the file.
Throws:
IOException - If an I/O error occurs while reading the file.
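The behaviour described for readFile can be approximated with the JDK alone. The following is a minimal sketch, not the library's implementation; the class name ReadFileSketch and the method name readLines are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Scanner;

// Hypothetical stand-in that mirrors the documented behaviour of readFile.
final class ReadFileSketch {

    /** Returns the non-empty, trimmed lines of a UTF-8 encoded text file. */
    static ArrayList<String> readLines(File file) throws IOException {
        ArrayList<String> lines = new ArrayList<>();
        // try-with-resources closes the Scanner (and the underlying file) on exit.
        try (Scanner scanner = new Scanner(file, StandardCharsets.UTF_8)) {
            while (scanner.hasNextLine()) {
                String line = scanner.nextLine().trim();
                if (!line.isEmpty()) {
                    lines.add(line);
                }
            }
        }
        return lines;
    }
}
```

Lines that contain only whitespace become empty after trimming and are therefore dropped as well.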
-
writeFile
Writes the specified content to a file at the given path.
This method ensures that the parent directories of the target file are created if they do not exist. It then writes the provided content to the file using UTF-8 encoding. If the file already exists, its content is overwritten.
Parameters:
path - The Path where the file will be written.
content - The String content to write to the file.
Throws:
IOException - If an I/O error occurs during directory creation or file writing.
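A comparable effect can be obtained with java.nio.file.Files. This is a sketch under the assumption that plain JDK calls are acceptable; WriteFileSketch and write are hypothetical names, and the library's own implementation may differ.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical stand-in that mirrors the documented behaviour of writeFile.
final class WriteFileSketch {

    /** Creates missing parent directories, then writes content as UTF-8. */
    static void write(Path path, String content) throws IOException {
        Path parent = path.toAbsolutePath().getParent();
        if (parent != null) {
            // No-op if the directories already exist.
            Files.createDirectories(parent);
        }
        // With no explicit options, writeString uses CREATE, TRUNCATE_EXISTING
        // and WRITE, so an existing file is overwritten.
        Files.writeString(path, content, StandardCharsets.UTF_8);
    }
}
```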
-
readTabularFileAsNestedMap
public static HashMap<String,HashMap<String,String>> readTabularFileAsNestedMap(File file) throws IOException
Reads a tabular file and converts its content into a nested map structure.
This method reads a tabular file where the first row contains headers and each subsequent row contains data. The first column is treated as the key for the outer map, and the remaining columns are stored in an inner map with their corresponding headers as keys. The file can use tab or comma as delimiters.
Parameters:
file - The File object representing the tabular file to read.
Returns:
A HashMap where the outer map's key is the first column's value, and the value is another HashMap containing the remaining columns as key-value pairs.
Throws:
IOException - If an I/O error occurs or the file format is invalid.
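The nested structure is easiest to see on a small input. The sketch below works on lines that have already been read and uses hypothetical names (NestedMapSketch, parse). Treating a row whose field count differs from the header as an invalid format is an assumption, since the documentation does not say what counts as invalid.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.List;

// Hypothetical stand-in for the header/row to nested-map conversion.
final class NestedMapSketch {

    static HashMap<String, HashMap<String, String>> parse(List<String> lines) throws IOException {
        if (lines.isEmpty()) {
            throw new IOException("Empty tabular input");
        }
        // Tab is preferred over comma when the header contains both (assumption).
        String separator = lines.get(0).contains("\t") ? "\t" : ",";
        String[] header = lines.get(0).split(separator, -1);
        HashMap<String, HashMap<String, String>> table = new HashMap<>();
        for (String line : lines.subList(1, lines.size())) {
            String[] cells = line.split(separator, -1);
            if (cells.length != header.length) {
                throw new IOException("Expected " + header.length + " fields but found "
                        + cells.length + ": " + line);
            }
            HashMap<String, String> row = new HashMap<>();
            // Column 0 is the outer key; the remaining columns go into the inner map.
            for (int i = 1; i < header.length; i++) {
                row.put(header[i], cells[i]);
            }
            table.put(cells[0], row);
        }
        return table;
    }
}
```

For a header `sample, age, city` and a row `s1, 30, Berlin`, the outer key is `s1` and the inner map holds `age` and `city`.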
-
detectSeparator
Detects the separator used in a list of strings.
This method analyzes the provided list of strings to determine the separator used in the content. It skips lines that start with a specific sign (defined by Constants.SIGN) and checks the first non-skipped line for the presence of either a tab character or a comma. If a tab is found, it returns the tab separator; if a comma is found, it returns the comma separator. If neither is found, it returns an empty string.
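The decision logic described above can be sketched as follows. The value of Constants.SIGN is not given in this documentation, so the sketch assumes "#" purely for illustration; SeparatorSketch, COMMENT_SIGN and detect are hypothetical names.

```java
import java.util.List;

// Hypothetical stand-in for the separator detection described above.
final class SeparatorSketch {

    // Assumption: stands in for Constants.SIGN, whose real value is not documented here.
    static final String COMMENT_SIGN = "#";

    static String detect(List<String> content) {
        for (String line : content) {
            if (line.startsWith(COMMENT_SIGN)) {
                continue; // skipped line, keep looking
            }
            // Only the first non-skipped line is inspected; tab is checked before comma.
            if (line.contains("\t")) {
                return "\t";
            }
            if (line.contains(",")) {
                return ",";
            }
            return "";
        }
        return ""; // empty input or only skipped lines
    }
}
```

Because only the first non-skipped line is inspected, a later line that contains a separator does not change the result.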
generateVcfContent
public static String generateVcfContent(ArrayList<htsjdk.samtools.util.Tuple<org.apache.commons.lang3.tuple.Triple<String, Integer, String>, VariantInformation>> variants)
Generates the content of a plain VCF (Variant Call Format) file.
This method constructs a VCF file content as a String from a list of variants. The VCF content includes the file format, source, and a header line, followed by the variant data. Each variant is represented by its chromosome, position, reference base, and alternate base.
The generated VCF content follows the VCFv4.3 specification and includes the following fields:
- CHROM: Chromosome name
- POS: Position of the variant
- ID: Variant identifier (set to ".")
- REF: Reference base(s)
- ALT: Alternate base(s) (gaps are stripped)
- QUAL: Quality score (set to "100")
- FILTER: Filter status (set to ".")
- INFO: Additional information (empty)
Parameters:
variants - A list of Tuple objects, where each tuple contains:
- A Triple with the chromosome name, position, and alternate base.
- A VariantInformation object containing the reference base.
Returns:
A String representing the VCF content.
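To make the column layout concrete, here is a minimal sketch that avoids the htsjdk and Apache Commons types by using a small hypothetical Variant class. Several details are assumptions: the ##source value ("example"), the gap character ("-"), and writing the empty INFO column as ".", the VCF placeholder for a missing value.

```java
import java.util.List;

// Hypothetical stand-in for the VCF text generation described above.
final class VcfSketch {

    /** Illustrative replacement for the Tuple/Triple/VariantInformation input. */
    static final class Variant {
        final String chrom;
        final int pos;
        final String ref;
        final String alt;

        Variant(String chrom, int pos, String ref, String alt) {
            this.chrom = chrom;
            this.pos = pos;
            this.ref = ref;
            this.alt = alt;
        }
    }

    static String build(List<Variant> variants) {
        StringBuilder sb = new StringBuilder();
        sb.append("##fileformat=VCFv4.3\n");
        sb.append("##source=example\n"); // assumption: the real source value is not documented
        sb.append("#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n");
        for (Variant v : variants) {
            sb.append(String.join("\t",
                    v.chrom,
                    Integer.toString(v.pos),
                    ".",                     // ID
                    v.ref,
                    v.alt.replace("-", ""),  // strip gaps (assumed gap character)
                    "100",                   // QUAL
                    ".",                     // FILTER
                    "."))                    // INFO (assumed missing-value placeholder)
              .append('\n');
        }
        return sb.toString();
    }
}
```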
-
generateReferenceFastaContent
Generates the content of a reference FASTA file from the given Storage object.
This method constructs a FASTA file content as a String by iterating over the contigs in the provided Storage object. Each contig's name is used as the header (prefixed with '>'), and its sequence is split into lines of 80 characters for proper FASTA formatting.
Parameters:
storage - The Storage object containing the contigs and their sequences.
Returns:
A String representing the content of the reference FASTA file.
Throws:
IOException - If an I/O error occurs during the generation of the FASTA content.
IllegalArgumentException - If no reference sequence information is stored in the Storage object.
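The 80-character wrapping can be sketched without the Storage type by passing contig names and sequences as a map. FastaSketch and build are hypothetical names, and mapping "no reference sequence information" to an empty map is an assumption.

```java
import java.util.Map;

// Hypothetical stand-in for the FASTA formatting described above.
final class FastaSketch {

    static final int LINE_WIDTH = 80;

    static String build(Map<String, String> contigs) {
        if (contigs.isEmpty()) {
            throw new IllegalArgumentException("No reference sequence information available");
        }
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> contig : contigs.entrySet()) {
            sb.append('>').append(contig.getKey()).append('\n');
            String sequence = contig.getValue();
            // Emit the sequence in chunks of at most LINE_WIDTH characters.
            for (int i = 0; i < sequence.length(); i += LINE_WIDTH) {
                sb.append(sequence, i, Math.min(i + LINE_WIDTH, sequence.length())).append('\n');
            }
        }
        return sb.toString();
    }
}
```

A 170-base contig therefore yields one header line followed by sequence lines of 80, 80, and 10 characters.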
-
generateGffContent
Generates the content of a GFF (General Feature Format) file from the given Storage object.
This method constructs a GFF file content as a String by iterating over the features in the provided Storage object. The GFF content includes the version, processor information, and the feature data. Each feature is converted to its GFF string representation using the Feature.toGffString() method.
The generated GFF content follows the GFF3 specification and includes the following:
- ##gff-version: Specifies the GFF version.
- ##processor: Includes the software name and version used to generate the file.
- Feature data: Each feature is represented in GFF format.
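The three parts listed above can be assembled as in the following sketch. It takes feature lines that are already rendered as text, because the output of Feature.toGffString() is not shown here. The exact layout of the ##processor line and the software name and version used in the test are assumptions.

```java
import java.util.List;

// Hypothetical stand-in for the GFF text assembly described above.
final class GffSketch {

    static String build(String softwareName, String softwareVersion, List<String> featureLines) {
        StringBuilder sb = new StringBuilder();
        sb.append("##gff-version 3\n");
        // Assumption: name and version are separated by a single space.
        sb.append("##processor ").append(softwareName).append(' ')
          .append(softwareVersion).append('\n');
        // Each feature line is expected to hold the nine tab-separated GFF3 columns:
        // seqid, source, type, start, end, score, strand, phase, attributes.
        for (String line : featureLines) {
            sb.append(line).append('\n');
        }
        return sb.toString();
    }
}
```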
-
initializeVCFFileReader
public static htsjdk.variant.vcf.VCFFileReader initializeVCFFileReader(File file) throws IOException
Initializes a VCFFileReader instance for the passed VCF file. For this, a temporary indexed VCF file is created.
Parameters:
file - A File object pointing to a .vcf file.
Returns:
A VCFFileReader instance for the passed .vcf file.
Throws:
IOException - In case of an error during the initialization of the VCFFileReader.
-
copyResourceToFile
Copies a resource from the application's classpath to a specified targetPath.
This method retrieves a resource as an InputStream from the application's classpath using the specified resource path. The resource is then copied to the target file path, overwriting any existing file at the target location.
Parameters:
resourceName - The path to the resource within the application's classpath.
targetPath - The file path where the resource should be copied.
Throws:
MusialException - If the resource cannot be found or an I/O error occurs during the copy operation.
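The copy step can be sketched with Class.getResourceAsStream and Files.copy. MusialException is not available outside the library, so this sketch throws IOException instead; CopyResourceSketch and copy are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical stand-in for the classpath-to-file copy described above.
final class CopyResourceSketch {

    static void copy(String resourceName, Path targetPath) throws IOException {
        // getResourceAsStream returns null when the resource does not exist;
        // try-with-resources tolerates a null resource and simply skips close().
        try (InputStream in = CopyResourceSketch.class.getResourceAsStream(resourceName)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + resourceName);
            }
            // REPLACE_EXISTING overwrites a file that is already at the target location.
            Files.copy(in, targetPath, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

With Class.getResourceAsStream, a name that starts with "/" is resolved from the classpath root, while other names are resolved relative to the class's package.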
-
gzipCompress
Compresses a string using GZIP compression and encodes the result in Base64.
This method compresses the input string using the GZIP algorithm and then encodes the compressed byte array into a Base64 string. The method ensures proper resource management by using a try-with-resources block for the output streams.
Parameters:
content - The String to be compressed.
Returns:
A Base64-encoded String representing the GZIP-compressed content.
Throws:
IOException - If an I/O error occurs during compression.
-
gzipDecompress
Decompresses a Base64-encoded GZIP-compressed string.
This method decodes the input string from Base64, decompresses the resulting GZIP-compressed data, and returns the decompressed content as a string. It uses a buffer to read the decompressed data in chunks and appends it to a StringBuilder.
Parameters:
content - The Base64-encoded GZIP-compressed string to decompress.
Returns:
A String containing the decompressed content.
Throws:
IOException - If an I/O error occurs during decompression.
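Compression and decompression are inverse operations, which a round trip makes visible. The sketch below uses only java.util.zip and java.util.Base64; GzipSketch, compress, and decompress are hypothetical names, and the UTF-8 charset is an assumption because the documentation does not state one. Unlike the described implementation, it collects the decompressed bytes before decoding them, so a multi-byte character cannot be split across two buffer reads.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-in for the GZIP + Base64 round trip described above.
final class GzipSketch {

    static String compress(String content) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the GZIP stream writes the trailer, so encode only after the block.
        try (GZIPOutputStream gzip = new GZIPOutputStream(bytes)) {
            gzip.write(content.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(bytes.toByteArray());
    }

    static String decompress(String content) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(content);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = gzip.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
        return out.toString(StandardCharsets.UTF_8);
    }
}
```

For any input string `s`, `GzipSketch.decompress(GzipSketch.compress(s))` returns a string equal to `s`.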
-
md5Hash
Generates the MD5 hash of the given string.
This method computes the MD5 hash of the input string and returns it as a hexadecimal string. It uses the DigestUtils.md5Hex(String) method from the Apache Commons Codec library to perform the hashing.
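For readers without Apache Commons Codec on the classpath, an equivalent hexadecimal digest can be computed with the JDK's MessageDigest. This sketch swaps in MessageDigest for DigestUtils; Md5Sketch is a hypothetical class name, and the input is encoded as UTF-8.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical JDK-only stand-in for a hexadecimal MD5 digest.
final class Md5Sketch {

    static String md5Hex(String input) throws NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] digest = md5.digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            // %02x renders each byte as two lowercase hexadecimal digits.
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

The result is always 32 hexadecimal characters. MD5 is suitable for checksums and change detection, not for security-sensitive hashing.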
-