Package utility

Class Clustering

java.lang.Object
utility.Clustering

public final class Clustering extends Object
Provides functionality for clustering data using the HDBSCAN algorithm (see https://tribuo.org/).

The implementation is static, so the trainer and model can be reused across calls. It provides methods to add samples (data points, not Sample instances), to train and reset the model, and to retrieve clustering results.

The implementation currently supports only clustering based on variants: each feature is defined by a position and an alternative base in the context of Feature.Alleles, Feature.Proteoforms or Samples, and is treated as a binary feature (weight or value of 1.0). Data points are compared using the Manhattan distance (L1 distance).
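
For binary features, the Manhattan distance between two data points reduces to the number of features present in exactly one of them. The following is a minimal sketch of that idea, not the code used by this class: the class name VariantDistanceSketch, the helper methods, and the "position:base" feature encoding are assumptions made for illustration only.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class VariantDistanceSketch {

    // Hypothetical encoding: one binary feature per (position, value) pair,
    // e.g. position 250 with alternative base "T" becomes "250:T".
    static Set<String> toFeatures(Map<Integer, String> variants) {
        Set<String> features = new HashSet<>();
        for (Map.Entry<Integer, String> e : variants.entrySet()) {
            features.add(e.getKey() + ":" + e.getValue());
        }
        return features;
    }

    // L1 distance between two binary vectors with values in {0, 1}:
    // every feature present in exactly one of the two sets contributes 1.
    static int manhattan(Set<String> a, Set<String> b) {
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        return union.size() - common.size();
    }

    public static void main(String[] args) {
        Map<Integer, String> first = Map.of(100, "A", 250, "T");
        Map<Integer, String> second = Map.of(100, "A", 250, "G", 300, "C");
        // Shared: 100:A. Unshared: 250:T, 250:G, 300:C.
        System.out.println(manhattan(toFeatures(first), toFeatures(second))); // prints 3
    }
}
```

Note that two different alternative bases at the same position count as two distinct features under this encoding, so they contribute 2 to the distance.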

  • Constructor Details

    • Clustering

      public Clustering()
  • Method Details

    • reset

      public static void reset()
      Resets the clustering state by clearing the dataset, labels, and model.
    • addToDataset

      public static void addToDataset(String label, Map<Integer,String> variants)
      Adds a sample to the dataset for clustering.
      Parameters:
      label - The label of the sample.
      variants - A map from each feature position to its corresponding value.
    • hasData

      public static boolean hasData()
      Checks if the dataset contains any samples.
      Returns:
      true if the dataset has at least one sample, false otherwise.
    • train

      public static void train()
      Trains the HDBSCAN model using the current dataset.
    • getClusteringResult

      public static List<Clustering.Entry> getClusteringResult()
      Retrieves the clustering results as a list of Clustering.Entry instances.
      Returns:
      A list of clustering result entries, each containing the name, label, index, and outlier score.