utility.Clustering

public final class Clustering extends Object

Provides functionality for clustering data using the HDBSCAN algorithm (see https://tribuo.org/).

The static implementation allows the trainer and model to be reused. It provides methods for adding samples (data points, not Sample instances), training and resetting the model, and retrieving clustering results.

The implementation currently supports only variant-based clustering: each feature is defined by a position and an alternative base in the context of Feature.Alleles, Feature.Proteoforms or Samples, and is treated as a binary feature (weight or value of 1.0). Samples are compared using the Manhattan (L1) distance.
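
The distance described above can be illustrated with a small, self-contained sketch. It is not the code used by this class: the class name VariantDistance, the helper toFeatures and the "position:base" feature naming are assumptions made for illustration only. The sketch assumes each (position, alternative base) pair is one binary feature with value 1.0, so the Manhattan distance between two samples is the number of variants present in exactly one of them.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration; not part of utility.Clustering.
public class VariantDistance {

    // Encode each (position, alternative base) pair as one binary feature name.
    // The "position:base" naming is an assumption of this sketch.
    static Set<String> toFeatures(Map<Integer, String> variants) {
        Set<String> features = new HashSet<>();
        for (Map.Entry<Integer, String> e : variants.entrySet()) {
            features.add(e.getKey() + ":" + e.getValue());
        }
        return features;
    }

    // For binary features (value 1.0 if present, 0.0 if absent), the L1 distance
    // equals the number of features present in exactly one of the two samples.
    static double manhattan(Map<Integer, String> a, Map<Integer, String> b) {
        Set<String> fa = toFeatures(a);
        Set<String> fb = toFeatures(b);
        Set<String> union = new HashSet<>(fa);
        union.addAll(fb);
        Set<String> shared = new HashSet<>(fa);
        shared.retainAll(fb);
        return union.size() - shared.size();
    }

    public static void main(String[] args) {
        // Two samples that share the variant 10:A and differ in one variant each.
        Map<Integer, String> s1 = Map.of(10, "A", 20, "T");
        Map<Integer, String> s2 = Map.of(10, "A", 30, "G");
        System.out.println(manhattan(s1, s2)); // prints 2.0
    }
}
```

The variant maps have the same shape as the variants parameter of addToDataset (position mapped to a value), so two samples with identical variant maps have distance 0.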

Nested Class Summary

Nested Classes

Modifier and Type

Class

Description

static final record

Clustering.Entry

A record to represent a clustering result entry.
Constructor Summary

Constructors

Constructor

Description

Clustering()
Method Summary

Modifier and Type

Method

Description

static void

addToDataset(String label, Map<Integer,String> variants)

Adds a sample to the dataset for clustering.

static List<Clustering.Entry>

getClusteringResult()

Retrieves the clustering results as a list of Clustering.Entry instances.

static boolean

hasData()

Checks if the dataset contains any samples.

static void

reset()

Resets the clustering state by clearing the dataset, labels, and model.

static void

train()

Trains the HDBSCAN model using the current dataset.

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Constructor Details
- Clustering
  
  public Clustering()
Method Details
- reset
  
  public static void reset()
  
  Resets the clustering state by clearing the dataset, labels, and model.
- addToDataset
  
  public static void addToDataset(String label, Map<Integer,String> variants)
  
  Adds a sample to the dataset for clustering.
  
  Parameters:
  
  label - The label of the sample.
  
  variants - A map from each variant position to its corresponding value (the alternative base at that position).
- hasData
  
  public static boolean hasData()
  
  Checks if the dataset contains any samples.
  
  Returns:
  
  true if the dataset has at least one sample, false otherwise.
- train
  
  public static void train()
  
  Trains the HDBSCAN model using the current dataset.
- getClusteringResult
  
  public static List<Clustering.Entry> getClusteringResult()
  
  Retrieves the clustering results as a list of Clustering.Entry instances.
  
  Returns:
  
  A list of clustering result entries, each containing the name, label, index, and outlier score.
