Package utility
Class Clustering
java.lang.Object
utility.Clustering
Provides functionality for clustering data using the HDBSCAN algorithm (see https://tribuo.org/).
The static implementation allows the trainer and model to be reused and provides methods for adding samples (data points, not Sample instances), training and resetting the model, and retrieving clustering results.
The implementation currently supports only clustering based on variants: each feature is defined by a position and an alternative base in the context of Feature.Alleles, Feature.Proteoforms or Samples, and is treated as a binary feature (weight or value of 1.0). The resulting feature vectors are compared using the Manhattan distance (L1 distance).
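To illustrate the binary variant encoding and the Manhattan distance described above, the following is a minimal sketch and not the class's actual implementation. The class name VariantDistance, its helper methods and the "position:base" feature key are hypothetical. It relies on the fact that, for binary features with value 1.0, the L1 distance between two samples equals the number of variants present in exactly one of them.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of the utility package.
final class VariantDistance {

    // Encode each (position, alternative base) pair as one binary feature key.
    // The "position:base" format is an assumption made for this sketch.
    static Set<String> toFeatures(Map<Integer, String> variants) {
        Set<String> features = new HashSet<>();
        for (Map.Entry<Integer, String> e : variants.entrySet()) {
            features.add(e.getKey() + ":" + e.getValue());
        }
        return features;
    }

    // L1 distance of two binary vectors: size of the union minus size of the
    // intersection, i.e. the number of features found in only one sample.
    static double manhattan(Map<Integer, String> a, Map<Integer, String> b) {
        Set<String> fa = toFeatures(a);
        Set<String> fb = toFeatures(b);
        Set<String> union = new HashSet<>(fa);
        union.addAll(fb);
        Set<String> shared = new HashSet<>(fa);
        shared.retainAll(fb);
        return union.size() - shared.size();
    }

    public static void main(String[] args) {
        Map<Integer, String> s1 = Map.of(100, "A", 250, "T");
        Map<Integer, String> s2 = Map.of(100, "A", 250, "G", 300, "C");
        // Shared: 100:A. Unique: 250:T, 250:G, 300:C.
        System.out.println(manhattan(s1, s2)); // prints 3.0
    }
}
```

Note that two different alternative bases at the same position count as two separate features in this sketch, so they contribute 2 to the distance.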
Nested Class Summary
Nested Classes
Modifier and Type    Class    Description
static final record    Clustering.Entry    A record to represent a clustering result entry.
Constructor Summary
Constructors
Clustering()
Method Summary
Modifier and Type    Method    Description
static void    addToDataset(String label, Map<Integer, String> variants)    Adds a sample to the dataset for clustering.
static List<Clustering.Entry>    getClusteringResult()    Retrieves the clustering results as a list of Clustering.Entry instances.
static boolean    hasData()    Checks if the dataset contains any samples.
static void    reset()    Resets the clustering state by clearing the dataset, labels, and model.
static void    train()    Trains the HDBSCAN model using the current dataset.
Constructor Details
Clustering
public Clustering()
Method Details
reset
public static void reset()
Resets the clustering state by clearing the dataset, labels, and model.
addToDataset
public static void addToDataset(String label, Map<Integer, String> variants)
Adds a sample to the dataset for clustering.
Parameters:
label - The label of the sample.
variants - A map of feature positions and their corresponding values.
hasData
public static boolean hasData()
Checks if the dataset contains any samples.
Returns:
true if the dataset has at least one sample, false otherwise.
train
public static void train()
Trains the HDBSCAN model using the current dataset.
getClusteringResult
public static List<Clustering.Entry> getClusteringResult()
Retrieves the clustering results as a list of Clustering.Entry instances.
Returns:
A list of clustering result entries, each containing the name, label, index, and outlier score.
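As a sketch of how such a result list might be consumed, the example below groups entries by cluster label. The nested Entry record here is a hypothetical stand-in for Clustering.Entry: the four components follow the description above, but their names, types and order are assumptions, as is the ClusterGrouping class itself. Records require Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical example class, not part of the utility package.
final class ClusterGrouping {

    // Stand-in for Clustering.Entry; component names and types are assumed.
    record Entry(String name, int label, int index, double outlierScore) {}

    // Collect sample names per cluster label, ordered by label.
    static Map<Integer, List<String>> namesByLabel(List<Entry> entries) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (Entry e : entries) {
            groups.computeIfAbsent(e.label(), k -> new ArrayList<>()).add(e.name());
        }
        return groups;
    }

    public static void main(String[] args) {
        // Illustrative values only, not real clustering output.
        List<Entry> result = List.of(
                new Entry("s1", 1, 0, 0.05),
                new Entry("s2", 1, 1, 0.10),
                new Entry("s3", 2, 2, 0.02));
        System.out.println(namesByLabel(result)); // prints {1=[s1, s2], 2=[s3]}
    }
}
```

With the real class, the list returned by getClusteringResult() would presumably take the place of the hand-built list, with the accessor names adjusted to match the actual record.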