6  Clustering

“There are more things in heaven and earth, Horatio,
Than are dreamt of in your philosophy.”
— William Shakespeare, in “Hamlet”

Clustering is one of the most fundamental tasks in data analysis: given a dataset, group the points into meaningful subsets. It sounds simple, but the devil is in the details. What does “meaningful” mean? How many clusters should there be? And what shape can they have?

In this chapter we cover classical clustering methods and their strengths and limitations. Understanding these limitations will motivate the topological approach (ToMATo) in a later chapter.

6.1 What is a clustering?

Definition 6.1 A clustering of a set \(X\) is a collection of subsets \(C_1, \ldots, C_n \subseteq X\) such that \(\bigcup_i C_i = X\) and \(C_i \cap C_j = \emptyset\) for all \(i \neq j\). In other words: it is a partition of \(X\) into disjoint subsets. Each \(C_i\) is called a cluster.

Equivalently, a clustering can be described by a label function \(f: X \to \{1, \ldots, n\}\) that assigns to each point \(x \in X\) the index \(f(x)\) of the cluster containing it. A clustering method is an algorithm that, given a dataset, produces such a partition.
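As a toy illustration with a hypothetical six-point set, the two views coincide: the clusters are exactly the preimages \(f^{-1}(i)\) of the label function.

```julia
# A hypothetical six-point set and a label function f: X → {1, 2, 3}
X = [1, 2, 3, 4, 5, 6]
f = Dict(1 => 1, 2 => 1, 3 => 2, 4 => 2, 5 => 3, 6 => 3)

# The clusters are the preimages f⁻¹(i): a partition of X
clusters = [[x for x in X if f[x] == i] for i in 1:3]

# The partition axioms from Definition 6.1 hold:
@assert sort(vcat(clusters...)) == X                 # the clusters cover X
@assert all(isempty(intersect(clusters[i], clusters[j]))
            for i in 1:3 for j in 1:3 if i != j)     # and are pairwise disjoint
```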

Let’s load some packages and datasets to work with:

using DelimitedFiles
using CairoMakie
using Clustering
using Distances
using Random

We will use several datasets from the FCPS suite (Ultsch and Lötsch 2020):

function load_fcps(name)
    data = readdlm("clustering datasets/$name.csv", ',')
    points = permutedims(data[:, 1:end-1])  # transpose so each column is one point, the layout Julia's clustering functions expect
    labels = Int.(data[:, end])
    return points, labels
end

# Load a few interesting ones
hepta_pts, hepta_labels = load_fcps("Hepta")
target_pts, target_labels = load_fcps("Target")
chainlink_pts, chainlink_labels = load_fcps("Chainlink")
lsun_pts, lsun_labels = load_fcps("Lsun")

Let’s visualize them:

fig = Figure(size = (800, 800))

ax1 = Axis3(fig[1, 1], title = "Hepta")
scatter!(ax1, hepta_pts[1,:], hepta_pts[2,:], hepta_pts[3,:], 
    color = hepta_labels, markersize = 5)

ax2 = Axis(fig[1, 2], title = "Target", aspect = DataAspect())
scatter!(ax2, target_pts[1,:], target_pts[2,:], 
    color = target_labels, markersize = 4)

ax3 = Axis3(fig[2, 1], title = "Chainlink")
scatter!(ax3, chainlink_pts[1,:], chainlink_pts[2,:], chainlink_pts[3,:], 
    color = chainlink_labels, markersize = 3)

ax4 = Axis(fig[2, 2], title = "Lsun", aspect = DataAspect())
scatter!(ax4, lsun_pts[1,:], lsun_pts[2,:], 
    color = lsun_labels, markersize = 4)

fig

Each dataset has different challenges: Hepta has well-separated spherical clusters, Target has concentric rings, Chainlink has interlocking loops, and Lsun has clusters of different shapes and sizes.

6.2 K-means

The most popular clustering algorithm is k-means. The idea is simple:

  1. Choose \(k\) initial cluster centers (centroids)
  2. Assign each point to the nearest centroid
  3. Recompute centroids as the mean of assigned points
  4. Repeat steps 2-3 until convergence
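Before reaching for the optimized implementation in Clustering.jl, the loop above can be sketched from scratch. This is a minimal Lloyd iteration with a simple farthest-point initialization (a hypothetical choice made for the sketch, not what the library uses); as everywhere in this chapter, points are the columns of the matrix.

```julia
using Statistics, Random

# Minimal k-means (Lloyd's algorithm); the columns of P are the points
function lloyd_kmeans(P::AbstractMatrix, k::Integer; iters = 100)
    # 1. farthest-point initialization: start from the first point, then
    #    repeatedly add the point farthest from the centroids chosen so far
    centroids = P[:, 1:1]
    while size(centroids, 2) < k
        d = [minimum(sum(abs2, P[:, j] .- centroids[:, c]) for c in axes(centroids, 2))
             for j in axes(P, 2)]
        centroids = hcat(centroids, P[:, argmax(d)])
    end
    labels = zeros(Int, size(P, 2))
    for _ in 1:iters
        # 2. assign each point to its nearest centroid
        for j in axes(P, 2)
            labels[j] = argmin([sum(abs2, P[:, j] .- centroids[:, c]) for c in 1:k])
        end
        # 3. recompute each centroid as the mean of its assigned points
        for c in 1:k
            members = P[:, labels .== c]
            isempty(members) || (centroids[:, c] = vec(mean(members, dims = 2)))
        end
    end
    return labels
end

# Two tight, well-separated blobs around (0, 0) and (10, 10)
rng = MersenneTwister(42)
P = hcat(0.5 .* randn(rng, 2, 50), 0.5 .* randn(rng, 2, 50) .+ 10.0)
labels = lloyd_kmeans(P, 2)
```

On data this friendly the sketch recovers the two blobs exactly, which is the regime k-means is designed for.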

# K-means on Hepta (k=7, since we know there are 7 clusters)
result_hepta = kmeans(hepta_pts, 7)
km_labels_hepta = assignments(result_hepta)

fig = Figure(size = (800, 400))
ax1 = Axis3(fig[1, 1], title = "Hepta: true labels")
scatter!(ax1, hepta_pts[1,:], hepta_pts[2,:], hepta_pts[3,:], 
    color = hepta_labels, markersize = 5)
ax2 = Axis3(fig[1, 2], title = "Hepta: k-means (k=7)")
scatter!(ax2, hepta_pts[1,:], hepta_pts[2,:], hepta_pts[3,:], 
    color = km_labels_hepta, markersize = 5)
fig

K-means works well on Hepta because the clusters are spherical and well-separated, which is exactly what k-means assumes. But what happens with less friendly shapes?

# K-means on Target (k=2)
result_target = kmeans(target_pts, 2)
km_labels_target = assignments(result_target)

fig = Figure(size = (800, 400))
ax1 = Axis(fig[1, 1], title = "Target: true labels", aspect = DataAspect())
scatter!(ax1, target_pts[1,:], target_pts[2,:], 
    color = target_labels, markersize = 4)
ax2 = Axis(fig[1, 2], title = "Target: k-means (k=2)", aspect = DataAspect())
scatter!(ax2, target_pts[1,:], target_pts[2,:], 
    color = km_labels_target, markersize = 4)
fig

K-means struggles here: it can only create convex (roughly spherical) clusters, so it splits the ring rather than separating the ring from the core.

Limitations of k-means:

  • Assumes clusters are convex and roughly spherical
  • Requires knowing \(k\) in advance
  • Sensitive to initialization
  • Sensitive to outliers

6.3 DBSCAN

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) takes a completely different approach: clusters are regions of high density separated by regions of low density.

The algorithm has two parameters:

  • \(\epsilon\): the radius of a neighborhood
  • minpts: the minimum number of points in a neighborhood for a point to be a “core point”
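For intuition, here is a naive O(n²) sketch of the algorithm (my own simplified version, not the implementation in Clustering.jl used below): grow a cluster outward from each unlabeled core point, expanding only through other core points.

```julia
# Naive DBSCAN sketch; the columns of P are the points, label 0 means noise
function naive_dbscan(P::AbstractMatrix, ε::Real, minpts::Integer)
    n = size(P, 2)
    neighbors(j) = [i for i in 1:n if sum(abs2, P[:, i] .- P[:, j]) <= ε^2]
    labels = zeros(Int, n)
    cluster = 0
    for j in 1:n
        labels[j] == 0 || continue          # already assigned to a cluster
        N = neighbors(j)
        length(N) >= minpts || continue     # not a core point; stays noise unless absorbed later
        cluster += 1
        labels[j] = cluster
        queue = copy(N)
        while !isempty(queue)
            i = popfirst!(queue)
            labels[i] == 0 || continue
            labels[i] = cluster
            Ni = neighbors(i)
            length(Ni) >= minpts && append!(queue, Ni)  # expand only through core points
        end
    end
    return labels
end

# Two tight clumps plus one isolated outlier
A = [0.0 0.1 0.2 0.0 0.1;
     0.0 0.0 0.1 0.1 0.2]
P = hcat(A, A .+ 5.0, [10.0; 0.0])
labels = naive_dbscan(P, 0.5, 3)   # the outlier keeps label 0 (noise)
```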

# DBSCAN on Target (ε = 0.5, minpts = 10) on a precomputed distance matrix
D_target = pairwise(Euclidean(), target_pts, dims = 2)
result_db = dbscan(D_target, 0.5; metric = nothing, min_neighbors = 10)

# assignments gives one cluster index per point, with 0 marking noise
db_labels = assignments(result_db)

fig = Figure(size = (800, 400))
ax1 = Axis(fig[1, 1], title = "Target: true labels", aspect = DataAspect())
scatter!(ax1, target_pts[1,:], target_pts[2,:], 
    color = target_labels, markersize = 4)
ax2 = Axis(fig[1, 2], title = "Target: DBSCAN", aspect = DataAspect())
scatter!(ax2, target_pts[1,:], target_pts[2,:], 
    color = db_labels, markersize = 4)
fig

DBSCAN can find non-convex clusters and automatically determines the number of clusters. It also identifies noise points (points not belonging to any cluster).

Limitations of DBSCAN:

  • Struggles with clusters of varying density
  • Sensitive to \(\epsilon\) and minpts parameters
  • No straightforward way to choose parameters
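A common heuristic for choosing \(\epsilon\) is the k-distance plot: sort every point's distance to its k-th nearest neighbor and look for an "elbow" in the curve. A minimal sketch (the function name `kth_nn_distances` is my own; the elbow itself must still be judged by eye):

```julia
using LinearAlgebra

# Distance from each point (a column of P) to its k-th nearest neighbor
function kth_nn_distances(P::AbstractMatrix, k::Integer)
    n = size(P, 2)
    # index k + 1 skips the zero distance from each point to itself
    dists = [sort([norm(P[:, i] - P[:, j]) for i in 1:n])[k + 1] for j in 1:n]
    return sort(dists)   # plotted in increasing order, the "elbow" suggests ε
end

# A tight clump plus one far outlier: the outlier dominates the tail of the curve
P = hcat(0.1 .* [0.0 1.0 2.0 0.0 1.0;
                 0.0 0.0 1.0 1.0 2.0], [10.0; 10.0])
d = kth_nn_distances(P, 3)
```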

6.4 Hierarchical clustering

Hierarchical clustering builds a tree (called a dendrogram) that shows how points are progressively merged into clusters:

  1. Start with each point as its own cluster
  2. Merge the two closest clusters
  3. Repeat until everything is in one cluster

The “distance between clusters” can be defined in several ways (single linkage, complete linkage, average linkage, Ward’s method, etc.).
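For intuition, single linkage can be sketched naively: the pair of clusters with the smallest minimum inter-point distance is merged at each step (hclust below computes this far more efficiently and supports the other linkages too).

```julia
# Naive single-linkage agglomeration: merge until k clusters remain
# D is a symmetric matrix of pairwise distances
function naive_single_linkage(D::AbstractMatrix, k::Integer)
    clusters = [[i] for i in 1:size(D, 1)]   # 1. every point starts as its own cluster
    while length(clusters) > k
        # 2. find the pair of clusters with the smallest minimum inter-point distance
        best = (Inf, 0, 0)
        for a in 1:length(clusters), b in (a + 1):length(clusters)
            d = minimum(D[i, j] for i in clusters[a] for j in clusters[b])
            d < best[1] && (best = (d, a, b))
        end
        _, a, b = best
        append!(clusters[a], clusters[b])    # 3. merge them and repeat
        deleteat!(clusters, b)
    end
    return clusters
end

# Two groups on the line: {0.0, 0.1, 0.2} and {5.0, 5.1}
x = [0.0, 0.1, 0.2, 5.0, 5.1]
D = [abs(a - b) for a in x, b in x]
cl = naive_single_linkage(D, 2)
```

Recording the distance at which each merge happens, rather than stopping at a fixed k, is exactly what the dendrogram encodes.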

# Hierarchical clustering on Lsun
D_lsun = pairwise(Euclidean(), lsun_pts, dims = 2)
hc = hclust(D_lsun, linkage = :ward)

# Cut at 3 clusters
hc_labels = cutree(hc, k = 3)

fig = Figure(size = (800, 400))
ax1 = Axis(fig[1, 1], title = "Lsun: true labels", aspect = DataAspect())
scatter!(ax1, lsun_pts[1,:], lsun_pts[2,:], 
    color = lsun_labels, markersize = 4)
ax2 = Axis(fig[1, 2], title = "Lsun: hierarchical (Ward, k=3)", aspect = DataAspect())
scatter!(ax2, lsun_pts[1,:], lsun_pts[2,:], 
    color = hc_labels, markersize = 4)
fig

The dendrogram itself is an interesting object: it encodes a nested sequence of clusterings at different scales. We will see later that dendrograms can be interpreted as 0-dimensional persistence diagrams — our first connection between clustering and topology!

6.5 Why do we need topology?

Each of the methods above has assumptions that can fail:

| Method | Assumes |
|---|---|
| K-means | Convex, equally-sized clusters; known \(k\) |
| DBSCAN | Uniform density across clusters |
| Hierarchical | A linkage choice, which affects results dramatically |

Real-world data often violates all of these assumptions simultaneously. The ToMATo algorithm (which we will study in a later chapter) combines density estimation with persistence-based merging to handle:

  • Clusters of different densities
  • Clusters of arbitrary shape
  • Automatic selection of the number of clusters (via the persistence diagram)

But first, we need to understand persistent homology — the mathematical machinery that makes this possible.