using TDAmapper;
import GeometricDatasets as gd;
7 (Classical) mapper
7.1 Reeb graph
In topology, there are many ways by which we try to see what can’t be seen, in particular high-dimensional sets. The Reeb graph is one of those ways: given a topological space \(X\) and a map \(f: X \to \mathbb{R}\), we can collapse the connected components of its pre-images to get a graph that reflects the level-sets of \(f\).
More formally, we define a relation \(\sim\) on \(X\) such that \(p \sim q\) if-and-only-if \(p\) and \(q\) belong to the same connected component of \(f^{-1}(c)\) for some \(c \in \mathbb{R}\).
7.1.1 The (classical) mapper
The (classical) mapper is an algorithm to create graphs from metric spaces, and can be seen as an “statistical” version of the Reeb graph.
To be able to mimick the Reeb graph, we need to change some objects from the continuous setting to the discrete setting:
- \(X = (X, d)\) is now a finite metric space, also called a point cloud;
- \(f: X \to \mathbb{R}\) can be any function (since \(X\) is discrete, \(f\) is always continuous);
- instead of inverse images of points of \(\mathbb{R}\), we calculate inverse images of subsets of \(\mathbb{R}\) (usually intervals);
- instead of connected components (which are trivial in the discrete setting), we use some clustering algorithm (DBSCAN, single linkage, etc.) and consider these clusterings as “connected pieces of \(X\)”.
The mapper graph can shed light to the geometry of \(X\):
- nodes are clusters of points of \(X\);
- the color of the nodes can summarise some information about the points of \(X\) that represent this node;
- edges denote some proximity (in the metric of \(d\) of \(X\)) between the nodes.
To be more precise, to calculate the mapper of a metric space \(X\), we need the following ingredients:
- a function \(f: X \to \mathbb{R}\) that measures something interesting in \(X\), as, for example, the excentricity, the first coordinate of PCA, and so on;
- a covering \(C\) of the image \(f(X) \subset \mathbb{R}\);
- a method \(l\) to cluster each \(f^{-1}(c)\) for \(c \in C\).
When all of this is chosen, we have a covering of \(X\) by clustering each pre-image of the elements of \(C\), that is:
\[ V = \{ l(p); \; p = f^{-1}(c) \; \text{for} \; c \in C\} \]
We then calculate the 1-dimensional nerve of \(V\): we define the set of edges \(E \subset V \times V\) by
\[ (v_1, v_2) \in E \leftrightarrow v_1 \cap v_2 \neq \emptyset \]
In words, we have an edge between \(v_1\) and \(v_2\) if there is some point in both \(v_1\) and \(v_2\) at the same time.
7.2 An example in Julia
Let’s import some packages:
and define \(X\) as a torus with the usual Euclidean distance
= gd.torus(2000) X
3×2000 Matrix{Float64}:
3.31687 3.01965 2.86135 -1.18996 … 2.52688 3.98365 0.271879
1.99244 2.03354 -1.65949 -2.14445 0.570201 -0.360719 2.33906
0.494297 0.76792 0.951466 0.836795 0.912271 0.00970037 0.764024
Important: when using TDAmapper, your point cloud must be in column-major order. That is: each point of \(X\) must be a column of X
, not a row (as is usual with dataframes). This is so because Distances.jl, NearestNeighbors.jl, Clustering.jl and many other packages for calculations with metric spaces use the column-major order for performance reasons.
We define the function \(f: X \to \mathbb{R}\) as the projection on the \(x\)-axis because our torus is laying down compared to the one in the Reeb graph example.
Let fv
be a vector such that fv[i]
is the \(x\)-axis projection of the point \(x_i\) of \(X\):
= X[1, :]; fv
You can plot \(X\) colored by \(f\) as follows:
using CairoMakie;
scatter(X[1, :], X[2, :], X[3, :], color = fv)
Important: the plots will be interactive when running in Julia if you change CairoMakie
to GLMakie
. Give it a try!
Define the covering intervals cv
as follows:
= uniform(fv, overlap = 150); C
You can check the first five intervals of this covering:
1:5] C[
5-element Vector{Interval}:
Interval(-4.7024693f0, -3.7038724f0)
Interval(-4.1318426f0, -3.1332457f0)
Interval(-3.5612159f0, -2.562619f0)
Interval(-2.9905891f0, -1.9919922f0)
Interval(-2.4199622f0, -1.4213654f0)
For the clustering algorithm we choose the DBSCAN with radius 1:
= cluster_dbscan(radius = 1); clustering
Then the mapper graph of \(X\) can be calculated by
# the mapper function needs:
# X
# fv, the values of f(X)
# the covering C
# the clustering function
= mapper(X, fv, C; clustering = clustering) mp
And plotted with
# define the value of each node as the maximum of
# values of fv
= node_colors(mp, fv)
node_values
mapper_plot(mp, node_values = node_values)
Compare it with the Reeb graph from the start. If this isn’t nice, what is?
https://commons.wikimedia.org/wiki/File:3D-Leveltorus-Reebgraph.png↩︎