8  Computing persistent homology in Julia

“The purpose of computing is insight, not numbers.”
— Richard Hamming

In the previous chapter we learned what persistent homology is and why it matters. Now it is time to actually compute it, using the packages of the local JuliaTDA stack.

8.1 Setup

using CairoMakie
using MetricSpaces
using MetricSpaces.Datasets
using PersistenceDiagrams
using Ripserer
using TDAplots
using Random

8.2 Computing Rips persistence

Let us start with a simple example: a noisy circle.

circle = add_noise(sphere(100, dim = 2), 0.05)
circle_mat = as_matrix(circle)

fig = Figure()
ax = Axis(fig[1, 1], title = "Noisy circle (100 points)", aspect = DataAspect())
scatter!(ax, circle_mat[1, :], circle_mat[2, :], markersize = 6)
fig

We expect this point cloud to have:

  • \(\beta_0 = 1\) (one connected component);
  • \(\beta_1 = 1\) (one loop, namely the circle);
  • \(\beta_k = 0\) for \(k \geq 2\).

Let us compute persistent homology using the Vietoris-Rips filtration. Ripserer.jl can work directly with MetricSpaces objects, so there is no need to manually convert the cloud to tuples or row vectors.

result = ripserer(Rips(circle), dim_max = 1)
2-element Vector{PersistenceDiagram}:
 100-element 0-dimensional PersistenceDiagram
 2-element 1-dimensional PersistenceDiagram

The result contains one persistence diagram per homology dimension:

println("H₀:")
println(result[1])

println("\nH₁:")
println(result[2])
H₀:
100-element 0-dimensional PersistenceDiagram

H₁:
2-element 1-dimensional PersistenceDiagram

8.3 Visualizing persistence diagrams

persistence_plot(result)

In the \(H_1\) diagram, you should see one point far from the diagonal: that is the genuine loop of the circle. Points near the diagonal are short-lived features created by noise. A feature's persistence, its death minus its birth, is proportional to its distance from the diagonal and measures how significant it is.
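As a toy illustration of this ranking, with made-up \((b, d)\) pairs rather than the diagram computed above, sorting by persistence puts the genuine feature first:

```julia
# Hypothetical (birth, death) pairs: two noise features and one real loop
pairs = [(0.00, 0.10), (0.05, 0.12), (0.02, 1.40)]

# persistence = death - birth
persistence(p) = p[2] - p[1]

sorted = sort(pairs; by = persistence, rev = true)
# the most persistent feature, (0.02, 1.40), comes first
```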

8.4 Barcodes

barcode_plot(result)

The barcode gives a complementary view: each horizontal bar represents one topological feature, from its birth to its death. Long bars are significant features; short bars are usually noise. Barcodes and persistence diagrams carry the same information, but different people find one or the other easier to read.

8.5 Cubical complexes

For gridded data such as images, Ripserer.jl can work with cubical complexes, which are often far more efficient than Rips complexes built on the same data:

n = 50
x = range(-2, 2, length = n)
y = range(-2, 2, length = n)
f = [exp(-(xi^2 + yi^2)) + 0.7 * exp(-((xi - 1)^2 + (yi - 1)^2)) for xi in x, yi in y]

result_cub = ripserer(Cubical(f))
2-element Vector{PersistenceDiagram}:
 4-element 0-dimensional PersistenceDiagram
 1-element 1-dimensional PersistenceDiagram
fig = Figure(size = (800, 400))
ax1 = Axis(fig[1, 1], title = "Function on a grid", aspect = DataAspect())
heatmap!(ax1, x, y, f)
fig
persistence_plot(result_cub)

This is the same persistence idea as before, but now the filtration comes from sublevel sets of a function on a grid rather than from growing metric balls around a point cloud.
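The sublevel-set idea can be seen in miniature on a 1D array; this toy sketch is independent of Ripserer.jl and just shows how components appear and merge as the threshold grows:

```julia
# f sampled on a 1D grid; the sublevel set at threshold t is {i : f[i] ≤ t}
f = [0.2, 1.0, 0.1, 0.8, 0.3]

sublevel(f, t) = findall(x -> x <= t, f)

sublevel(f, 0.25)   # the two lowest minima appear as separate components
sublevel(f, 0.85)   # raising t merges components as higher values are absorbed
```

Each local minimum gives birth to a connected component of the sublevel set, and components merge (one of them "dies") when the threshold passes the ridge between them; this is exactly the \(H_0\) information that `Cubical` computes on the 2D grid above.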

8.6 Persistence images

Persistence diagrams are multisets of points, which is awkward for machine learning because most models expect fixed-size vectors. Persistence images solve this by placing smooth kernels around the diagram points and evaluating the result on a grid.

if !isempty(result[2])
    # name it pimg to avoid shadowing Julia's built-in constant pi
    pimg = PersistenceImage([result[2]])
    img = pimg(result[2])

    fig = Figure()
    ax = Axis(fig[1, 1], title = "Persistence image", aspect = DataAspect())
    heatmap!(ax, img)
    fig
end

The key detail is that PersistenceImage learns its grid from a collection of diagrams, which is why we pass [result[2]] even though we have a single diagram. In larger applications, such as the digits chapter, this is how persistence diagrams become feature vectors for standard classifiers and neural networks.
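Under the hood, the construction is essentially a sum of Gaussian bumps evaluated on a grid. Here is a minimal hand-rolled sketch with hypothetical (birth, persistence) coordinates and an arbitrary bandwidth σ; the real PersistenceImage also applies a persistence-dependent weighting that this sketch omits:

```julia
# Hypothetical (birth, persistence) pairs and a coarse 10×10 grid
pts = [(0.02, 1.38), (0.30, 0.05)]
xs = range(0.0, 1.5; length = 10)
ys = range(0.0, 1.5; length = 10)
σ = 0.1

# One Gaussian bump per diagram point, summed at every grid node
img = [sum(exp(-((x - b)^2 + (y - p)^2) / (2σ^2)) for (b, p) in pts)
       for y in ys, x in xs]

size(img)  # (10, 10): a fixed-size array, no matter how many points the diagram has
```

The fixed grid is what turns a variable-size multiset of points into a vector of constant length.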

8.7 Wasserstein and bottleneck distances

To compare two persistence diagrams, we need a notion of distance. The two standard choices are:

Definition 8.1 The bottleneck distance between two persistence diagrams \(D_1\) and \(D_2\) is

\[ d_B(D_1, D_2) = \inf_\gamma \sup_{x \in D_1} \|x - \gamma(x)\|_\infty, \]

where \(\gamma\) ranges over all bijections between the diagrams, allowing points to be matched to the diagonal.

Definition 8.2 The \(p\)-Wasserstein distance is

\[ W_p(D_1, D_2) = \left(\inf_\gamma \sum_{x \in D_1} \|x - \gamma(x)\|_\infty^p \right)^{1/p}, \]

where \(\gamma\) ranges over the same bijections as in Definition 8.1.

The bottleneck distance is controlled by the single worst match; the Wasserstein distance accumulates all matches. Both are available in PersistenceDiagrams.jl:

circle1 = add_noise(sphere(80, dim = 2), 0.05)
circle2 = add_noise(sphere(80, dim = 2), 0.10)

res1 = ripserer(Rips(circle1), dim_max = 1)
res2 = ripserer(Rips(circle2), dim_max = 1)

if !isempty(res1[2]) && !isempty(res2[2])
    d_bottle = Bottleneck()(res1[2], res2[2])
    d_wass = Wasserstein()(res1[2], res2[2])
    println("Bottleneck distance: ", round(d_bottle, digits = 4))
    println("Wasserstein distance: ", round(d_wass, digits = 4))
end
Bottleneck distance: 0.1101
Wasserstein distance: 0.1606
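To build intuition for the matching in Definition 8.1, here is a hand computation of the bottleneck distance for two one-point diagrams (toy values, not the diagrams above). A point can be matched either to a point of the other diagram or to the diagonal, at ℓ∞ cost \((d - b)/2\):

```julia
# One point per diagram, as (birth, death)
p1 = (0.0, 1.0)
p2 = (0.0, 0.8)

# ℓ∞ cost of matching p1 directly to p2
cost_match = max(abs(p1[1] - p2[1]), abs(p1[2] - p2[2]))    # 0.2

# Cost of instead sending both points to the diagonal: (death - birth) / 2 each
cost_diag = max((p1[2] - p1[1]) / 2, (p2[2] - p2[1]) / 2)   # 0.5

d_B = min(cost_match, cost_diag)                            # 0.2: the direct match wins
```

With more points the optimization over matchings is nontrivial, which is why we rely on `Bottleneck()` and `Wasserstein()` in practice.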

8.8 The full pipeline

Let us finish with a noisy torus:

torus_cloud = add_noise(torus(200), 0.05)
result_torus = ripserer(Rips(torus_cloud; threshold = 3.0), dim_max = 2)

barcode_plot(result_torus)

A torus has \(\beta_0 = 1\), \(\beta_1 = 2\), and \(\beta_2 = 1\). In the barcode, look for one long bar in \(H_0\), two long bars in \(H_1\), and one long bar in \(H_2\).
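One way to make "long bar" precise is to count features whose persistence exceeds a cutoff. A sketch on hypothetical \(H_1\) intervals; with real Ripserer.jl output, the birth and death of each interval in the diagram play the same role:

```julia
# Hypothetical H₁ intervals for a torus-like barcode, as (birth, death)
bars_h1 = [(0.10, 1.90), (0.12, 1.70), (0.15, 0.20), (0.20, 0.25)]

# Count bars whose persistence exceeds a (data-dependent) cutoff
cutoff = 0.5
long_bars = count(bd -> bd[2] - bd[1] > cutoff, bars_h1)
# two long bars survive, matching β₁ = 2 for the torus
```

Choosing the cutoff is the usual statistical question in TDA: too low and noise counts as signal, too high and genuine features are discarded.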

8.9 Summary

In this chapter we learned to:

  • compute Vietoris-Rips persistence with Ripserer.jl;
  • compute cubical persistence on gridded data;
  • visualize diagrams and barcodes with TDAplots.jl;
  • turn diagrams into persistence images;
  • compare diagrams with bottleneck and Wasserstein distances.

These tools form the computational backbone of TDA in Julia. In the next chapters, we will use them inside larger workflows for clustering, visualization, and applications.