using Ripserer
using PersistenceDiagrams
import GeometricDatasets as gd
using Distances
using CairoMakie
using Random16 Shape analysis and comparison
“The book of nature is written in the language of mathematics.”
— Galileo Galilei
How can we tell if two shapes are “similar”? Traditional approaches rely on specific geometric measurements — lengths, angles, volumes — which depend on the coordinate system and are sensitive to noise. Persistent homology offers a coordinate-free, noise-robust alternative: compare shapes by comparing their persistence diagrams.
In this chapter, we use the Wasserstein and bottleneck distances between persistence diagrams to build a “shape space” — a metric space where each point is a shape, and the distance reflects topological similarity.
16.1 Setup
16.2 Generating a shape library
Let’s create a collection of point clouds from shapes with known topology:
function sample_shape(shape, n, seed)
rng = MersenneTwister(seed)
noise_level = 0.05
if shape == :circle
X = gd.sphere(n, dim = 2) .+ noise_level .* randn(rng, 2, n)
elseif shape == :sphere
X = gd.sphere(n, dim = 3) .+ noise_level .* randn(rng, 3, n)
elseif shape == :torus
X = gd.torus(n) .+ noise_level .* randn(rng, 3, n)
elseif shape == :two_circles
c1 = gd.sphere(n ÷ 2, dim = 2)
c2 = gd.sphere(n ÷ 2, dim = 2) .+ [3.0; 0.0]
X = hcat(c1, c2) .+ noise_level .* randn(rng, 2, n)
elseif shape == :blob
X = randn(rng, 3, n) .* 0.5
else
error("Unknown shape: $shape")
end
return X
end# Sample multiple instances of each shape
shapes = [:circle, :circle, :circle,
:sphere, :sphere, :sphere,
:torus, :torus, :torus,
:two_circles, :two_circles, :two_circles,
:blob, :blob, :blob]
n_points = 100
clouds = [sample_shape(s, n_points, i) for (i, s) in enumerate(shapes)]
shape_names = string.(shapes)16.3 Computing persistence diagrams for each shape
function compute_ph(X; dim_max = 1, threshold = 3.0)
pts = [Tuple(X[:, i]) for i in 1:size(X, 2)]
return ripserer(Rips(pts), dim_max = dim_max, threshold = threshold)
end
diagrams = [compute_ph(X) for X in clouds]16.4 Building a distance matrix
Now we compare every pair of shapes using the Wasserstein distance on their \(H_1\) persistence diagrams:
n_shapes = length(diagrams)
dist_matrix = zeros(n_shapes, n_shapes)
for i in 1:n_shapes
for j in (i+1):n_shapes
d1 = diagrams[i][2] # H₁ diagram
d2 = diagrams[j][2] # H₁ diagram
if length(d1) > 0 && length(d2) > 0
dist_matrix[i, j] = Wasserstein()(d1, d2)
elseif length(d1) == 0 && length(d2) == 0
dist_matrix[i, j] = 0.0
else
# One has features, the other doesn't — use a penalty
dist_matrix[i, j] = 1.0
end
dist_matrix[j, i] = dist_matrix[i, j]
end
endfig = Figure(size = (600, 500))
ax = Axis(fig[1, 1], title = "Pairwise Wasserstein distances (H₁)",
xticks = (1:n_shapes, shape_names),
yticks = (1:n_shapes, shape_names),
xticklabelrotation = π/4)
heatmap!(ax, dist_matrix)
Colorbar(fig[1, 2], limits = extrema(dist_matrix), label = "Distance")
figIf everything works well, you should see blocks of small distances along the diagonal — shapes of the same type have similar persistence diagrams. Shapes with different topology (different number of loops) will have large distances.
16.5 What the distances capture
Let’s think about what each shape’s persistence looks like:
| Shape | \(\beta_0\) | \(\beta_1\) | Expected signature |
|---|---|---|---|
| Circle | 1 | 1 | One persistent loop |
| Sphere | 1 | 0 | No persistent loops (but \(\beta_2 = 1\)) |
| Torus | 1 | 2 | Two persistent loops |
| Two circles | 2 → 1 | 2 | Two persistent loops (one per circle) |
| Blob | 1 | 0 | No persistent features |
The Wasserstein distance on \(H_1\) should cluster:
- Circle, sphere, and blob together (0 or 1 loop)
- Torus and two_circles together (2 loops)
To distinguish sphere from blob (which both have \(\beta_1 = 0\)), we would need to look at \(H_2\) (where the sphere has a cavity but the blob doesn’t).
16.6 Using persistence as features
A natural workflow for classification tasks:
- Compute persistence diagrams for each sample
- Convert diagrams to a fixed-size vector representation (persistence images, persistence landscapes, or Betti curves)
- Feed these vectors into a standard classifier
# Compute persistence images for each shape
# Using H₁ diagrams
function safe_persistence_image(diagram; kwargs...)
if length(diagram) > 0
return PersistenceImage(diagram; kwargs...)
else
# Return a zero image if no features
return zeros(5, 5)
end
end
images = [safe_persistence_image(d[2]) for d in diagrams]These feature vectors can then be used with any machine learning algorithm — random forests, SVMs, neural networks — to classify shapes by their topology. We saw a full example of this pipeline in the digits classification chapter.
16.7 Robustness to noise and sampling
One of the great strengths of persistent homology for shape comparison is its stability: small changes in the point cloud lead to small changes in the persistence diagram (and therefore small changes in the distances).
Let’s verify this:
# Same circle with different noise levels
noise_levels = [0.01, 0.05, 0.1, 0.2, 0.3]
circle_variants = [gd.sphere(100, dim = 2) .+ nl .* randn(MersenneTwister(42), 2, 100)
for nl in noise_levels]
diagrams_circle = [compute_ph(X) for X in circle_variants]
# Compare each to the first (least noisy)
base_diagram = diagrams_circle[1][2]
for (i, (d, nl)) in enumerate(zip(diagrams_circle, noise_levels))
if length(base_diagram) > 0 && length(d[2]) > 0
dist = Wasserstein()(base_diagram, d[2])
println("Noise level $nl → Wasserstein distance: $(round(dist, digits=4))")
end
endThe distances should increase gradually with noise — no sudden jumps. This is the stability theorem in action.
16.8 Summary
In this chapter we learned:
- Persistence diagrams provide a coordinate-free summary of shape
- The Wasserstein distance between diagrams serves as a shape metric
- Shapes with different topology (different Betti numbers) are well-separated
- Persistence images convert diagrams to vectors for machine learning
- The approach is stable — robust to noise and sampling variation
This framework applies to any data that can be represented as a point cloud: 3D scans, molecular structures, image features, sensor networks, and more.