16 Shape analysis and comparison

“The book of nature is written in the language of mathematics.”
— Galileo Galilei

How can we tell if two shapes are “similar”? Traditional approaches rely on specific geometric measurements — lengths, angles, volumes — which depend on the coordinate system and are sensitive to noise. Persistent homology offers a coordinate-free, noise-robust alternative: compare shapes by comparing their persistence diagrams.

In this chapter, we use Wasserstein distances between persistence diagrams to build a simple “shape space” — a metric space where each point is a shape, and the distance reflects topological similarity. The point is not to compare raw coordinates directly, but to compare the topological signatures extracted from the data.

16.1 Setup

using CairoMakie
using MetricSpaces
using MetricSpaces.Datasets
using PersistenceDiagrams
using Ripserer
using Random

16.2 Generating a shape library

Let’s create a collection of point clouds from shapes with known topology:

function sample_shape(shape, n, seed)
    rng = MersenneTwister(seed)
    noise_level = 0.05

    if shape == :circle
        return EuclideanSpace(as_matrix(sphere(n, dim = 2)) .+ noise_level .* randn(rng, 2, n))
    elseif shape == :sphere
        return EuclideanSpace(as_matrix(sphere(n, dim = 3)) .+ noise_level .* randn(rng, 3, n))
    elseif shape == :torus
        return EuclideanSpace(as_matrix(torus(n)) .+ noise_level .* randn(rng, 3, n))
    elseif shape == :two_circles
        c1 = as_matrix(sphere(n ÷ 2, dim = 2))
        c2 = as_matrix(sphere(n ÷ 2, dim = 2)) .+ [3.0, 0.0]
        return EuclideanSpace(hcat(c1, c2) .+ noise_level .* randn(rng, 2, n))
    elseif shape == :blob
        return EuclideanSpace(0.5 .* randn(rng, 3, n))
    else
        error("Unknown shape: $shape")
    end
end

sample_shape (generic function with 1 method)

shapes = [
    :circle, :circle, :circle,
    :sphere, :sphere, :sphere,
    :torus, :torus, :torus,
    :two_circles, :two_circles, :two_circles,
    :blob, :blob, :blob,
]

n_points = 100
clouds = [sample_shape(s, n_points, i) for (i, s) in enumerate(shapes)]
shape_names = string.(shapes)

15-element Vector{String}:
 "circle"
 "circle"
 "circle"
 "sphere"
 "sphere"
 "sphere"
 "torus"
 "torus"
 "torus"
 "two_circles"
 "two_circles"
 "two_circles"
 "blob"
 "blob"
 "blob"

16.3 Computing persistence diagrams for each shape

function compute_ph(X; dim_max = 1, threshold = 3.0)
    ripserer(Rips(X; threshold = threshold), dim_max = dim_max)
end

diagrams = [compute_ph(X) for X in clouds]

15-element Vector{Vector{PersistenceDiagram}}:
 [100-element 0-dimensional PersistenceDiagram, 2-element 1-dimensional PersistenceDiagram]
 [100-element 0-dimensional PersistenceDiagram, 4-element 1-dimensional PersistenceDiagram]
 [100-element 0-dimensional PersistenceDiagram, 5-element 1-dimensional PersistenceDiagram]
 [100-element 0-dimensional PersistenceDiagram, 26-element 1-dimensional PersistenceDiagram]
 [100-element 0-dimensional PersistenceDiagram, 27-element 1-dimensional PersistenceDiagram]
 [100-element 0-dimensional PersistenceDiagram, 19-element 1-dimensional PersistenceDiagram]
 [100-element 0-dimensional PersistenceDiagram, 23-element 1-dimensional PersistenceDiagram]
 [100-element 0-dimensional PersistenceDiagram, 24-element 1-dimensional PersistenceDiagram]
 [100-element 0-dimensional PersistenceDiagram, 31-element 1-dimensional PersistenceDiagram]
 [100-element 0-dimensional PersistenceDiagram, 2-element 1-dimensional PersistenceDiagram]
 [100-element 0-dimensional PersistenceDiagram, 5-element 1-dimensional PersistenceDiagram]
 [100-element 0-dimensional PersistenceDiagram, 6-element 1-dimensional PersistenceDiagram]
 [100-element 0-dimensional PersistenceDiagram, 31-element 1-dimensional PersistenceDiagram]
 [100-element 0-dimensional PersistenceDiagram, 26-element 1-dimensional PersistenceDiagram]
 [100-element 0-dimensional PersistenceDiagram, 32-element 1-dimensional PersistenceDiagram]

16.4 Building a distance matrix

Now we compare every pair of shapes using the Wasserstein distance on their \(H_1\) persistence diagrams:

n_shapes = length(diagrams)
dist_matrix = zeros(n_shapes, n_shapes)

for i in 1:n_shapes
    for j in (i + 1):n_shapes
        d1 = diagrams[i][2]
        d2 = diagrams[j][2]

        if !isempty(d1) && !isempty(d2)
            d = Wasserstein()(d1, d2)
            dist_matrix[i, j] = isfinite(d) ? d : 1.0
        elseif isempty(d1) && isempty(d2)
            dist_matrix[i, j] = 0.0
        else
            dist_matrix[i, j] = 1.0
        end

        dist_matrix[j, i] = dist_matrix[i, j]
    end
end

fig = Figure(size = (600, 500))
ax = Axis(
    fig[1, 1],
    title = "Pairwise Wasserstein distances (H₁)",
    xticks = (1:n_shapes, shape_names),
    yticks = (1:n_shapes, shape_names),
    xticklabelrotation = π / 4,
)
heatmap!(ax, dist_matrix)
Colorbar(fig[1, 2], limits = extrema(dist_matrix), label = "Distance")
fig

If everything works well, you should see blocks of small distances along the diagonal — repeated samples of the same shape produce similar persistence diagrams. Shapes with different topology, especially different numbers of loops, should be farther apart.

16.5 What the distances capture

Let’s think about what each shape’s persistence should look like:

Shape	\(\beta_0\)	\(\beta_1\)	Expected signature
Circle	1	1	One persistent loop
Sphere	1	0	No persistent loops (but a nontrivial \(H_2\) class)
Torus	1	2	Two persistent loops
Two circles	2 → 1	2	Two persistent loops (one per circle)
Blob	1	0	No persistent features

Using only \(H_1\), we expect:

circles to stand apart because they have one strong loop;
spheres and blobs to look similar, since both have trivial \(H_1\);
tori and two-circle clouds to look similar, since both have two prominent loops.

To distinguish sphere from blob, we would need to look at \(H_2\) as well, since the sphere has a cavity while the blob does not.

16.6 Persistence images as features

A natural workflow for classification tasks is:

Compute persistence diagrams for each sample.
Convert diagrams to a fixed-size vector representation such as persistence images, persistence landscapes, or Betti curves.
Feed these vectors into a standard classifier.

Here we use persistence images, which convert persistence diagrams into fixed-size arrays:

function safe_persistence_image(diagram; kwargs...)
    if isempty(diagram)
        return zeros(5, 5)
    else
        pi = PersistenceImage([diagram]; kwargs...)
        return pi(diagram)
    end
end

images = [safe_persistence_image(d[2]) for d in diagrams]

15-element Vector{Matrix{Float64}}:
 [0.04058744458795432 0.04082252252571269 … 0.04107608718073064 0.04109321729359087; 0.12266037691083896 0.12343082706293752 … 0.1243171647196308 0.12442831020068014; … ; 0.46247769609902506 0.46545058681001744 … 0.46892841428735055 0.4694147442172791; 0.552556584618048 0.556111975671683 … 0.5602741093896714 0.5608585840888307]
 [0.04381546898306087 0.04463565043637233 … 0.045434797740701555 0.04539828900679826; 0.12066964618675634 0.12337517705171064 … 0.1264716673948049 0.12680264719043197; … ; 0.4416391348709831 0.4520971415240001 … 0.4645447488365301 0.46629321931878287; 0.5269793469778803 0.5394873869214776 … 0.554398808339907 0.5565133307679517]
 [0.0641966774678188 0.07072695864449144 … 0.07608072059253804 0.07422310971122226; 0.13741999865225 0.1591316273066478 … 0.18676022045887347 0.18911076981977357; … ; 0.4533106570051892 0.5374113201389568 … 0.6546471209976816 0.6726143380734175; 0.5382529048171264 0.6388481148162881 … 0.7795902108366998 0.8015263464660536]
 [7.892992177405651 11.814998887611086 … 11.124617919947308 7.0376166657993; 9.666080223441716 14.398491156552573 … 13.275074565493306 8.270758908158735; … ; 9.332020378662348 13.59208401149628 … 11.683268638486833 6.952059619515062; 7.507940947635314 10.76833780682638 … 8.844507614640788 5.106805241530214]
 [5.280487997489339 8.77593343162882 … 10.209816879238392 7.287852524732449; 6.737626740619449 11.175278466880972 … 12.674057965816779 8.82984064918244; … ; 7.241529067090515 11.917961439230659 … 12.788776520560695 8.455735277876057; 6.043682048331124 9.909649733616064 … 10.390866809664939 6.7163065867910685]
 [5.860708824235528 9.063417946687476 … 10.830879485397048 8.345471311375682; 7.525833797874518 12.19926198470288 … 15.404345723181201 11.956905917892085; … ; 8.211091273203639 14.265574850565416 … 19.059229554458177 14.680898317598535; 6.801820199244569 11.923262788120091 … 15.824842368786838 12.026058641320208]
 [1.4244264835105167 2.423598425301987 … 3.3893437146776573 2.732107548907045; 1.7478772870894896 2.9756267101068206 … 4.080313390175748 3.2422745982141676; … ; 1.8107077243522105 2.994242343022865 … 3.7710377953047627 2.864452127543297; 1.5282365081617644 2.458467647172391 … 2.904535493558656 2.1388453468246924]
 [1.2750079609522833 2.2599160013243837 … 3.050214413196775 2.329215299801888; 1.5107494078374835 2.675689069006048 … 3.5553085084851523 2.677636595371067; … ; 1.5689233725136051 2.755320189245379 … 3.509779884326169 2.5598735503382604; 1.3735910007384313 2.394436073908691 … 2.9729509587323006 2.1302878558588576]
 [1.3911438767502005 2.0548719132909876 … 1.8021594294728447 1.0787048641367223; 1.5107721556461298 2.2265757984294203 … 1.9368832546240888 1.1524030848036073; … ; 1.544423271973301 2.2650630438934103 … 1.9373520669043847 1.1385560670561148; 1.4535671562130676 2.1262473000771953 … 1.8029700318318527 1.0529716167574796]
 [0.16794124051331935 0.1857299142610198 … 0.20411472114983484 0.2026417185179922; 0.38979867049185996 0.4229916925667718 … 0.44923587321068686 0.43933335522868333; … ; 0.9328083007481196 0.970718966319698 … 0.9492618262841799 0.892308935138864; 0.9610546801117728 0.9819244108775772 … 0.9229066308074905 0.8496613327901538]
 [0.09546207045415694 0.1043634233936872 … 0.07948540725579793 0.055377347266902716; 0.278716747967098 0.3051828416967628 … 0.2325424927022578 0.16180838161194783; … ; 0.9261709623334184 1.0107484668959095 … 0.7644708125354218 0.5297428247636771; 1.0407045416529568 1.133276459818977 … 0.8533877578652066 0.590050454344913]
 [0.10490836963891505 0.154081720009238 … 0.17757471739181635 0.13437601376914277; 0.22448775244887936 0.3590977200819848 … 0.44384994735198036 0.3373548863296763; … ; 0.6366047128372292 1.0158013400503967 … 1.1835334092778693 0.8657120821198472; 0.7055352059936777 1.1038102260925107 … 1.2243090554170633 0.8729041641256936]
 [22.230471846078377 30.267779647964304 … 25.294526171668494 15.666043657854237; 24.79254642766954 34.07788613954515 … 28.51219032267752 17.499336642560348; … ; 22.8265191051129 31.966553067344833 … 26.749190832291077 16.096129193708283; 18.91561766412127 26.71310772753298 … 22.320491215523333 13.292575405565294]
 [13.467207089040828 14.831995567036447 … 8.689215300964428 4.851478245303052; 14.015339417200648 15.396842619389158 … 8.964454314455518 4.987581062715113; … ; 14.279958569474282 15.609018840108211 … 8.975468814979648 4.957907022524903; 13.980274822123171 15.24336333303085 … 8.71055417358085 4.793944017957522]
 [16.229533632402593 19.614789111930147 … 12.808633727628528 7.128188476237318; 16.873134496223987 20.49332876437552 … 13.52764141391874 7.563266282625648; … ; 16.491202144868925 20.236869903166674 … 13.666276101085856 7.717601282302513; 15.503332982253923 19.126965470478204 … 13.072044598657849 7.42161361613929]

These matrices can then be flattened and fed to a standard machine-learning model. This lets us combine topological information with the usual classification toolbox without having to design custom classifiers from scratch.

16.7 Robustness to noise and sampling

One of the main strengths of persistent homology for shape comparison is its stability: small perturbations of the input point cloud should produce small perturbations of the persistence diagram, and therefore small changes in the distances.

Let’s verify that numerically:

noise_levels = [0.01, 0.05, 0.1, 0.2, 0.3]
base_circle = as_matrix(sphere(100, dim = 2))

circle_variants = [
    EuclideanSpace(base_circle .+ nl .* randn(MersenneTwister(i), 2, 100))
    for (i, nl) in enumerate(noise_levels)
]

diagrams_circle = [compute_ph(X) for X in circle_variants]

base_diagram = diagrams_circle[1][2]
for (d, nl) in zip(diagrams_circle, noise_levels)
    if !isempty(base_diagram) && !isempty(d[2])
        dist = Wasserstein()(base_diagram, d[2])
        println("Noise level $nl -> Wasserstein distance: $(round(dist, digits = 4))")
    end
end

Noise level 0.01 -> Wasserstein distance: 0.0
Noise level 0.05 -> Wasserstein distance: 0.1343
Noise level 0.1 -> Wasserstein distance: 0.2557
Noise level 0.2 -> Wasserstein distance: 0.873
Noise level 0.3 -> Wasserstein distance: 1.366

The distances should increase gradually as the noise grows, rather than jumping unpredictably. That is exactly the kind of robustness that makes persistent homology useful for comparing noisy geometric data.

16.8 Summary

In this chapter we learned:

persistence diagrams can serve as topological fingerprints of shapes;
Wasserstein distances turn those fingerprints into a metric on shapes;
persistence images provide fixed-size features for machine learning;
the whole procedure is stable under noise.

This same workflow applies to 3D scans, image contours, molecular structures, and many other kinds of geometric data.