12  Mapper: the general case

“The real voyage of discovery consists not in seeking new landscapes, but in having new eyes.”
— Marcel Proust, In Search of Lost Time

We have already seen two specific mapper-like constructions: classical Mapper and Ball Mapper.

The TDAmapper.jl API makes the general pattern explicit: a mapper algorithm is a combination of

  1. a cover of the data;
  2. a refinement rule for each cover element;
  3. a nerve construction that turns the refined cover into a graph.

The beauty of this decomposition is that each step can be changed independently. That gives us a whole family of Mapper-like algorithms rather than a single fixed recipe.

12.1 The three building blocks

12.1.1 Covering step

Examples of covers include:

  • overlapping intervals in the image of a filter function;
  • fixed-radius balls around landmarks;
  • adaptive covers whose radius depends on local density.

In classical Mapper, the cover lives in the image of a filter function. In Ball Mapper, the cover lives directly in the metric space through balls around landmarks.
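As a toy illustration of the covering step (plain Base Julia, not the TDAmapper.jl implementation; `ball_cover` and the sample points are invented for this sketch), a fixed-radius ball cover around landmarks can be written as:

```julia
# Toy epsilon-ball cover: each landmark collects the indices of the points
# within distance epsilon of it; points outside every ball stay uncovered.
dist(p, q) = sqrt(sum((p .- q) .^ 2))

ball_cover(X, L, epsilon) =
    [findall(i -> dist(X[i], X[l]) <= epsilon, eachindex(X)) for l in L]

X = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (2.0, 0.0)]
cov = ball_cover(X, [1, 3], 0.6)                 # landmarks at points 1 and 3
uncovered = setdiff(eachindex(X), union(cov...)) # point 4 lies in no ball
```

The `uncovered` check foreshadows the coverage heuristic discussed later: a radius that leaves points outside every ball silently drops them from the graph.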

12.1.2 Refinement step

Once we have a raw cover, we may refine each element further:

  • DBscan clusters each cover element into connected dense pieces;
  • Trivial() keeps each cover element as a single node.

This is where “connectedness” enters the discrete picture: in the classical Reeb-graph story we would take connected components, but in data analysis we often replace that step by a clustering method.
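As a sketch of what a refinement rule does (plain Base Julia; `refine` and the scale `delta` are invented for illustration, not TDAmapper.jl API), single-linkage at scale `delta` splits one cover element into its "connected" pieces:

```julia
# Toy refinement: split one cover element into pieces by single linkage,
# treating points closer than delta as connected.
dist(p, q) = sqrt(sum((p .- q) .^ 2))

function refine(X, element, delta)
    pieces = Vector{Vector{Int}}()
    for i in element
        # find every existing piece that i is delta-close to
        hits = findall(p -> any(j -> dist(X[i], X[j]) < delta, p), pieces)
        if isempty(hits)
            push!(pieces, [i])           # i starts a new piece
        else
            merged = vcat([i], pieces[hits]...)
            deleteat!(pieces, hits)      # fuse all touched pieces with i
            push!(pieces, sort(merged))
        end
    end
    pieces
end

X = [(0.0,), (0.1,), (5.0,), (5.1,)]
refine(X, [1, 2, 3, 4], 0.5)   # two dense clumps, so two pieces
```

A trivial refiner would instead return the element unchanged, which is exactly what keeps each cover element as a single node.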

12.1.3 Nerve step

Finally, we build a graph from the refined cover:

  • SimpleNerve() adds an edge whenever two cover elements overlap;
  • more restrictive nerve constructions can require larger intersections or weighted overlaps.

So the full algorithm is: cover, refine, then take a nerve.
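The whole pattern fits in a few lines of plain Julia. The sketch below is a toy stand-in, not TDAmapper.jl's actual implementation; the names `cover`, `trivial_refine`, `nerve`, and the `min_overlap` keyword are invented for illustration. The nerve also shows how a more restrictive construction can demand a larger intersection:

```julia
# Toy cover -> refine -> nerve pipeline in plain Julia.
dist(p, q) = sqrt(sum((p .- q) .^ 2))

# 1. cover: epsilon-balls around the landmark indices in L
cover(X, L, epsilon) =
    [findall(i -> dist(X[i], X[l]) <= epsilon, eachindex(X)) for l in L]

# 2. refine: the trivial rule keeps each cover element as a single node
trivial_refine(elements) = elements

# 3. nerve: connect two nodes when their point sets share at least
#    min_overlap points (min_overlap = 1 mimics a simple nerve)
nerve(nodes; min_overlap = 1) =
    [(a, b) for a in 1:length(nodes) for b in a+1:length(nodes)
            if length(intersect(nodes[a], nodes[b])) >= min_overlap]

X = [(0.0, 0.0), (0.4, 0.0), (0.8, 0.0), (1.2, 0.0)]
nodes = trivial_refine(cover(X, eachindex(X), 0.5))
nerve(nodes)                   # any overlap gives an edge
nerve(nodes; min_overlap = 2)  # stricter nerve: fewer edges survive
```

Swapping any one of the three functions changes the algorithm while leaving the other two untouched, which is the modularity the decomposition above promises.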

12.2 Creating a mapper directly

The generic mapper function takes one object for each of those stages:

using MetricSpaces
using MetricSpaces.Datasets
using TDAmapper
using TDAmapper.DomainCovers
using TDAmapper.Refiners
using TDAmapper.Nerves

X_demo = annulus(300)                            # sample 300 points from an annulus
L_demo = farthest_points_sample_ids(X_demo, 30)  # pick 30 well-spread landmarks

M_demo = mapper(
    X_demo,
    EpsilonBall(X = X_demo, L = L_demo, epsilon = 0.2),  # 1. cover
    Trivial(),                                           # 2. refinement rule
    SimpleNerve(),                                       # 3. nerve construction
)

M_demo
Mapper graph with 30 vertices and 29 edges

That is exactly the pattern Ball Mapper uses under the hood. Writing it this way makes it easier to see what can be customized and why.

12.3 Interpreting Mapper graphs

Whatever choices we make, the output is still a graph, and we need to know how to read it:

  • nodes represent local groups of data points;
  • edges connect groups that overlap;
  • connected components suggest distinct macroscopic pieces of the data;
  • loops suggest circular or toroidal structure;
  • branches suggest flares, bifurcations, or tree-like behavior.

Coloring the nodes by a statistic of the underlying data often adds another layer of meaning to the graph.
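For a quick connectedness read-out, a toy component count over an edge list can be written in plain Julia (`n_components` is a hypothetical helper invented here, not TDAmapper.jl or TDAplots API):

```julia
# Toy component count for a graph given as an edge list on vertices 1:n.
# Repeatedly propagate the smaller label across each edge until stable;
# the number of distinct labels is the number of connected components.
function n_components(n, edges)
    label = collect(1:n)
    changed = true
    while changed
        changed = false
        for (a, b) in edges
            m = min(label[a], label[b])
            if label[a] != m || label[b] != m
                label[a] = m
                label[b] = m
                changed = true
            end
        end
    end
    length(unique(label))
end

n_components(5, [(1, 2), (2, 3), (4, 5)])   # two components
```

In practice a graph library would do this, but the point stands: the first thing to read off a mapper graph is how many macroscopic pieces it has.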

12.4 Example: a noisy figure-eight

using CairoMakie
using TDAplots

X = add_noise(figure_eight(400; r = 1.0), 0.08)  # 400 points on a noisy figure-eight
metricspace_plot(X, markersize = 4)

We now build two mapper graphs using the same landmarks but different radii:

L = farthest_points_sample_ids(X, 60)  # landmarks shared by both graphs

M_small = mapper(
    X,
    EpsilonBall(X = X, L = L, epsilon = 0.25),
    Trivial(),
    SimpleNerve(),
)

mapper_plot(M_small, node_values = node_colors(M_small, first.(X)))

M_large = mapper(
    X,
    EpsilonBall(X = X, L = L, epsilon = 0.4),
    Trivial(),
    SimpleNerve(),
)

mapper_plot(M_large, node_values = node_colors(M_large, first.(X)))

With the smaller radius, the two loops of the figure-eight remain visible. With the larger radius, the graph becomes much coarser and starts to collapse the structure.

12.5 Choosing parameters

There is no universal recipe, but a few heuristics help:

  • Coverage: every point should belong to at least one cover element.
  • Overlap: if cover elements barely overlap, the graph becomes disconnected.
  • Resolution: fine covers reveal more detail but also more noise.
  • Stability: features that survive across a range of parameters are more trustworthy.

That last point is the mapper version of the persistence philosophy: do not trust a single parameter choice more than the structure that persists across several of them.
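In that spirit, a parameter sweep over a toy ball-cover-and-nerve construction (plain Base Julia; all names are invented for this sketch, not TDAmapper.jl API) makes the stability heuristic concrete:

```julia
# Toy stability sweep: cover a sampled circle at several radii and count
# nerve edges; structure that persists across radii is more trustworthy
# than any single snapshot.
dist(p, q) = sqrt(sum((p .- q) .^ 2))
cover(X, L, epsilon) =
    [findall(i -> dist(X[i], X[l]) <= epsilon, eachindex(X)) for l in L]
nerve(nodes) = [(a, b) for a in 1:length(nodes) for b in a+1:length(nodes)
                       if !isempty(intersect(nodes[a], nodes[b]))]

# 12 evenly spaced points on the unit circle
X = [(cos(2pi * k / 12), sin(2pi * k / 12)) for k in 0:11]
for epsilon in (0.3, 0.55, 1.05)
    edges = nerve(cover(X, eachindex(X), epsilon))
    println("epsilon = $epsilon: $(length(edges)) edges")
end
```

At the smallest radius the balls are disjoint and the graph has no edges; at the middle radius the circle is recovered as a cycle; at the largest radius extra chords appear and the loop begins to fill in, which is the same coarsening seen in the figure-eight example above.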