Mapper and cosheaves

Mapper is perhaps the most widely applied tool from Topological Data Analysis, but to a first approximation it seems to be just barely topological: the fanciest topological notion it uses is the nerve of a cover. We wish to understand a base (data) space \(X\), and we have a continuous function \(f: X \to Z\), where \(Z\) is a space we understand well, like \(\mathbb{R}\). We choose a cover of \(Z\) and pull it back via \(f\). We then split each preimage into its connected components, and finally take as our output the nerve of this new cover of \(X\). Under mild conditions (an open cover in which every nonempty finite intersection of these components is contractible), the nerve theorem says this nerve is homotopy equivalent to \(X\). Even when this condition fails, the nerve still preserves some of the topology of \(X\).
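
The nerve itself is easy to compute once the cover is finite and each set is represented by the finite collection of sample points it contains. Here is a minimal Python sketch under that assumption; the function name `nerve`, the `max_dim` cutoff, and the toy cover are our own illustration, not part of any particular Mapper implementation.

```python
from itertools import combinations


def nerve(cover, max_dim=2):
    """Return the simplices of the nerve of `cover`, up to dimension `max_dim`.

    A simplex is a tuple of cover indices whose sets share a common point.
    """
    sets = [frozenset(s) for s in cover]
    # 0-simplices: one vertex per nonempty cover element.
    simplices = [(i,) for i, s in enumerate(sets) if s]
    # Higher simplices: every group of cover elements with a common point.
    for size in range(2, max_dim + 2):
        for combo in combinations(range(len(sets)), size):
            if frozenset.intersection(*(sets[i] for i in combo)):
                simplices.append(combo)
    return simplices


# Hypothetical toy cover: three overlapping arcs of a circle sampled at points 0..5.
arcs = [{0, 1, 2}, {2, 3, 4}, {4, 5, 0}]
print(nerve(arcs))  # [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2)]
```

The three pairwise overlaps give a hollow triangle, and the empty triple intersection leaves it unfilled, so the nerve has the homotopy type of a circle, as the nerve theorem predicts for a circle covered by three arcs with contractible overlaps.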

To apply Mapper to point cloud data, we do essentially the same thing, but instead of splitting each preimage into connected components, we use a clustering method to partition it. This method underlies Ayasdi’s data visualization platform.
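
A minimal sketch of the point-cloud version, assuming single-linkage clustering at a fixed scale `eps` as the clustering method (two points share a cluster when a chain of points with consecutive distances below `eps` joins them). Mapper allows any clustering; this choice, the circle data, the cover, and the value of `eps` are all hypothetical. Since no three intervals in this cover overlap, the 1-skeleton built here is the whole nerve.

```python
import math
from itertools import combinations


def single_linkage(points, idx, eps):
    """Cluster the points indexed by `idx`: i ~ j if a chain of steps < eps joins them."""
    parent = {i: i for i in idx}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(idx, 2):
        if math.dist(points[i], points[j]) < eps:
            parent[find(i)] = find(j)
    clusters = {}
    for i in idx:
        clusters.setdefault(find(i), set()).add(i)
    return [frozenset(c) for c in clusters.values()]


def mapper(points, f, intervals, eps):
    """Pull back each open interval, cluster its preimage, and link overlapping clusters."""
    nodes = []
    for a, b in intervals:
        idx = [i for i, p in enumerate(points) if a < f(p) < b]
        nodes.extend(single_linkage(points, idx, eps))
    edges = {(i, j) for i, j in combinations(range(len(nodes)), 2)
             if nodes[i] & nodes[j]}
    return nodes, edges


# Hypothetical data: 40 evenly spaced points on the unit circle, filtered by height.
pts = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40)) for k in range(40)]
cover = [(-1.5, -0.25), (-0.75, 0.75), (0.25, 1.5)]
nodes, edges = mapper(pts, lambda p: p[1], cover, eps=0.3)
print(len(nodes), len(edges))  # 4 4
```

The middle interval pulls back to two separate arcs, and the four resulting clusters link up into a 4-cycle, so the output recovers the loop in the data.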

Mapper is essentially a discretized version of the Reeb graph, or of the more general Reeb space. But until very recently it was not clear in what rigorous sense this is true, or whether the output of Mapper converges to the Reeb graph in any meaningful sense as its cover is refined. Further, it is not easy to compare the outputs of Mapper directly across different covers, data sets, or filter functions \(f\).

A pair of recent papers, though, offers a deeper way to represent and understand the Reeb graph and Mapper. The key insight is that taking preimages under a map naturally produces a cosheaf. Even better, taking preimages of a cover produces a cosheaf in much the same way. Under mild tameness assumptions, these cosheaves are constructible, and we can use the interleaving distance defined in Sheaves, Cosheaves, and Applications to measure the distance between them. This allowed Munch and Wang to finally prove that 1-D Mapper does in fact converge to the Reeb graph. (I’m not sure what technical point prevents them from proving this in arbitrary dimensions, but they say it should be easy to overcome.)
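
To make the preimage cosheaf concrete in the Reeb-graph case: it assigns to each open interval \(U \subseteq \mathbb{R}\) the set of connected components of \(f^{-1}(U)\), and to each inclusion \(U \subseteq V\) the map sending a component over \(U\) to the component over \(V\) that contains it. The sketch below assumes \(X\) is a finite graph with \(f\) given on its vertices, and approximates \(f^{-1}(U)\) by the subgraph induced on vertices with values in \(U\). The function names are hypothetical, and only the values and extension maps are computed; the cosheaf gluing condition is not checked.

```python
import math


def components(vertices, edges):
    """Connected components of the subgraph induced on `vertices` (union-find)."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, w in edges:
        if u in parent and w in parent:  # keep only edges inside the subgraph
            parent[find(u)] = find(w)
    groups = {}
    for v in vertices:
        groups.setdefault(find(v), set()).add(v)
    return [frozenset(g) for g in groups.values()]


def section(vertices, edges, f, interval):
    """F(U): components of the (induced-subgraph) preimage of the open interval U."""
    a, b = interval
    return components({v for v in vertices if a < f[v] < b}, edges)


def extension(vertices, edges, f, small, big):
    """The map F(small) -> F(big) induced by the inclusion small ⊆ big."""
    assert big[0] <= small[0] and small[1] <= big[1], "need small ⊆ big"
    targets = section(vertices, edges, f, big)
    # Each component over `small` lies inside exactly one component over `big`.
    return {c: next(d for d in targets if c <= d)
            for c in section(vertices, edges, f, small)}


# Hypothetical example: a 12-cycle drawn on the unit circle, f = height.
n = 12
verts = range(n)
cycle = [(k, (k + 1) % n) for k in range(n)]
height = {k: math.sin(2 * math.pi * k / n) for k in range(n)}
U, V = (-0.75, 0.75), (-1.5, 0.75)
ext = extension(verts, cycle, height, U, V)
print(len(ext), len(set(ext.values())))  # 2 1
```

Over \(U\) the cycle falls into two arcs, and the extension map sends both to the single component over the larger interval \(V\): the arcs join at the bottom of the circle, which is where the Reeb graph of this height function closes its loop.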

What is more interesting for data analysis is that it may be possible to compare the output of Mapper across different data sets and covers. The cosheaf construction gives us a simplicial complex that in some sense “projects down” onto the filter space \(Z\). So we could set up a sort of visualization that lays the simplicial complex out above the filter space, letting us see which simplices in two output complexes are close to each other. If computing the interleaving distance between cosheaves were at all computationally feasible, we could even quantify their similarity. Of course, it’s not clear whether there is any situation where it is useful to compare the Mapper complexes of two data sets (or clustering methods, or filter functions) in this way.
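
One simple way to realize the projection onto \(Z = \mathbb{R}\) for such a layout is to send each Mapper node to the span of filter values taken by its points. This is a minimal sketch assuming nodes are sets of point indices and `f` is a list of filter values; the function name, the midpoint layout, and the toy input are hypothetical choices.

```python
def project(nodes, f):
    """Project each Mapper node to the span [min f, max f] of its points in Z = R."""
    return [(min(f[i] for i in node), max(f[i] for i in node)) for node in nodes]


# Hypothetical toy input: four points with filter values, grouped into two nodes.
f_values = [0.0, 0.5, 0.9, 1.4]
nodes = [frozenset({0, 1, 2}), frozenset({2, 3})]
spans = project(nodes, f_values)
print(spans)  # [(0.0, 0.9), (0.9, 1.4)]

# One possible layout: draw each node at the midpoint of its span.
heights = [(lo + hi) / 2 for lo, hi in spans]
```

Applying the same projection to the output of a second Mapper run places both complexes over the same axis, which is the starting point for the side-by-side visualization described above.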