(Part 1, Part 2)

The distributed algorithm for inference I described last time is kind of awkward, and not very well linked with the rest of the field. Olivier Peltre has a better one. It’s connected with the well-known message-passing algorithm for this sort of calculation. Unfortunately, while I think I have a decent idea of the general idea of what’s going on, I don’t have a very detailed understanding of all the moving pieces, so some of this exposition might be just a little bit wrong.

We will need to give up a bit of our nice simplicial structure to make everything work properly. We will instead work directly with a cover of our set of random variables $$\mathcal I$$ by a collection of sets $$\mathcal V$$, where we require that $$\mathcal V$$ be closed under intersection. This requirement makes $$\mathcal V$$ a partially ordered set under the relation $$\alpha \leq \beta$$ if $$\beta \subseteq \alpha$$. (The change in direction is to ensure that our sheaves stay sheaves and our cosheaves stay cosheaves.) The state spaces for each $$\alpha$$ still form a sheaf (i.e., a covariant functor) over $$\mathcal V$$; and so we also get the cosheaf $$\mathcal A$$ of observables and the sheaf $$\mathcal A^*$$ of functionals. (This is probably terrible terminology for anyone not comfortable with cellular sheaves, since typically we want to think of a presheaf as a contravariant functor. Sorry.)

We can move back to the simplicial world by taking the nerve $$\mathcal N(\mathcal V)$$ of $$\mathcal V$$: this is the simplicial complex whose $$k$$-simplices are nondegenerate chains of length $$k+1$$ in $$\mathcal V$$. In particular, vertices of $$\mathcal{N}(\mathcal V)$$ correspond to elements of $$\mathcal V$$, while edges correspond to pairs $$\alpha \leq \beta$$. A sheaf or cosheaf on $$\mathcal V$$ extends naturally to $$\mathcal{N}(\mathcal V)$$. The stalk over a simplex $$\alpha \leq \cdots \leq \beta$$ is $$\mathcal F(\beta)$$, and the restriction maps between 0- and 1-simplices are $$\mathcal F(\alpha) \to \mathcal F(\alpha \leq \beta)$$, $$x_\alpha \mapsto \mathcal F_{\alpha \leq \beta} x_\alpha$$, and $$\mathcal F(\beta) \to \mathcal F(\alpha \leq \beta)$$, $$x_\beta \mapsto x_\beta$$.

Extending a sheaf or cosheaf to $$\mathcal N(\mathcal V)$$ preserves homology and cohomology (defined in terms of derived functors and all that), so this is not an unreasonable thing to do. The interpretations of $$H_0(\mathcal N(\mathcal V);\mathcal A)$$ and $$H^0(\mathcal N(\mathcal V);\mathcal A^*)$$ are also the same. Homology classes of $$\mathcal A$$ correspond to Hamiltonians defined as a sum of local terms, and cohomology classes of $$\mathcal A^*$$ (when normalized) correspond to consistent local marginals (or pseudomarginals).

We now come to the part of the framework I have always found most confusing, because it’s seldom very well explained. Peltre (and others, probably) makes a distinction between the interaction potentials $$h_\alpha$$ which are summed to get a Hamiltonian on $$S$$, and the collection of local Hamiltonians $$H_\alpha$$, obtained by treating each subset $$\alpha$$ as if it were an isolated system and summing the potentials $$h_\beta$$ for $$\beta \subseteq \alpha$$. Both of these can be seen as 0-cochains of $$\mathcal A$$, and they are not really homologically related. There is an invertible linear map $$\zeta: C_0(\mathcal{N}(\mathcal V);\mathcal A) \to C_0(\mathcal{N}(\mathcal V);\mathcal A)$$ which sends a collection of interaction potentials to its family of local Hamiltonians. The formula is straightforward: $$(\zeta h)_\alpha = \sum_{\beta \subseteq \alpha} \mathcal A_{\alpha \leq \beta} h_\beta$$. The inverse of $$\zeta$$ is the Möbius transform $$\mu$$ given by $$(\mu H)_\alpha = \sum_{\beta \subseteq \alpha} c_{\beta} \mathcal{A}_{\alpha \leq \beta} H_\beta$$. The coefficients $$c_{\beta}$$ are the Möbius numbers of the poset $$\mathcal V$$. Notice that these formulas refer to the poset relations, not to the incidence of simplices in $$\mathcal N(\mathcal V)$$, which is a bit frustrating—the simplicial structure doesn’t seem to really be doing much here. However, $$\zeta$$ and $$\mu$$ can be extended to chain maps, and hence descend to homology. There are analogous operations on cochains of $$\mathcal A^*$$, adjoint to the operations on the chains of $$\mathcal A$$.

The action of $$H^0(\mathcal N(\mathcal V);\mathcal A^*)$$ on $$H_0(\mathcal N(\mathcal V);\mathcal A)$$ calculates the expected value of the Hamiltonian $$H$$ given by a class of interaction potentials $$h_\alpha$$ with respect to a consistent set of marginals. If a chain represents a collection of local Hamiltonians, then this pairing doesn’t compute an expectation. Instead, if $$p \in H^0$$ and $$H \in C_0$$, we have to take $$\langle p, \mu \cdot H\rangle$$ to get the expectation. By adjointness, this is $$\langle \mu\cdot p, H\rangle$$.

For reasons I don’t understand very deeply, these operations play a key role in formulating message passing dynamics. My current understanding is that it has to do with the gradients of the Bethe entropy. The Bethe entropy is defined on a section of $$\mathcal A^*$$ by the same inclusion-exclusion procedure as we did before, but using the Möbius coefficients to perform inclusion-exclusion: $$\check{S}(p) = \sum_{\alpha} c_\alpha S(p_\alpha)$$. Think of this as computing local (i.e. marginalized) entropy and then applying the Möbius transform.

The Bethe free energy is the functional we minimize to find the approximate pseudomarginals: $$\langle p, h \rangle - \check{S}(p)$$. Adding the constraint that $$p$$ be consistent ($$\delta p = 0$$) and normalized ($$\langle p_\alpha, \mathbf{1}_\alpha \rangle = 1$$), we get the Lagrangian condition

$h + \mu (\log p + \mathbf{1}) + \partial \lambda + \tau = 0$

which we can then convert to

$-\log p - \mathbf{1} = \zeta(h + \partial \lambda + \tau).$

The Lagrange multiplier $$\tau$$ is an arbitrary 0-chain which is constant on each vertex, so its zeta transform also satisfies the same condition, and we can roll the $$\mathbf 1$$ into it. $$\lambda$$ is an arbitrary 1-chain. Ultimately, we have the requirement that

$-\log p = \zeta(h + \partial \lambda) + \tau.$

This holds on the level of cochains, and of course the cochain $$p$$ must also be in $$H^0(\mathcal{N}(\mathcal V),\mathcal A^*)$$. So in order to be a critical point for the Bethe free energy functional, $$-\log p$$ must be obtained as the zeta transform of something homologous to the given local potential $$h$$, up to some normalizing additive constant. Solving the marginalization problem amounts to a search for a local potential $$h'$$ homologous to the given $$h$$ that makes $$p = \exp(-\zeta h')$$ a section of $$\mathcal A^*$$. Note that every section of $$\mathcal A^*$$ is normalized to some constant sum, so the normalization constraint isn’t all that interesting. Further, if $$p = \exp(-\zeta h')$$, it is automatically nonnegative.

One of the key insights now is that if we define a differential equation on $$h$$ where the derivative of $$h$$ is always in the image of $$\delta$$, the evolution of the state will be restricted to the homology class of the initial condition. If we define an operator on $$C_0(\mathcal{N}(\mathcal V);\mathcal A)$$ which vanishes when $$\exp(-\zeta h')$$ is a section of $$\mathcal A^*$$, we’ll be able to use it to implement just such a differential equation.

Let $$\Delta = \partial \circ \mathcal D \circ \zeta$$, where $$\mathcal D: C_0(\mathcal N(V);\mathcal A^*) \to C_1(\mathcal N(\mathcal V); \mathcal A)$$ is defined by $$(\mathcal D H)_{\alpha \leq \beta} = -\log(\mathcal A^*_{\alpha \leq \beta}\exp(-H_\alpha)) +H_\beta \in \mathcal A(\alpha \leq \beta) = \mathcal A(\beta)$$. This operator clearly vanishes when $$\exp(-\zeta h)$$ is a section of $$\mathcal A^*$$, since $$\mathcal D \circ \zeta$$ does. In fact, $$\Delta$$ vanishes at precisely these points. This operator $$\Delta$$ is somewhat mysterious, but if you squint hard enough it looks kind of like a Laplacian. In fact, its linearization around a given 0-chain appears to be a sheaf Laplacian.

Thus the system of differential equations $$\dot{h} = -\Delta h$$ has stationary points corresponding precisely to the critical points of the Bethe entropy. If you discretize this ODE with a naive Euler scheme with step size 1, you get a discrete-time evolution equation, and this implies dynamics on local marginal estimates $$p_\alpha \propto \exp(-(\zeta h)_\alpha)$$. It turns out that these dynamics on $$p$$ are precisely the standard message-passing dynamics, typically described in terms of marginalization, sums, and products. This is a really neat result. And despite the complications involved in defining everything, I find this more comprehensible than the other expositions of generalized message-passing algorithms I’ve read. (I’m looking at you, Wainwright and Jordan.) I’d still like to understand this better, though. What exactly is the relationship between the operator $$\Delta$$ and a sheaf Laplacian? Is there a sheaf-theoretic way to understand the max-product algorithm for finding maximum-likelihood elements rather than marginal distributions? Can we use (possibly conically constrained) cohomology of $$\mathcal A^*$$ to say anything about the pseudomarginal problem?