# Learning Junction Trees

### Introduction

There have been many successful applications of *probabilistic graphical
models* recently, but two major obstacles remain for many more potential
users: *exact inference is intractable* in many models, and there are
few methods for *constructing the structure* of a good model automatically
*from data*. This project addresses both of these problems by learning the
structure of *thin* (small-treewidth) *junction trees*
in *polynomial time*. Inference complexity
in junction trees is exponential in treewidth, so by keeping the treewidth small
we guarantee that inference in the resulting model is tractable. Unlike
many existing local search-based approaches to structure learning, ours comes
with *global guarantees* on the quality of the resulting model in terms of
*Kullback-Leibler divergence* from the true underlying distribution.

### Efficiently estimating conditional mutual information

In many structure learning algorithms, testing for conditional independence,
or, more generally, estimating the conditional mutual information
of sets of variables *A* and *B* given *C*, is a key
step. Performed naively, this estimation requires time and a number of
samples exponential in *|A| + |B| + |C|*. We show that it is
possible to bound the conditional mutual information
*I(A, B | C)* from above in polynomial time if the probability
distribution over all variables *V* can be well approximated
by a thin junction tree.
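
To make the cost of the naive approach concrete, here is a minimal sketch of
the plug-in estimator of conditional mutual information from samples (our
illustration, not code from the paper; the function name is hypothetical).
It tabulates the empirical joint distribution over *A ∪ B ∪ C*, which is
exactly why it needs time and samples exponential in *|A| + |B| + |C|*:

```python
import numpy as np
from collections import Counter

def empirical_cmi(data, A, B, C):
    """Naive plug-in estimate of I(A; B | C) for discrete variables.

    data: (n_samples, n_vars) integer array; A, B, C: lists of column indices.
    Builds the full joint table over A ∪ B ∪ C, so time and sample
    requirements grow exponentially in |A| + |B| + |C|.
    """
    data = np.asarray(data)
    n = len(data)
    joint, ac, bc, c = Counter(), Counter(), Counter(), Counter()
    for row in data:
        a, b, cc = tuple(row[A]), tuple(row[B]), tuple(row[C])
        joint[(a, b, cc)] += 1
        ac[(a, cc)] += 1
        bc[(b, cc)] += 1
        c[cc] += 1
    # I(A;B|C) = sum_{a,b,c} p(a,b,c) * log( p(a,b,c) p(c) / (p(a,c) p(b,c)) );
    # the 1/n normalizations cancel inside the log.
    mi = 0.0
    for (a, b, cc), n_abc in joint.items():
        mi += (n_abc / n) * np.log((n_abc * c[cc]) / (ac[(a, cc)] * bc[(b, cc)]))
    return mi

# Sanity check on synthetic binary data: X1 = X0 XOR X2, so given X2
# the variable X1 is a deterministic function of X0, and I(X0; X1 | X2)
# should be close to log(2) ≈ 0.693 nats.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=(5000, 3))
x[:, 1] = x[:, 0] ^ x[:, 2]
print(empirical_cmi(x, A=[0], B=[1], C=[2]))
```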
In structure learning, for every potential separator *S* we use this
result to partition (in polynomial time) the set of all variables
*V* into weakly
conditionally dependent subsets. We then combine the results of these
partitionings into a junction tree using dynamic programming.
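
As a rough illustration of the partitioning step (a simplification under our
own assumptions: the `cmi_upper_bound` interface is hypothetical, and the
pairwise thresholding plus connected components below is a stand-in for the
paper's actual procedure), one can connect variables whose conditional mutual
information bound given *S* exceeds a threshold *ε* and group the rest:

```python
def partition_given_separator(variables, S, cmi_upper_bound, eps):
    """Split V \\ S into groups that are only weakly dependent given S.

    cmi_upper_bound(x, y, S) -> float is a hypothetical interface for the
    polynomial-time upper bound on I(x; y | S) described above. Variables
    in different returned groups have all pairwise bounds <= eps.
    """
    rest = [v for v in variables if v not in S]
    # Connect x and y if they may be strongly dependent given S.
    adj = {v: [] for v in rest}
    for i, x in enumerate(rest):
        for y in rest[i + 1:]:
            if cmi_upper_bound(x, y, S) > eps:
                adj[x].append(y)
                adj[y].append(x)
    # Connected components form the weakly conditionally dependent subsets.
    seen, parts = set(), []
    for v in rest:
        if v in seen:
            continue
        stack, comp = [v], []
        while stack:
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            comp.append(u)
            stack.extend(adj[u])
        parts.append(sorted(comp))
    return parts
```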

### Theoretical guarantees

When the true distribution *P(V)* can be well approximated by a
junction tree of treewidth *k*, and that junction tree has
strong intra-clique dependencies, our algorithm is guaranteed to find a
model whose KL divergence from the true
distribution is at most *|V|kε*,
using polynomial time and a polynomial number of samples.
Here, *ε* measures the quality of the approximation of the
true distribution by a thin junction tree.
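
Stated compactly (the notation below is ours, restating the guarantee in the
preceding paragraph rather than quoting the paper):

```latex
% If P(V) is eps-approximable by some treewidth-k junction tree with strong
% intra-clique dependencies, the learned model \hat{P} satisfies
D_{\mathrm{KL}}\bigl(P(V)\,\|\,\hat{P}(V)\bigr) \;\le\; |V|\,k\,\varepsilon .
```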

As a special case, we show that junction trees with strong intra-clique dependencies are efficiently PAC learnable.

### Empirical results

#### Model quality

We compare the quality of the models learned by our algorithm
(denoted *LPACJT*), the ordering-based search of
Teyssier and Koller (UAI 2005) (denoted *OBS*), the algorithm of
Karger and Srebro (SODA 2001), and the Chow-Liu algorithm.
Our algorithm can also be augmented with local search (*LPACJT+Local*).
We learn the models on a training dataset and then measure
their quality on separate test data.
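
For concreteness, the evaluation protocol can be sketched as follows (a
minimal sketch; `model_log_prob` is a hypothetical interface we assume for
illustration, not code from the paper):

```python
import numpy as np

def average_test_log_likelihood(model_log_prob, test_data):
    """Mean per-sample log-likelihood of held-out data under a learned model.

    model_log_prob(sample) -> float is a hypothetical interface returning the
    log-probability the learned model assigns to one complete sample. Higher
    held-out log-likelihood indicates better generalization.
    """
    return float(np.mean([model_log_prob(x) for x in test_data]))

# Hypothetical usage: score each learned model on the same test set.
# scores = {name: average_test_log_likelihood(m.log_prob, test_set)
#           for name, m in {"LPACJT": m1, "OBS": m2, "Chow-Liu": m3}.items()}
```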