Learning Junction Trees

Introduction

There have been many successful applications of probabilistic graphical models recently, but two major obstacles remain for many potential users: exact inference is intractable in many models, and there are few methods for automatically constructing the structure of a good model from data. This project addresses both of these problems by learning the structure of thin (small-treewidth) junction trees in polynomial time. Inference complexity in junction trees is exponential in the treewidth, so by keeping the treewidth small we guarantee that inference in the resulting model is tractable. Unlike many existing local search-based approaches to structure learning, ours comes with global guarantees on the quality of the resulting model in terms of Kullback-Leibler divergence from the true underlying distribution.

Efficiently estimating conditional mutual information

In many structure learning algorithms, testing for conditional independence, or, more generally, estimating the conditional mutual information of sets of variables A and B given C, is a key step. Performed naively, this estimation requires time and a number of samples exponential in |A| + |B| + |C|. We show that it is possible to bound the conditional mutual information I(A, B | C) from above in polynomial time, provided the probability distribution over all variables V can be well approximated by a thin junction tree. In structure learning, for every potential separator S, we use this result to partition (in polynomial time) the set of all variables V into weakly conditionally dependent subsets. We then combine the results of these partitionings into a junction tree using dynamic programming.
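For intuition, below is a minimal sketch of the naive plug-in estimator of I(A; B | C) for discrete data; the function name estimate_cmi and the numpy/Counter representation are illustrative only, and this direct estimator is exactly the one whose cost grows exponentially in |A| + |B| + |C|, which our polynomial-time upper bound avoids.

    import numpy as np
    from collections import Counter

    def estimate_cmi(samples, A, B, C):
        # Plug-in estimate of conditional mutual information I(A; B | C).
        # samples: 2D numpy array of discrete values (rows = samples, columns = variables);
        # A, B, C: disjoint lists of column indices.
        n = samples.shape[0]
        count_abc = Counter(tuple(row[A + B + C]) for row in samples)
        count_ac = Counter(tuple(row[A + C]) for row in samples)
        count_bc = Counter(tuple(row[B + C]) for row in samples)
        count_c = Counter(tuple(row[C]) for row in samples)
        cmi = 0.0
        for key, n_abc in count_abc.items():
            a = key[:len(A)]
            b = key[len(A):len(A) + len(B)]
            c = key[len(A) + len(B):]
            # I(A;B|C) = sum_{a,b,c} P(a,b,c) log[ P(a,b,c) P(c) / (P(a,c) P(b,c)) ];
            # the 1/n normalizations cancel, so raw counts can be used inside the log.
            cmi += (n_abc / n) * np.log(n_abc * count_c[c] / (count_ac[a + c] * count_bc[b + c]))
        return cmi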

Theoretical guarantees

When the true distribution P(V) can be well approximated by a junction tree of treewidth k, and that junction tree has strong intra-clique dependencies, our algorithm is guaranteed to find a model whose KL divergence from the true distribution is at most |V|kε, using polynomial time and a polynomial number of samples. Here, ε measures the quality of approximation of the true distribution by a thin junction tree.
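Writing \hat{P} for the distribution defined by the learned junction tree (notation introduced here only for concreteness), the guarantee above reads:

    \[ D_{\mathrm{KL}}\bigl( P(V) \,\|\, \hat{P}(V) \bigr) \;\le\; |V| \, k \, \varepsilon \]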

As a special case, we show that junction trees with strong intra-clique dependencies are efficiently PAC learnable.

Empirical results

Model quality

We compare the quality of the models learned by our algorithm (denoted LPACJT), the ordering-based search of Teyssier and Koller (UAI 2005) (denoted OBS), the algorithm of Karger and Srebro (SODA 2001), and the Chow-Liu algorithm. Our algorithm can also be augmented with local search (LPACJT+Local). We learn the models on a training dataset and then measure their likelihood on separate test data.
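A minimal sketch of this evaluation protocol, assuming a hypothetical model.log_prob(sample) interface that returns the log-probability of a single complete assignment under the learned model:

    import numpy as np

    def average_test_log_likelihood(model, test_samples):
        # Average per-sample log-likelihood of held-out data under a learned model.
        # model.log_prob(x) is a hypothetical interface returning log P(x).
        return float(np.mean([model.log_prob(x) for x in test_samples]))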

ALARM test likelihoods. ALARM dataset: artificial data sampled from the ALARM Bayesian network (Beinlich et al., European Conference on AI in Medicine, 1988); 37 variables. The treewidth of the true network is 4. The learned models have treewidth 3, except Chow-Liu (treewidth 1).
TEMPERATURE test likelihoods. TEMPERATURE dataset: real-world data from a 2-month deployment of 54 temperature sensors in the Intel Berkeley lab (Deshpande et al., VLDB 2004). The learned models have treewidth 2, except Chow-Liu (treewidth 1).
TRAFFIC test likelihoods. TRAFFIC dataset: real-world data from 1 month of traffic flow measurements at 34 locations in the San Francisco Bay area (Krause and Guestrin, UAI 2005). The learned models have treewidth 3, except Chow-Liu (treewidth 1).

Examples of learned structures

In these examples, nodes are variables, and there is an edge between two variables if they are in the same clique in the resulting junction tree.
ALARM example structure. ALARM dataset: green edges belong to both the true and the learned models, blue edges belong only to the learned model, and red edges only to the true one.
TEMPERATURE example structure. TEMPERATURE dataset: nodes are placed on the floor plan at the locations of their corresponding sensors.
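The graphs shown above can be produced directly from the learned cliques; a minimal sketch, assuming networkx is used for the graph data structure (function and variable names are illustrative):

    import itertools
    import networkx as nx

    def clique_graph(cliques):
        # Build the graph whose edges connect variables that share a clique
        # in the junction tree, i.e. the chordal graph the junction tree encodes.
        g = nx.Graph()
        for clique in cliques:
            g.add_nodes_from(clique)
            # Connect every pair of variables appearing in the same clique.
            g.add_edges_from(itertools.combinations(clique, 2))
        return g

    # Example: two cliques sharing the separator {1, 2}.
    g = clique_graph([(0, 1, 2), (1, 2, 3)])
    print(sorted(g.edges()))  # [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3)]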