Learning Junction Trees
There have been many successful applications of probabilistic graphical models recently, but two major obstacles remain for many more potential users: exact inference is intractable in many models, and there are few methods for automatically constructing the structure of a good model from data. This project addresses both problems by learning the structure of thin (small-treewidth) junction trees in polynomial time. Inference complexity in junction trees is exponential in the treewidth, so by keeping the treewidth small we guarantee that inference in the resulting model is tractable. Unlike many existing local search-based approaches to structure learning, ours comes with global guarantees on the quality of the resulting model in terms of Kullback-Leibler divergence from the true underlying distribution.
Efficiently estimating conditional mutual information
In many structure learning algorithms, testing for conditional independence, or, more generally, estimating the conditional mutual information of sets of variables A and B given C, is a key step. Performed naively, this estimation requires time and a number of samples exponential in |A| + |B| + |C|. We show that it is possible to bound the conditional mutual information I(A, B | C) from above in polynomial time if the probability distribution over all variables V can be well approximated by a thin junction tree. In structure learning, for every potential separator S we use this result to partition (in polynomial time) the set of all variables V into weakly conditionally dependent subsets. We then combine the results of these partitionings into a junction tree using dynamic programming.
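To make the quantity concrete, here is a plug-in estimate of conditional mutual information from discrete samples. This is only an illustrative sketch of the quantity being bounded, not the paper's polynomial-time upper-bound procedure; the function name and the `samples`/`A`/`B`/`C` interface are assumptions.

```python
import numpy as np
from collections import Counter

def conditional_mutual_information(samples, A, B, C):
    """Plug-in estimate of I(A; B | C) in nats from discrete samples.

    samples: 2-D integer array, one row per sample, one column per variable.
    A, B, C: disjoint lists of column indices (C may be empty).
    """
    samples = np.asarray(samples)
    n = len(samples)

    def counts(cols):
        # Empirical counts of joint assignments to the given columns.
        return Counter(tuple(row[list(cols)]) for row in samples)

    abc = counts(A + B + C)
    ac = counts(A + C)
    bc = counts(B + C)
    c = counts(C)  # with C empty this is {(): n}, as required

    mi = 0.0
    for key, n_abc in abc.items():
        ka = key[:len(A)]
        kb = key[len(A):len(A) + len(B)]
        kc = key[len(A) + len(B):]
        # p(a,b,c) * log [ p(a,b,c) p(c) / (p(a,c) p(b,c)) ]
        mi += (n_abc / n) * np.log(n_abc * c[kc] / (ac[ka + kc] * bc[kb + kc]))
    return float(mi)
```

Performed this way, the estimate still requires counts over all of A, B, and C jointly, which is exactly the exponential cost the paper's upper bound avoids.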
When the true distribution P(V) can be well approximated by a junction tree of treewidth k, and that junction tree has strong intra-clique dependencies, our algorithm is guaranteed to find a model with KL divergence at most |V|kε from the true distribution, using polynomial time and a polynomial number of samples. Here ε measures the quality of the approximation of the true distribution by a thin junction tree.
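The quality measure in this guarantee is the KL divergence between discrete distributions, which for small supports can be computed directly. A minimal sketch, with an illustrative function name and array interface:

```python
import numpy as np

def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions over the same support.

    p, q: arrays of probabilities summing to 1; q must be positive
    wherever p is positive.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0  # the term 0 * log(0 / q) is taken to be 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))
```

In the guarantee above, p would be the true distribution P(V) and q the distribution represented by the learned junction tree.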
As a special case, we show that junction trees with strong intra-clique dependencies are efficiently PAC learnable.
We compare the quality of the models learned by our algorithm (denoted LPACJT), the ordering-based search of Teyssier and Koller (UAI 2005) (denoted OBS), the algorithm of Karger and Srebro (SODA 2001), and the Chow-Liu algorithm. Our algorithm can also be augmented with local search (LPACJT+Local). We learn the models on a training dataset and then measure their quality on separate test data.
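This evaluation protocol amounts to scoring each learned model by its average log-likelihood on held-out data. A minimal sketch; the `model_log_prob` callable interface is an assumption for illustration, not the actual implementation:

```python
import numpy as np

def average_test_log_likelihood(model_log_prob, test_samples):
    """Mean per-sample log-likelihood of a learned model on held-out data.

    model_log_prob: callable returning log P(sample) under the model.
    test_samples: iterable of samples (tuples of variable assignments).
    Higher values indicate a better fit to the test distribution.
    """
    return float(np.mean([model_log_prob(s) for s in test_samples]))
```

For example, a uniform model over two binary variables assigns log(1/4) to every sample, so its average test log-likelihood is log(1/4) regardless of the data.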