Distributed Recommender System



Many online services rely on recommender systems that provide users with personalized product suggestions. A popular approach is collaborative filtering, where product ratings from many users are combined to learn a predictive relation between users and products. Most current collaborative filtering algorithms are centralized, requiring a single server that stores the ratings of all users. In this project, we examine algorithms that work with data distributed across the nodes of a computer cluster or a peer-to-peer network.

A standard approach to collaborative filtering is to fit a latent variable model (matrix factorization, restricted Boltzmann machines) by stochastic gradient descent. A challenge is that stochastic gradient descent does not parallelize naturally: each update depends on the parameters produced by the previous one. We propose a simple algorithm that performs stochastic gradient descent updates in parallel on multiple copies of the model and periodically averages the model parameters. We demonstrate empirically that this algorithm achieves near-linear speed-ups for up to 32 nodes.
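The scheme above can be sketched as follows. This is a minimal single-process simulation, not the project's actual implementation: it assumes matrix factorization as the latent variable model, with hypothetical names (`sgd_epoch`, `parallel_sgd`) and illustrative hyperparameters; in a real deployment each shard's epoch would run on a separate node.

```python
import numpy as np

def sgd_epoch(U, V, ratings, lr=0.05, reg=0.01):
    """One SGD pass over (user, item, rating) triples for matrix factorization."""
    for u, i, r in ratings:
        err = r - U[u] @ V[i]
        U[u] += lr * (err * V[i] - reg * U[u])
        V[i] += lr * (err * U[u] - reg * V[i])

def parallel_sgd(ratings, n_users, n_items, rank=8, n_nodes=4, rounds=20, seed=0):
    """Run SGD on n_nodes disjoint shards of the ratings in parallel (simulated
    here sequentially) and periodically average the model parameters."""
    rng = np.random.default_rng(seed)
    shards = [ratings[k::n_nodes] for k in range(n_nodes)]  # disjoint data shards
    U = rng.normal(scale=0.1, size=(n_users, rank))
    V = rng.normal(scale=0.1, size=(n_items, rank))
    for _ in range(rounds):
        # Each node starts from the current averaged model and trains locally.
        models = [(U.copy(), V.copy()) for _ in range(n_nodes)]
        for (Uk, Vk), shard in zip(models, shards):
            sgd_epoch(Uk, Vk, shard)
        # Periodic averaging step: combine the locally updated models.
        U = np.mean([Uk for Uk, _ in models], axis=0)
        V = np.mean([Vk for _, Vk in models], axis=0)
    return U, V
```

Because every round restarts each node from the same averaged parameters, the local models cannot drift far apart between averaging steps, which is what makes the periodic averaging effective despite the rotational ambiguity of the factors.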

An important problem in machine learning is to design a recommender system that operates on a peer-to-peer network. Such a system would be useful for a community of users who do not subscribe to a commercial service such as Netflix. Unfortunately, no existing work known to us addresses the problem of learning a latent variable model in this setting. We propose a two-level peer-to-peer architecture, in which each super-peer serves only a fraction of the users. We then devise a consensus-style algorithm that learns the model with minimal coordination.
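The consensus component might be sketched as randomized pairwise gossip over the super-peers: each super-peer trains on its own users' ratings, and pairs of super-peers repeatedly average their copies of the shared parameters, so every copy converges to the global mean without any central coordinator. This is an illustrative sketch of one standard consensus scheme, not the project's exact protocol; the function name `gossip_average` is hypothetical.

```python
import numpy as np

def gossip_average(local_params, n_steps=200, seed=0):
    """Randomized pairwise gossip among super-peers.

    local_params: one parameter array per super-peer (e.g. item factors).
    At each step, two randomly chosen super-peers replace both of their
    copies with the pairwise average. The per-peer copies converge to the
    global mean, and their sum is conserved at every step.
    """
    rng = np.random.default_rng(seed)
    params = [np.asarray(p, dtype=float).copy() for p in local_params]
    n = len(params)
    for _ in range(n_steps):
        i, j = rng.choice(n, size=2, replace=False)
        mean = 0.5 * (params[i] + params[j])
        params[i] = mean.copy()
        params[j] = mean.copy()
    return params
```

Each gossip step involves only the two participating super-peers, which is what keeps the coordination cost minimal: no node ever needs a global view of the network.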

We evaluate our algorithms on the Netflix dataset, demonstrating results comparable to those of a centralized learner.