Enriching Information Retrieval
SIGIR 2011 Workshop
July 28, 2011
Motivation and Goals
Most information retrieval systems and tasks are now embedded in a rich
context. Documents no longer exist on their own; they are connected to
other documents, they are associated with users and their position in a
social network, and they can be mapped onto a variety of ontologies.
Similarly, retrieval tasks have become more interactive and are solidly
embedded in a user's geospatial, social, and historical context.
We conjecture that new breakthroughs in information retrieval will not come from smarter
algorithms that better exploit existing information sources, but from new
retrieval algorithms that can intelligently use and combine new sources
of contextual metadata.
The goal of the Enriching Information Retrieval workshop is to explore how
new and emerging sources of contextual metadata can be used for improving
information retrieval, including ranking, personalization, diversification,
and faceted search. In particular, we aim to focus on three themes:
- The identification of novel types and sources of contextual metadata (e.g.,
new ontologies, usage patterns, locality information,
readability, temporal information);
- The automatic acquisition and distillation of metadata (e.g., via learning or through implicit data);
- The design of methods for exploiting new metadata sources in IR tasks.
A special focus of the workshop will be on metadata and retrieval tasks associated with social networks.
The full Call for Papers can be found here.
Mining Social Network Activity to Understand
and Predict User Behavior
The recent popularity of online social networks has increased the
amount of information available about users' behavior--including
current activities and interactions among friends and family. This
rich relational information can be used to predict user interests
and preferences even when individual data is sparse, as birds of a
feather do indeed flock together. Although these data offer several
opportunities to improve search and retrieval, the characteristics
of online social network data also present a number of challenges for
accurately incorporating the information into retrieval algorithms. In
this talk, I will describe our recently developed methods for (1)
predicting relationship strength, (2) combining information from
multiple social interaction networks, and (3) identifying sources of
user correlation. While giving an overview of these methods, I will
discuss their potential connections to retrieval tasks and highlight
algorithmic and evaluation challenges that may arise from modeling
the complex network dependencies.
Jennifer Neville is an assistant professor at Purdue University with a
joint appointment in the Departments of Computer Science and
Statistics. She received her PhD from the University of
Massachusetts Amherst in 2006. She received a DARPA IPTO Young
Investigator Award in 2003 and was selected as a member of the DARPA
Computer Science Study Group in 2007. In 2008, she was chosen by
IEEE as one of "AI's 10 to watch." Her research focuses on
developing data mining and machine learning techniques for
relational domains, including citation analysis, fraud detection,
and social network analysis.
Evaluating Rich Models in Context
Contextual information can allow information retrieval systems to improve ranking quality substantially. The richer the model, the more likely it is to provide actionable information. I will show how users' historical interests can provide context to improve information retrieval, while preserving user privacy. Further, I will show how geographical context provides even more opportunities in information retrieval.
A second key question in context-sensitive retrieval is that of
evaluation: traditional judgment-based evaluation is not ideally
suited to assessing the extent to which context-based systems
improve the information retrieval experience of users in
practice. In presenting the above approaches, I will also detail an
effective online evaluation approach for measuring improvements in
Filip Radlinski is a researcher at Microsoft and a contributor to Bing, where he works on machine learning approaches to information retrieval. He completed his dissertation on learning to rank from implicit feedback at Cornell University in 2008. His recent research focuses on learning personalized rankings, measuring ambiguity in queries and user intents, and studying how to assess the quality of ranked lists of documents from the user perspective by using click information.
A user-centered view of interactive information retrieval
expands the context relevant to understanding information
seeking intent to properties of the user and of their task.
Some user properties can be sources of metadata that may
be useful for improving information search interactions in
context. In particular, certain user physiological data can
provide novel metadata. One such source for metadata is
eye movement patterns during text search session interactions.
We have derived measures of lexical processing and
reading patterns from eye tracking logs of search sessions
in user studies and related them to the type of task, the
type of page, and the task difficulty as experienced by a
user. Other physiological measurements may be sources of
contextually indicative search metadata, e.g. brain electrical
activity detected using EEG headsets. While such metadata
can only be obtained in research settings today, it is
not unreasonable to think it can become widely available in
the foreseeable future.
The ability of a user to understand a document would seem
to be a critical aspect of that document's relevance, and
yet a document's reading difficulty is a factor that has typically
been ignored in information retrieval systems. In this
position paper we advocate for incorporating estimates of
reading proficiency of users, and reading difficulty of documents,
into retrieval models, representations for learning
algorithms, and large-scale analyses of information retrieval
systems and users, particularly for Web search. We describe
key research problems such as estimating user proficiency,
estimating document difficulty, and re-ranking, and summarize
some potential future extensions that could exploit
this new type of metadata.
Predicting the future has always been one of the main aims
of human beings in order to adapt their behavior and maximize
their chances of success. With the advent of the Web,
which indexes a wealth of temporal information, a great
deal of research has been carried out in the area of Temporal
Information Retrieval, but Future Retrieval has remained
a difficult problem to handle. In this paper, we propose
to understand what the future is about. In particular,
we present an exploratory study to understand how temporal
features affect the classification and clustering
of different "genres" of future-related texts.
We propose a new ranking model for personalized local search.
While local search verticals such as Google Local and Yahoo!
Local incorporate physical proximity and public sentiment
(reviews and ratings), their rankings reflect minimal personalization.
We personalize local search by integrating Twitter
social network structure and content analysis. Specifically,
we infer sentiment from tweets by the user and by those the user follows
that mention local businesses by name. We also provide
an interface and interaction experience tailored to Google Android
for local search with Twitter integration. Evaluation
of search accuracy and quality of user experience via a 25-person
user study shows both improved search accuracy and
anecdotal evidence of greater user satisfaction.
Posters will be presented on a stand with a backing board of 90 cm x 120 cm, which accommodates posters up to A0 size.
You can find a photo of the poster setup posted here.
- Eytan Adar, University of Michigan
- Lars Backstrom, Facebook
- Ben Carterette, University of Delaware
- Key-Sun Choi, KAIST
- Kevyn Collins-Thompson, Microsoft Research
- Fernando Diaz, Yahoo! Research
- Jacob Eisenstein, Carnegie Mellon University
- Susan Gauch, University of Arkansas
- Matthew Hurst, Microsoft
- Ralf Herbrich, Microsoft
- Chao Liu, Microsoft Research
- Yoelle Maarek, Yahoo! Research
- Donald Metzler, University of Southern California
- Jennifer Neville, Purdue University
- Bo Pang, Yahoo! Research
- Filip Radlinski, Microsoft
- Patrick Schmitz, Ludicrum Enterprises
- Xuehua Shen, BlueKai
- Pu Wang, George Mason University
- Yisong Yue, Carnegie Mellon University