SIGIR'07 - Tutorials

The 30th Annual International ACM SIGIR Conference
23-27 July 2007, Amsterdam

ON THIS PAGE

2A User Centered Evaluation
2B Web Retrieval
2C Text Mining
2D BM25 and Beyond
2E Cross-language Access
2F Web Advertising
2G Learning for IR
2H XML Retrieval
2K Recommender Systems

Tutorials

Tutorials take place at the University of Amsterdam, Building A, on Monday July 23rd.

2A - Conducting User-Centered IR System Evaluations

Diane Kelly (Univ. of North Carolina)

As more and more information systems are being developed for end-users, it is critical that researchers begin to evaluate information retrieval (IR) systems from the perspective of users. The involvement of users in the IR evaluation framework necessarily makes the evaluation different, and slightly more complex, than traditional system-centered IR evaluation. One important difference is that a number of interactions between the user and the system, task, and information objects must now be accounted for in some way.

The purpose of this tutorial is to familiarize participants with major elements of user-centered interactive IR evaluation and provide them with a foundation for conducting, reporting and evaluating such studies. This tutorial is appropriate for students and researchers at all levels who have little or no formal training in, or experience with, user-centered evaluations.

This tutorial will cover different approaches to conducting user-centered evaluations, elements of experimental design, sampling, the identification and articulation of variables and measures, the collection and analysis of data, and the reporting of results. The collection and analysis of quantitative data will be emphasized. The tutorial will conclude with an overview of emerging trends in interactive IR evaluation.

Dr. Diane Kelly is an Assistant Professor at the School of Information and Library Science at the University of North Carolina at Chapel Hill. Kelly holds an undergraduate degree in psychology (University of Alabama), a graduate certificate in cognitive science, and a master's and Ph.D. in Information Science (all from Rutgers University). Kelly has extensive formal training in experimental design and quantitative and qualitative research methods, and has taught research methods courses at UNC and Rutgers. Kelly's practical experience conducting user studies and experiments in the context of interactive information retrieval systems spans almost ten years. She has been involved with numerous laboratory studies of experimental IR systems, formative evaluations of interactive QA systems, and longitudinal, naturalistic studies of search behavior and implicit feedback. Kelly has also participated in TREC for a number of years: as part of the Interactive Track for six years, the HARD Track for three years, and as the interactive QA task co-coordinator.

2B/2F - Introduction to Web Retrieval and Advertising

Ricardo Baeza-Yates, Andrei Broder, Prabhakar Raghavan (Yahoo! Research)

Part I: Introduction to Web Retrieval

This tutorial provides an introduction to the main concepts, issues, and techniques of web-based information retrieval. Topics covered include the differences between conventional and web IR, the evolution of web search technology, and an overview of the main technologies underlying current web search engines: crawling, corpus construction and indexing, ranking (including link analysis), query processing, and web spam defense.

Part II: Introduction to Web advertising

ability of advertisers to reach their intended audience. Using information such as users' queries, the content they are currently viewing, past behavior, and registration demographics, web advertisers potentially have much finer-grained control of their audience than is generally achievable within traditional media such as broadcast and print. This environment triggers new and challenging retrieval problems, such as matching ads with queries ("sponsored search") and matching ads to the page being browsed ("content match"), as well as new ranking problems where the objective function depends not only on the quality of the match, but also on the underlying economic model.

Ricardo Baeza-Yates is director of Yahoo! Research Barcelona and Latin-America in Santiago, Chile. Until 2005 he was ICREA Professor at UPF in Barcelona and also director of the Center for Web Research, University of Chile. In 2003 he was the first computer scientist to be elected to the Chilean Academy of Sciences. His research includes IR, algorithms, and information visualization. He is co-author of the book Modern Information Retrieval by Addison-Wesley (1999). He holds a PhD from the University of Waterloo, Canada.

Andrei Z. Broder is a Yahoo! Research Fellow and VP for Emerging Search Technology. Previously he was a Distinguished Engineer at IBM and VP for Research and Chief Scientist at AltaVista. His main research interests are the design, analysis, and implementation of randomized algorithms and supporting data structures, in particular in the context of web-scale information retrieval and applications (best paper awards at WWW6 and WWW9). He is a Fellow of the IEEE and holds a Ph.D. from Stanford University.

Prabhakar Raghavan is Head of Yahoo! Research, and Consulting Professor of Computer Science at Stanford University. His research interests include semi-structured retrieval, text mining and randomized algorithms (he is coauthor, with R. Motwani, of the book Randomized Algorithms, published by Cambridge University Press). He is Editor-in-chief of the Journal of the ACM and a Fellow of the ACM and of the IEEE. He is also a member of the Computer Science and Telecommunications Board and of the National Academy of Sciences. He holds a PhD from the University of California at Berkeley.

2C - Introduction to Text Mining

David Lewis (David D. Lewis Consulting)

Text mining is both an exciting new application area, and a buzzword used to rebrand well-known information retrieval and natural language processing technologies. This tutorial will review both old and new technologies for learning about real world entities (people, organizations, events, etc.) from linguistic data alone or with other data. Both case studies on proprietary data and examples using public data sets and open source software will be presented.

Dave Lewis is an entrepreneur and consulting computer scientist based in Chicago, IL, USA. He has worked on a wide range of problems in information retrieval, machine learning, and natural language processing, and designed a variety of evaluations and data sets in these areas. He previously held research positions at AT&T Labs, Bell Labs, and the University of Chicago. He was recently elected a Fellow of the American Association for the Advancement of Science.

2D - The Probabilistic Relevance Model: BM25 and beyond

Hugo Zaragoza (Yahoo! Research Barcelona), Steve Robertson (Microsoft Research Cambridge)

The Probabilistic Relevance Model (PRM) is the formal framework behind BM25 and some of the most widely used algorithms for retrieval. In this tutorial we will discuss the theoretical modeling and the practical tuning work that is required to understand the PRM, derive new algorithms and go beyond BM25.

Hugo Zaragoza is a researcher in Information Retrieval at Yahoo! Research Barcelona. He is interested in the applications of machine learning and natural language processing to information retrieval. Previously he worked at Microsoft Research Cambridge (UK) and collaborated with Microsoft product groups such as MSN-Search and SharePoint Portal Server.

Stephen Robertson runs the Information Retrieval and Analysis group at Microsoft Research Cambridge (UK). He is one of the inventors of the Probabilistic Relevance Model and of Okapi BM25. Prior to joining Microsoft, he was at City University London, where he retains a part-time position. He was awarded the Tony Kent STRIX award by the Institute of Information Scientists in 1998 and the Salton Award by ACM SIGIR in 2000.

A more detailed description of this tutorial is available here.

2E - Cross-Language Information Access

Jianqiang Wang (SUNY Buffalo), Daqing He (Univ. of Pittsburgh)

Cross-language information retrieval (CLIR) is generally defined as the problem of finding information in one language in response to queries written in another language. It has been claimed to be a "solved" problem because the best automatic CLIR techniques can rank documents just as effectively as their monolingual counterparts -- if the required high quality translation resources are available. However, CLIR systems, or systems with a CLIR component, have not been widely used in practice, in sharp contrast to monolingual information retrieval. The questions to be asked are therefore: 1) what technologies have been developed to solve CLIR problems, and 2) has CLIR really been solved? To answer these two questions, this tutorial will not only introduce the state-of-the-art techniques for CLIR, but also extend the discussion to the broader context of cross-language information access (CLIA). The discussion will examine the effectiveness of cross-language retrieval systems, and the issues related to extending technologies to various information access tasks and supporting users in their decision making. The tutorial will give those who are relatively new to CLIA an opportunity to gain a solid and comprehensive understanding of the area, and will broaden the knowledge of more experienced participants with new applications of CLIA techniques to such areas as speech retrieval, question-answering, and interaction design.

Dr. Jianqiang Wang is an assistant professor in the Department of Library and Information Studies, the State University of New York at Buffalo. He holds a Ph.D. in Library and Information Services from the University of Maryland at College Park. His research interests have been in cross-language information retrieval, speech retrieval, and information seeking in multilingual and multimedia environments. He also played a major role in building test collections for four Interactive CLEF (iCLEF) tracks, two CLEF Cross-Language Speech Retrieval (CL-SR) tracks, and the first TREC Legal Discovery track.

Dr. Daqing He is an assistant professor in the School of Information Sciences, University of Pittsburgh. He obtained his PhD degree from the University of Edinburgh, Scotland. His main research interests are in the areas of information retrieval, interactive retrieval system design, and multilingual and adaptive web search systems. Dr. He is an active participant in the DARPA GALE project and its predecessor, the TIDES project, where he has concentrated on cross-language retrieval system design and evaluation.

Unfortunately, Jianqiang Wang cannot make it to the conference. The full tutorial will be given by Daqing He.

2G - Supervised and Semi-supervised Learning for IR

Yi Zhang (UC Santa Cruz), Rong Jin (Michigan State Univ)

This tutorial will present broad coverage of supervised and semi-supervised learning techniques and their application to information retrieval, with a focus on semi-supervised learning. It will be organized into three parts: 1) a brief introduction to supervised learning and its application to text categorization and ranking; 2) an overview of semi-supervised classification and the related learning algorithms, illustrated by applications to information retrieval; and 3) an introduction to active learning and its related learning algorithms, with emphasis on its application to interactive retrieval and adaptive filtering.

Dr. Yi Zhang is an Assistant Professor at the Baskin School of Engineering, University of California Santa Cruz. Her research is related to information retrieval, text mining, statistical machine learning, and natural language processing. She has published research papers on information filtering, recommender systems, language models, search personalization, active learning, and information extraction. She has served as PC member or reviewer for conferences and journals in the area of information retrieval and machine learning. She has collaborated with start-ups, large corporations and government agencies on related topics. She received the Best Paper Award at ACM SIGIR 2002. Dr. Zhang received her Ph.D. and M.S. from Carnegie Mellon University and her B.S. from Tsinghua University.

Dr. Rong Jin has been an Assistant Professor in the Computer Science and Engineering Department of Michigan State University since 2003. He works in the areas of statistical machine learning and its application to information retrieval. Dr. Jin has worked on a variety of machine learning algorithms, and has presented efficient and robust algorithms for conditional exponential models, support vector machines, and boosting. In addition, he has extensive experience with the application of machine learning algorithms to information retrieval, including retrieval models, collaborative filtering, cross-lingual information retrieval, document clustering, and video/image retrieval. He has published over sixty conference and journal articles on these topics. Dr. Jin holds a B.A. in Engineering from Tianjin University, an M.S. in Physics from Beijing University, and an M.S. and Ph.D. in Computer Science from Carnegie Mellon University. He received the NSF CAREER Award in 2006.

Unfortunately, Rong Jin cannot make it to the conference. The full tutorial will be given by Yi Zhang.

2H - XML Retrieval: Integrated IR-DB Challenges and Solutions

Sihem Amer-Yahia (Yahoo! Research), Ricardo Baeza-Yates (Yahoo! Research), Mariano Consens (Univ. of Toronto), Mounia Lalmas (Queen Mary Univ. of London)

The two distinct cultures of databases and information retrieval now have natural meeting places in the Web, Digital Libraries and Enterprise Environments with their semi-structured XML model. This tutorial will provide an overview of the different issues (basic concepts, requirements, models) and approaches (techniques, evaluations) put forward by the IR and DB communities. In particular, it will survey the DB-IR integration efforts as they focus on the problem of retrieval from XML content.

Sihem Amer-Yahia joined Yahoo! Research in May 2006. Until then, she was a member of Technical Staff at AT&T Labs. She is a co-editor of the XQuery Full-Text Language Specification and Use Cases published by the W3C Full-Text Task Force. Her current research focuses on issues related to processing top-k queries in online shopping and community-aware ranking in online communities.

Ricardo Baeza-Yates is director of Yahoo! Research Barcelona and Latin-America in Santiago, Chile. Until 2005 he was ICREA Professor at UPF in Barcelona and also director of the Center for Web Research, University of Chile. He is co-author of the book Modern Information Retrieval by Addison-Wesley (1999). He received his PhD from the University of Waterloo in 1989. His research includes IR, algorithms, and information visualization.

Mariano P. Consens's research interests are in the areas of Data Management Systems and the Web. He received his PhD from the University of Toronto and is presently a faculty member at the University of Toronto. He has also been active in the software industry as a founder and CTO of several start-ups.

Mounia Lalmas received a PhD in Computer Science from the University of Glasgow in 1996. Presently she is a Professor of Information Retrieval at Queen Mary, University of London, which she joined as a lecturer in 1999. She is the co-leader of the INEX initiative, with over 60 participating organizations worldwide.

A more detailed description of this tutorial is available here.

2K - Introduction to Recommender Systems

Joseph A. Konstan (University of Minnesota)

E-commerce sites such as Amazon.com and streaming music services have made recommender systems--systems that select items to present to a user from a variety of choices--nearly ubiquitous. This tutorial provides an introduction to recommender systems: the algorithms behind them, design factors for implementing them, case studies of applications, and lessons from research.

Topics covered include: an overview of the recommender system design and application space; case studies of recommender examples from e-commerce, content, online community, and research; an in-depth exploration of an individual recommender system; an overview of recommender system algorithms, with details on several different algorithms; and a set of recommender system design principles.

This tutorial is designed for practitioners and researchers who design commerce, content, and community systems that could benefit from personalization. No prior experience with recommender systems is necessary. When you leave this tutorial, you should understand the range of technologies being used for recommender systems and how those technologies are embedded in applications.

Joseph A. Konstan is Professor of Computer Science and Engineering at the University of Minnesota and co-Director of the GroupLens Research Group. His dozen years of research on recommender systems range from user interface design, to algorithm development and evaluation, to studies of user behavior. He co-founded Net Perceptions and is co-author of the book Word of Mouse: The Marketing Power of Collaborative Filtering.