Fernando Diaz
Carnegie Mellon University, USA
Recent advances in AI have heightened attention on the foundations of evaluation. As models become more performant, traditional metrics and benchmarks increasingly fail to capture meaningful differences in system behavior. Indeed, Voorhees et al. observe that modern retrieval models have saturated high-precision metrics, calling for “new strategies and tools for building reliable test collections.” I describe preference-based evaluation, a framework that reinterprets evaluation as an ordering over system behaviors rather than the computation of numeric scores. Although preference judgments are common in laboratory studies and online evaluation, automatic evaluation metrics such as average precision and reciprocal rank have traditionally lacked preference-based counterparts. Drawing on foundational work in information retrieval evaluation and social-choice theory, I introduce a family of methods for conducting efficient, automatic, preference-based evaluation. Through a series of experiments across retrieval and recommendation tasks, preference-based versions of precision, recall, and average precision all demonstrate substantially higher sensitivity, addressing recent trends of metric saturation.
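To make the contrast with score-based evaluation concrete, the minimal sketch below aggregates per-query pairwise preferences between systems into an overall ordering, in the spirit of pairwise-majority (Copeland-style) aggregation from social-choice theory. The system names, queries, preference data, and the Copeland-style rule are illustrative assumptions for this sketch, not the specific estimators developed in the talk.

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-query preference judgments: for each query, which of the
# two systems in each (ordered) pair is preferred. In practice these could
# come from assessors, interleaving, or an automatic preference method.
preferences = {
    "q1": {("sysA", "sysB"): "sysA", ("sysA", "sysC"): "sysA", ("sysB", "sysC"): "sysC"},
    "q2": {("sysA", "sysB"): "sysB", ("sysA", "sysC"): "sysA", ("sysB", "sysC"): "sysB"},
    "q3": {("sysA", "sysB"): "sysA", ("sysA", "sysC"): "sysC", ("sysB", "sysC"): "sysB"},
}
systems = ["sysA", "sysB", "sysC"]

# Count, for each pair of systems, how often each member is preferred.
pairwise_wins = Counter()
for per_query in preferences.values():
    for pair, winner in per_query.items():
        pairwise_wins[(pair, winner)] += 1

# Copeland-style aggregation: a system earns a point for every pairwise
# majority it wins; the output is an ordering over systems rather than a
# score on an absolute scale.
copeland = Counter({s: 0 for s in systems})
for a, b in combinations(systems, 2):
    if pairwise_wins[((a, b), a)] > pairwise_wins[((a, b), b)]:
        copeland[a] += 1
    elif pairwise_wins[((a, b), b)] > pairwise_wins[((a, b), a)]:
        copeland[b] += 1

ordering = sorted(systems, key=lambda s: -copeland[s])
print(ordering)  # ['sysA', 'sysB', 'sysC'] for the data above
```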
Min Zhang
Tsinghua University, China
As research into LLMs deepens, it is timely to examine the similarities and differences between LLMs and human users. This talk addresses several questions from a user-centric viewpoint in information access tasks: How can we evaluate the performance of large models, and what is their efficacy? To what extent do LLMs' conversational behaviors differ from those of humans in IR tasks? How does their capacity for test-time learning from conversational reasoning experiences compare with that of humans? Some of our recent explorations and findings on these questions will also be presented. I hope that discussions on these topics will offer new perspectives and inspire future research into the behavior and reasoning mechanisms of LLMs in information access tasks.