Methodenzentrum

3rd Fall School of the Methods Center on October 16, 2024

Interdisciplinary Methods

The Methods Center at the University of Tübingen cordially invites PhD candidates, postdoctoral researchers, and professors to join our Fall School. 
This year's workshop, "Psychometrics for Large Language Model Evaluation: Lessons and Challenges", will be given by Tom Sühr, a doctoral candidate at the Max Planck Institute for Intelligent Systems in the Human Aspects of Machine Learning group.

We are looking forward to an exciting and productive gathering!

Workshops

Workshop 1: Psychometrics for Large Language Model Evaluation: Lessons and Challenges (in English) - Tom Sühr

As the capabilities of Large Language Models (LLMs) continue to expand, the need for rigorous evaluation methods becomes increasingly critical. This workshop dives into what the machine learning community can learn from psychometrics—specifically Item Response Theory (IRT) and Classical Test Theory (CTT)—to enhance the benchmarking of LLMs. We will also learn about potential pitfalls and critically investigate the application of existing psychometrics to LLMs.

The workshop will begin with a theoretical and practical introduction to LLMs, including hands-on coding examples that demonstrate how to prompt and finetune these models efficiently, with a focus on reducing memory requirements. Participants will then learn how to administer current benchmarks and evaluate LLM responses. Finally, we will analyze existing benchmarks with psychometric tools.

By the end of this workshop, attendees will have a foundational understanding of how LLMs work and how to administer benchmarks for their evaluation effectively. They will also understand how psychometric tools can offer insights into LLM performance and be aware of the challenges involved in applying these methods.

This session is ideal for machine learning researchers and practitioners looking to adopt or refine psychometric techniques in their work with LLMs, as well as for psychometric or econometric researchers who are interested in an introduction to LLMs. Examples and distributed code will be in Python and R.

Please bring a laptop with Python (version 3.10+) and R installed for the exercises. Before the event, we will send an email with more information about the necessary packages.