Reinforcement Learning Lecture - WS 2024/25

This lecture is part of the Machine Learning Masters program at the University of Tübingen. The course is run by the Autonomous Learning Group.

Dates:

Tue 14:15 - 15:45: lecture in N10 (Bio Hörsaalgebäude AdM 3) on Morgenstelle 3 (starting on October 15th)
Tue. 16:00 - 18:00: tutorial A in room N16 / M3 (C-Bau [Mathe/Physik])
Tue. 16:00 - 18:00: tutorial B in room N3 (22., 29.10. and 5.11) afterwards D7H41 (D-Bau [Physik])
Th. 16:00 - 18:00: Tutorial C in Lecture Hall TTR2 (Cyber Valley - TTR2) (starting on October 22th)

Course description:

The course will provide you with theoretical and practical knowledge of reinforcement learning, a field of machine learning concerned with decision-making and interaction with dynamical systems, such as robots. We start with a brief overview of supervised learning and spend the most time on reinforcement learning. The exercises will help you get hands-on with methods and deepen your understanding.

Qualification Goals:

Students gain an understanding of reinforcement learning formulations, problems, and algorithms on a theoretical and practical level. After this course, students should be able to implement and apply deep reinforcement learning algorithms to new problems.

Course materials:

Both slides and exercises are available on ILIAS.

Lectures

Lecture 1 Introduction to the course, Reinforcement Learning (RL) History and RL setup; Background reading: Sutton and Barto Reinforcement learning for the next few lectures (for this lecture, parts of Chapter 3)
Lecture 2 MDPs; Background reading: Sutton and Barto Reinforcement learning Chapter 4
Lecture 3 Model-free Prediction; Background reading: Sutton and Barto Reinforcement learning First part of Chapters 5, 6, 7, 12
Lecture 4 Model-free Control; Background reading: Sutton and Barto Reinforcement learning Chapters 5.2, 5.3, 5.5, 6.4, 6.5, 12.7
Lecture 5 Bandits and Exploration; Background reading: Bandit Algorithms, Lattimore and Szepesvari 2020, Chapters 4,6,7.
https://tor-lattimore.com/downloads/book/book.pdf
Lecture 6: Value Function Approximation; Background reading: Sutton and Barto Reinforcement learning Chapters 9.1-9.8, 10.1, 10.2, 11.1-11.3. Supplementary: DQN paper 1, paper 2, NFQ paper
Lecture 7: Policy Gradient; Background reading: Sutton and Barto Reinforcement learning Chapters 13
Lecture 8: Policy Gradient and Actor-Critic; Background reading: Natural Actor Critic Paper, TRPO Paper, PPO Paper
Lecture 9: Q-learning style Actor-Critic; Background reading: DPG Paper, DDPG Paper, TD3 Paper, SAC paper
Lecture 10: Exploration and Tricks to improve Deep RL (with recent work from my group); Background reading: ICM Paper, RND Paper, Pink-Noise Paper, HER paper, CrossQ paper, Stop Regressing paper
Lecture 11: Model-based Methods: Dyna-Q, MBPO; Background reading: Sutton and Barto Reinforcement learning Chapters 8 and the MBPO Paper
Lecture 12: Model-based Methods II: Online-Planning: (with recent work from my group); CEM, PETS, iCEM; CEE-US; Background reading: PETS paper, iCEM paper (video), CEE-US paper (videos)
Lecture 13: Alpha Go and Alpha Zero, Dreamer. Background reading: AlphaGo paper (also in Ilias, because behind the paywall), AlphaZero Paper, and Dreamer Paper
Lecture 14: Offline RL. Background reading: CQL paper, CRR paper, Benchmarking paper