Distributed Intelligence

Reinforcement Learning Lecture - WS 2024/25

This lecture is part of the Machine Learning Master's program at the University of Tübingen. The course is run by the Autonomous Learning Group.

Dates:

Course description:

The course will provide you with theoretical and practical knowledge of reinforcement learning, a field of machine learning concerned with decision-making and interaction with dynamical systems, such as robots. We start with a brief overview of supervised learning and spend most of the time on reinforcement learning. The exercises will help you get hands-on experience with the methods and deepen your understanding.

Qualification Goals:

Students gain an understanding of reinforcement learning formulations, problems, and algorithms on a theoretical and practical level. After this course, students should be able to implement and apply deep reinforcement learning algorithms to new problems. 

Course materials:

Both slides and exercises are available on ILIAS.

Lectures

  1. Lecture 1: Introduction to the course, Reinforcement Learning (RL) history, and the RL setup; Background reading: Sutton and Barto, Reinforcement Learning (relevant for the next few lectures; for this lecture, parts of Chapter 3)
  2. Lecture 2: MDPs; Background reading: Sutton and Barto, Reinforcement Learning, Chapter 4
  3. Lecture 3: Model-free Prediction; Background reading: Sutton and Barto, Reinforcement Learning, first parts of Chapters 5, 6, 7, and 12
  4. Lecture 4: Model-free Control; Background reading: Sutton and Barto, Reinforcement Learning, Chapters 5.2, 5.3, 5.5, 6.4, 6.5, and 12.7
  5. Lecture 5: Bandits and Exploration; Background reading: Lattimore and Szepesvari, Bandit Algorithms, 2020, Chapters 4, 6, and 7.
    https://tor-lattimore.com/downloads/book/book.pdf
  6. Lecture 6: Value Function Approximation; Background reading: Sutton and Barto, Reinforcement Learning, Chapters 9.1-9.8, 10.1, 10.2, and 11.1-11.3. Supplementary: DQN paper 1, paper 2, NFQ paper
  7. Lecture 7: Policy Gradient; Background reading: Sutton and Barto, Reinforcement Learning, Chapter 13
  8. Lecture 8: Policy Gradient and Actor-Critic; Background reading: Natural Actor-Critic paper, TRPO paper, PPO paper
  9. Lecture 9: Q-learning-style Actor-Critic; Background reading: DPG paper, DDPG paper, TD3 paper, SAC paper
  10. Lecture 10: Exploration and Tricks to Improve Deep RL (with recent work from my group); Background reading: ICM paper, RND paper, Pink-Noise paper, HER paper, CrossQ paper, Stop Regressing paper
  11. Lecture 11: Model-based Methods: Dyna-Q, MBPO; Background reading: Sutton and Barto, Reinforcement Learning, Chapter 8, and the MBPO paper
  12. Lecture 12: Model-based Methods II: Online Planning (with recent work from my group): CEM, PETS, iCEM, CEE-US; Background reading: PETS paper, iCEM paper (video), CEE-US paper (videos)
  13. Lecture 13: AlphaGo and AlphaZero, Dreamer; Background reading: AlphaGo paper (also on ILIAS, because it is behind a paywall), AlphaZero paper, and Dreamer paper
  14. Lecture 14: Offline RL; Background reading: CQL paper, CRR paper, Benchmarking paper

Further Readings

  • Sutton & Barto, Reinforcement Learning: An Introduction
  • Bertsekas, Dynamic Programming and Optimal Control, Vol. 1
  • Bishop, Pattern Recognition and Machine Learning