This introductory course provides the main methodological building blocks of reinforcement learning. Reinforcement learning (RL) refers to situations where the learning algorithm operates in a closed loop, simultaneously using past data to adjust its decisions and taking actions that will influence future observations. Algorithms based on RL concepts are now commonly used in programmatic marketing on the web, in robotics, and in computer game playing. All RL models share a common concern: to attain long-term optimality, one must strike a proper balance between exploration (trying behaviors whose outcomes are still uncertain) and exploitation (focusing on the actions that have produced the best results so far).

Some basic notions of probability theory are required to follow the course. The course will involve some work on simple implementations of the algorithms, assuming familiarity with a common scientific computing language.

Program
1. Multi-armed bandits, Markov decision processes, and other models
2. Planning: finite and infinite horizon problems, the value function, Bellman equations, dynamic programming, value and policy iteration
3. Probabilistic and statistical tools for RL: Bayesian models, relative entropy and hypothesis testing, concentration inequalities, linear regression, the stochastic approximation algorithm
4. RL algorithms for multi-armed bandits: the explore vs. exploit compromise, bandit algorithms vs. A/B testing, UCB, Thompson sampling, contextual bandits
5. RL algorithms for Markov decision processes: off-policy and on-policy learning, Q-learning, SARSA, Monte Carlo tree search
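
To give a flavor of the simple implementations the course involves, the exploration/exploitation balance described above can be sketched with UCB, one of the bandit algorithms listed in the program. The following is a minimal illustration in Python, not course material: it uses the standard UCB1 index (empirical mean plus a sqrt(2 ln t / n) bonus) on a simulated Bernoulli bandit, and the function name `ucb1`, the arm means, the horizon, and the seed are all assumptions chosen for the example.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a simulated Bernoulli bandit and return pull counts per arm.

    arm_means: hypothetical success probabilities, unknown to the algorithm.
    horizon:   total number of pulls.
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms    # number of times each arm was pulled
    totals = [0.0] * n_arms  # cumulative reward collected from each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            # Initialization: pull every arm once so each index is defined.
            arm = t - 1
        else:
            # Exploitation term (empirical mean) plus exploration bonus
            # that shrinks as an arm accumulates pulls.
            def index(a):
                mean = totals[a] / counts[a]
                bonus = math.sqrt(2.0 * math.log(t) / counts[a])
                return mean + bonus

            arm = max(range(n_arms), key=index)

        # Simulated Bernoulli reward for the chosen arm.
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        totals[arm] += reward

    return counts


counts = ucb1([0.2, 0.5, 0.8], horizon=5000)
print(counts)
```

In this sketch every arm keeps being tried occasionally (exploration), while most pulls concentrate on the arm with the highest empirical mean (exploitation).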

Bibliography, recommended reading

M. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 1994.
R. Sutton and A. Barto. Reinforcement Learning: An Introduction. MIT Press, 1998.
C. Szepesvári. Algorithms for Reinforcement Learning. Morgan & Claypool Publishers, 2010.
J. Myles White. Bandit Algorithms for Website Optimization. O’Reilly, 2012.
T. Lattimore and C. Szepesvári. Bandit Algorithms. Cambridge University Press, 2019.