Towards Safe Reinforcement Learning
Professor Andreas Krause, ETH Zurich
Reinforcement learning has seen stunning empirical breakthroughs. At its heart is the challenge of trading exploration -- collecting data for learning better models -- and exploitation -- using the estimate to make decisions. In many applications, exploration is a potentially dangerous proposition, as it requires experimenting with actions that have unknown consequences. Hence, most prior work has confined exploration to simulated environments. In this talk, I will present our work towards rigorously reasoning about safety of exploration in reinforcement learning. I will first discuss a model-free approach, where we seek to optimize an unknown reward function subject to unknown constraints. Both reward and constraints are revealed through noisy experiments, and safety requires that no infeasible action is chosen at any point. I will also discuss model-based approaches, where we learn about system dynamics through exploration, yet need to guarantee stability of the estimated policy. Our approaches use Bayesian inference over the objective, constraints and dynamics, and -- under some regularity conditions -- are guaranteed to be both safe and complete, i.e., converge to a natural notion of reachable optimum. I will also show experiments on safe automatic parameter tuning of robotic platforms, as well as safe exploration of unknown environments.
Andreas Krause is a Professor of Computer Science at ETH Zurich, where he leads the Learning & Adaptive Systems Group. He also serves as Academic Co-Director of the Swiss Data Science Center. Before that he was an Assistant Professor of Computer Science at Caltech. He received his Ph.D. in Computer Science from Carnegie Mellon University (2008) and his Diplom in Computer Science and Mathematics from the Technical University of Munich, Germany (2004). He is a Microsoft Research Faculty Fellow, received an ERC Starting Investigator grant, the German Pattern Recognition Award, the Okawa Foundation Research Grant recognizing top young researchers in telecommunications as well as the ETH Golden Owl teaching award. His research on machine learning and adaptive systems has received awards at several premier conferences and journals.