Towards Safe Reinforcement Learning

Professor Andreas Krause, ETH Zurich

Reinforcement learning has seen stunning empirical breakthroughs. At its heart is the challenge of trading off exploration -- collecting data to learn better models -- against exploitation -- using the current estimate to make decisions. In many applications, exploration is a potentially dangerous proposition, as it requires experimenting with actions that have unknown consequences. Hence, most prior work has confined exploration to simulated environments. In this talk, I will present our work towards rigorously reasoning about the safety of exploration in reinforcement learning. I will first discuss a model-free approach, where we seek to optimize an unknown reward function subject to unknown constraints. Both reward and constraints are revealed through noisy experiments, and safety requires that no infeasible action is chosen at any point. I will also discuss model-based approaches, where we learn about system dynamics through exploration, yet need to guarantee stability of the estimated policy. Our approaches use Bayesian inference over the objective, constraints and dynamics, and -- under some regularity conditions -- are guaranteed to be both safe and complete, i.e., to converge to a natural notion of a reachable optimum. I will also show experiments on safe automatic parameter tuning of robotic platforms, as well as safe exploration of unknown environments.