This page provides information and materials about "Causal Reinforcement Learning" (CRL), following the tutorial presented at ICML 2020.
For the corresponding material, check out the links:
We list below some additional CRL resources and expect to add more in the future; stay tuned!
Elias Bareinboim is an associate professor in the Department of Computer Science and the director of the Causal Artificial Intelligence (CausalAI) Laboratory at Columbia University. His research focuses on causal and counterfactual inference and their applications to artificial intelligence, machine learning, and the empirical sciences. His work was the first to propose a general solution to the problem of "causal data-fusion," providing practical methods for combining datasets generated under different experimental conditions and plagued with various biases. In recent years, Bareinboim has been developing a framework called causal reinforcement learning (CRL), which combines structural invariances of causal inference with the sample efficiency of reinforcement learning. Before joining Columbia, he was an assistant professor at Purdue University and received his Ph.D. in Computer Science from the University of California, Los Angeles. Bareinboim was named one of "AI's 10 to Watch" by IEEE, and is a recipient of an NSF CAREER Award, the Dan David Prize Scholarship, the 2014 AAAI Outstanding Paper Award, and the 2019 UAI Best Paper Award.
Causal inference provides a set of tools and principles that allows one to combine data and structural invariances about the environment to reason about questions of a counterfactual nature — i.e., what would have happened had reality been different, even when no data about this imagined reality is available. Reinforcement Learning is concerned with efficiently finding a policy that optimizes a specific function (e.g., reward, regret) in interactive and uncertain environments. These two disciplines have evolved independently and with virtually no interaction between them. In reality, however, they operate over different aspects of the same building block, i.e., counterfactual relations, which makes them umbilically tied.
In this tutorial, we introduce a unified treatment based on this observation, putting these two disciplines under the same conceptual and theoretical umbrella. We show that a number of natural and pervasive classes of learning problems emerge when this connection is fully established, problems that cannot be seen from either discipline alone. In particular, we'll discuss generalized policy learning (a combination of online, off-policy, and do-calculus learning), when and where to intervene, counterfactual decision-making (and free will, autonomy, human-AI collaboration), policy generalizability, and causal imitation learning, among others. This new understanding leads to a broader view of what counterfactual learning is and suggests the great potential of studying causality and reinforcement learning side by side. We call this new line of investigation "Causal Reinforcement Learning" (CRL, for short).
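To make the shared counterfactual building block concrete, here is a minimal sketch (our own illustration, not part of the tutorial materials) of the standard abduction-action-prediction computation in a toy structural causal model; the variable names and equations are made up for this example.

```python
import numpy as np

rng = np.random.default_rng(0)

# A toy structural causal model (SCM):
#   U  ~ Normal(0, 1)            (unobserved background factor)
#   X  := 1 if U > 0 else 0      (observed "treatment")
#   Y  := 2*X + U                (observed outcome)
u = rng.normal(size=100_000)
x = (u > 0).astype(float)
y = 2 * x + u

# Counterfactual query: for the units that actually received X = 1,
# what *would* Y have been had X been 0?  Keep each unit's background
# factor U fixed (abduction), set X to 0 (action), and re-evaluate the
# structural equation for Y (prediction).
y_cf = 2 * 0 + u[x == 1]

print("Observed mean of Y among X=1 units:   ", y[x == 1].mean())
print("Counterfactual mean had they had X=0: ", y_cf.mean())
# No observed data point has X = 0 together with U > 0, yet the SCM
# lets us reason about this imagined regime.
```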
| Part | Subject | Material |
| --- | --- | --- |
| 1 | Foundations of Causal Inference (CI) and Reinforcement Learning (RL). This module introduces some basic results in causal inference (SCMs and graphs) and the 3-layer inferential hierarchy proposed by Pearl and collaborators. We also discuss the Causal Hierarchy Theorem (CHT), which delineates the limits of what can be inferred from a given data collection. These results give a more general view of the scope of current methods in CI and RL. | |
| 2 | Causal Reinforcement Learning (Tasks). In this module, we discuss some of the fundamental connections between CI and RL and explain how they can be made more robust and general through the CRL lens. We discuss the problems of generalized off-policy learning, when and where interventions should be performed, and counterfactual randomization and decision-making, among other tasks. | |
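The following toy simulation (our own sketch, with made-up structural equations) illustrates the point behind the Causal Hierarchy Theorem discussed in Part 1: two models that induce exactly the same observational (L1) distribution over (X, Y) can still disagree on the interventional (L2) query P(Y = 1 | do(X = 1)), so purely observational data cannot distinguish between them.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Model A: X is a fair coin and directly causes Y (Y := X).
x_a = rng.integers(0, 2, n)
y_a = x_a

# Model B: an unobserved confounder U drives both X and Y (X := U, Y := U);
# X has no causal effect on Y at all.
u = rng.integers(0, 2, n)
x_b, y_b = u, u

# Layer 1 (observational): both models induce the same joint P(X, Y).
print("P(X=1, Y=1)  model A:", np.mean((x_a == 1) & (y_a == 1)))   # ~0.5
print("P(X=1, Y=1)  model B:", np.mean((x_b == 1) & (y_b == 1)))   # ~0.5

# Layer 2 (interventional): force X = 1 and re-evaluate each model's
# structural equation for Y.
y_a_do = np.ones(n)   # in model A, Y := X, so do(X=1) gives Y = 1
y_b_do = u            # in model B, Y := U, unaffected by do(X=1)
print("P(Y=1 | do(X=1))  model A:", y_a_do.mean())   # 1.0
print("P(Y=1 | do(X=1))  model B:", y_b_do.mean())   # ~0.5
```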
This tutorial is targeted at researchers working on the foundations of decision-making, learning, and intelligence, as well as those in applied areas, including robotics and healthcare. After the tutorial, attendees will be familiar with the basic concepts, principles, and algorithms needed to solve modern problems involving causal reinforcement learning. In particular, they will develop a basic understanding of the differences between more traditional, a-causal RL tools and the new generation of causal reinforcement learning machinery. The prerequisites for this tutorial are an undergraduate-level understanding of reinforcement learning and graphical models.
The goals of the tutorial are (1) to introduce the modern theory of causal inference, (2) to connect reinforcement learning and causal inference (CI), introducing causal reinforcement learning, and (3) to show a collection of pervasive, practical problems that can only be solved once the connection between RL and CI is established. The implications for decision-making and explainability are direct, so we feel that the tutorial should be compelling for a large part of the community.
There are at least six prominent CRL tasks that have been catalogued:
1. Learn a policy π by systematically combining offline (L1) and online (L2) modes of interaction (a toy illustration follows this list).
2. Identify the subset of L2 needed to refine the policy space do(π(X)), based on topological constraints implied by M on G.
3. Optimize a criterion based on counterfactuals and L3-based randomization (instead of its L2/do()-counterpart).
4. Generalize a policy based on structural invariances shared across the training (SCM M) and deployment (M*) environments.
5. Learn the causal graph G (of M) by systematically combining observations (L1) and experimentation (L2).
6. Construct an L2-policy based on partially observable L1-data coming from an expert with an unknown reward function.
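As a toy illustration of the first task (and of counterfactual decision-making), the sketch below, with hypothetical payout probabilities of our own choosing, simulates a two-armed bandit whose offline (L1) logs are confounded by an unobserved variable U: the arm that looks best in the logged data is not the arm that is best under intervention (L2), which is exactly the kind of mismatch generalized policy learning must handle.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Unobserved confounder U (e.g., the player's unrecorded context).
u = rng.integers(0, 2, n)

# Offline (L1) data: the behavior policy picks the arm as a function of U,
# so U confounds both the arm choice X and the reward Y.
x_obs = u
# Hypothetical payout probabilities p[arm][u], chosen for illustration only.
p = np.array([[0.1, 0.7],
              [0.4, 0.2]])
y_obs = rng.random(n) < p[x_obs, u]

# Naive offline evaluation: condition on the arm actually played.
print("E[Y | X=0] =", y_obs[x_obs == 0].mean())   # ~0.10
print("E[Y | X=1] =", y_obs[x_obs == 1].mean())   # ~0.20  <- arm 1 looks best

# Online (L2) evaluation: intervene, i.e., set the arm regardless of U.
for arm in (0, 1):
    y_do = rng.random(n) < p[arm, u]
    print(f"E[Y | do(X={arm})] =", y_do.mean())   # ~0.40 vs ~0.30  <- arm 0 is best
```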
We compiled a list of papers and books relevant to the tutorial. You can click on the references to read the papers.
We followed the CRL classification developed in the tutorial, where [c] indicates a classic paper (off- or on-line).
(This is an initial, tentative list; if you want to see your paper included, please send us the link as well as a short paragraph explaining how it relates to a specific CRL task.)