Causal Reinforcement Learning

This page provides information and materials about "Causal Reinforcement Learning" (CRL), following the tutorial presented at ICML 2020. For the corresponding material, check out the links below.

We list below some additional CRL resources and expect to add more in the future; stay tuned!

About the Speaker

Elias Bareinboim is an associate professor in the Department of Computer Science and the director of the Causal Artificial Intelligence (CausalAI) Laboratory at Columbia University. His research focuses on causal and counterfactual inference and their applications to artificial intelligence, machine learning, and the empirical sciences. His work was the first to propose a general solution to the problem of "causal data-fusion," providing practical methods for combining datasets generated under different experimental conditions and plagued with various biases. In recent years, Bareinboim has been developing a framework called causal reinforcement learning (CRL), which combines structural invariances of causal inference with the sample efficiency of reinforcement learning. Before joining Columbia, he was an assistant professor at Purdue University and received his Ph.D. in Computer Science from the University of California, Los Angeles. Bareinboim was named one of "AI's 10 to Watch" by IEEE, and is a recipient of an NSF CAREER Award, the Dan David Prize Scholarship, the 2014 AAAI Outstanding Paper Award, and the 2019 UAI Best Paper Award.

Tutorial Overview

Causal inference provides a set of tools and principles that allows one to combine data and structural invariances about the environment to reason about questions of a counterfactual nature — i.e., what would have happened had reality been different, even when no data about this imagined reality is available. Reinforcement Learning is concerned with efficiently finding a policy that optimizes a specific function (e.g., reward, regret) in interactive and uncertain environments. These two disciplines have evolved independently and with virtually no interaction between them. In reality, however, they operate over different aspects of the same building block, i.e., counterfactual relations, which makes them umbilically tied.

In this tutorial, we introduce a unified treatment based on this observation, putting these two disciplines under the same conceptual and theoretical umbrella. We show that a number of natural and pervasive classes of learning problems emerge when this connection is fully established, which cannot be seen individually from either discipline alone. In particular, we'll discuss generalized policy learning (a combination of online, off-policy, and do-calculus learning), when and where to intervene, counterfactual decision-making (and free-will, autonomy, human-AI collaboration), policy generalizability, and causal imitation learning, among others. This new understanding leads to a broader view of what counterfactual learning is, and suggests the great potential for the study of causality and reinforcement learning side by side. We call this new line of investigation "Causal Reinforcement Learning" (CRL, for short).


Part 1: Foundations of Causal Inference (CI) and Reinforcement Learning (RL)
  • Relationship between Causality and RL
  • Introduction to Structural Causal Models and Causal Graphs
  • Pearl’s Causal Hierarchy and the Causal Hierarchy Theorem
  • RL from a Causal Lens
This module introduces some basic results in causal inference (SCMs and causal graphs) and the 3-layer inferential hierarchy proposed by Pearl and collaborators. We also discuss the Causal Hierarchy Theorem (CHT), which delineates the limits of what can be inferred from a given data collection. These results give a more general view of the scope of current methods in CI and RL.
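To make the hierarchy concrete, here is a minimal sketch (a made-up example, not taken from the tutorial): an unobserved variable U drives both the action X and the outcome Y, so the observational (L1) quantity P(Y=1 | X=1) differs sharply from its interventional (L2) counterpart P(Y=1 | do(X=1)):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample(n, do_x=None):
    # Exogenous U affects both X and Y (an unobserved confounder).
    u = rng.integers(0, 2, size=n)
    # L1: X follows its natural mechanism X := U.
    # L2: do(X=x) overrides that mechanism with a constant.
    x = u.copy() if do_x is None else np.full(n, do_x)
    y = (x == u).astype(float)      # Y := 1[X == U]
    return x, y

# Layer 1 (observational): P(Y=1 | X=1)
x_obs, y_obs = sample(100_000)
p_obs = y_obs[x_obs == 1].mean()    # = 1.0, since observationally X == U

# Layer 2 (interventional): P(Y=1 | do(X=1))
_, y_do = sample(100_000, do_x=1)
p_do = y_do.mean()                  # ≈ 0.5, since U is unaffected by do(X)
```

The gap between p_obs and p_do is exactly the kind of cross-layer inference the Causal Hierarchy Theorem shows cannot be bridged from L1 data alone without structural assumptions.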
Part 2: Causal Reinforcement Learning (Tasks)
  • Generalized off-policy learning
  • When and Where to Intervene
  • Counterfactual data-fusion
  • Other tasks
In this module, we discuss some of the fundamental connections between CI and RL, and explain how they can be made more robust and general through the CRL lens. We discuss the problems of generalized off-policy learning, when and where interventions should be performed, and counterfactual randomization and decision-making, among other tasks.

Target Audience

This tutorial is targeted at researchers working on the foundations of decision-making, learning, and intelligence, as well as in applied areas, including robotics and healthcare. After the tutorial, attendees will be familiar with the basic concepts, principles, and algorithms needed to solve modern problems involving causal reinforcement learning. In particular, they will develop a basic understanding of the differences between more traditional, a-causal RL tools and the new generation of causal reinforcement learning machinery. The prerequisites of this tutorial are an undergraduate-level understanding of reinforcement learning and graphical models.

Goals and Tasks in CRL

The goals of the tutorial are (1) to introduce the modern theory of causal inference, (2) to connect reinforcement learning and causal inference (CI), introducing causal reinforcement learning, and (3) to show a collection of pervasive, practical problems that can only be solved once the connection between RL and CI is established. The implications for decision-making and explainability are direct, so we feel that the tutorial should be compelling for a large part of the community.

There are at least six prominent CRL tasks that have been catalogued:

Generalized Policy Learning
combining online + offline learning

Learn a policy π by systematically combining offline (L1) and online (L2) modes of interaction.
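As an illustration of why offline and online data must be combined with care, consider this hypothetical confounded bandit (the reward table and behavior policy are invented for the demo): naive evaluation of the logged (L1) data prefers the wrong arm, yet simple causal (Manski-style) bounds derived from the same data remain valid and can inform subsequent online (L2) play:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 2-arm bandit: unobserved U drives both the behavior
# policy that logged the data and the rewards.
#   U ~ Bern(0.5);  behavior: X := U;  Y ~ Bern(R[X, U])
R = np.array([[0.9, 0.1],    # arm 0: great when U=0, poor when U=1
              [0.4, 0.8]])   # arm 1
true_do = R.mean(axis=1)     # E[Y | do(X=a)] = [0.5, 0.6] -> arm 1 is best

# --- Offline (L1) data logged by the confounded behavior policy ---
n = 200_000
u = rng.integers(0, 2, size=n)
x = u                                         # behavior policy X := U
y = (rng.random(n) < R[x, u]).astype(float)

# Naive off-policy evaluation: E[Y | X=a] prefers the WRONG arm (0).
obs_mean = np.array([y[x == a].mean() for a in range(2)])

# Causal bounds on E[Y | do(X=a)] computable from L1 data alone:
#   E[Y·1(X=a)]  <=  E[Y | do(X=a)]  <=  E[Y·1(X=a)] + P(X != a)
lower = np.array([(y * (x == a)).mean() for a in range(2)])
upper = lower + np.array([(x != a).mean() for a in range(2)])
```

The true interventional values lie inside [lower, upper] by construction, so an online learner can use the offline bounds as a valid warm start instead of the biased observational means.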

When and Where to Intervene?
refining the policy space

Identify a subset of L2 to refine the policy space do(π(X)) based on topological constraints implied by M on G.

Counterfactual Decision-Making
changing optimization function based on intentionality, free will, and autonomy

Optimize a criterion based on counterfactuals and L3-based randomization (instead of its L2/do()-counterpart).
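A minimal sketch of the counterfactual-randomization idea, under an invented payout matrix and intent mechanism: an unobserved U determines both the arm the agent naturally intends to play and the payouts. Plain do()-randomization cannot tell the arms apart (both have the same interventional value), but conditioning on one's own intent (an L3-style quantity) and then deviating does strictly better:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical payout P(Y=1 | X=x, U=u); U is never observed.
M = np.array([[0.2, 0.6],
              [0.6, 0.2]])    # note: E[Y | do(X=x)] = 0.4 for BOTH arms

n = 100_000
u = rng.integers(0, 2, size=n)      # unobserved context
intent = u                          # natural (intended) choice I := U

def value(x_played):
    """Average reward when the agent actually plays x_played."""
    return (rng.random(n) < M[x_played, u]).mean()

v_intent = value(intent)                      # follow the intent:  ≈ 0.2
v_do = value(rng.integers(0, 2, size=n))      # do()-randomization: ≈ 0.4
# Counterfactual policy: read off one's own intent, then deviate from
# it (here deviating is always best, since E[Y | do(x), I=i] = M[x, i]).
v_ctf = value(1 - intent)                     # ≈ 0.6
```

In practice the agent would estimate E[Y | do(X=x), I=i] online by randomizing within each intent condition; the point of the sketch is only that the intent-conditioned (counterfactual) criterion separates policies that the interventional criterion cannot.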

Generalizability & Robustness of Causal Claims
transportability & structural invariances

Generalize policy based on structural invariances shared across training (SCM M) and deployment environments (M*).

Learning Causal Models
discovering the causal structure with observation and experiments

Learn the causal graph G (of M) by systematically combining observations (L1) and experimentation (L2).

Causal Imitation Learning
policy learning with unobserved rewards

Construct L2-policy based on partially observable L1-data coming from an expert with unknown reward function.


We compiled a list of papers and books relevant to the tutorial. You can click on the references to read the papers.
We followed the CRL classification developed in the tutorial, where [c] indicates a classic paper (off- or on-line).
(This is an initial, tentative list; if you want to see your paper included on it, please send us the link along with a short paragraph explaining how it relates to a specific CRL task.)

Books and Surveys