C2A: Causality-aware Context Adaptation for Meta-Reinforcement Learning

¹Hanyang University, ²Incheon National University
Corresponding authors

C2A Framework: Bidirectional integration of temporal causal discovery and latent representation learning for efficient meta-RL

Abstract

Meta-reinforcement learning (Meta-RL) enables agents to rapidly adapt to new tasks by leveraging prior experience. However, existing approaches struggle in high-dimensional environments because they treat all information uniformly, failing to identify which dimensions causally influence rewards.

We propose C2A (Causality-aware Context Adaptation), a novel Meta-RL framework that discovers and exploits temporal causal structures in trajectories. Our approach consists of three key components:

  • VarLiNGAM: Discovers temporal causal structures from trajectories, identifying which dimensions truly influence rewards across time lags
  • CausalVAE: Learns disentangled representations with causal structure, providing interpretable latent factors for control
  • ADCR (Adaptive Dimension-wise Causal Regularization): Adaptively focuses exploration on causally important dimensions

Experiments on MuJoCo locomotion tasks and Meta-Highway autonomous driving demonstrate that C2A achieves 2× faster convergence while maintaining competitive final performance compared to state-of-the-art methods.

The Challenge

In high-dimensional environments, only a small subset of dimensions causally influences rewards. For example, in autonomous driving with 45 observable vehicles, typically only 2-3 of them actually affect lane-changing decisions. Traditional Meta-RL approaches fail to identify these causal relationships, which leads to inefficient exploration and slower adaptation.

Motivation

Figure 1: C2A identifies the causal dimensions (green), whereas black-box approaches (red) process all dimensions uniformly

Method

1. Temporal Causal Discovery with VarLiNGAM

VarLiNGAM discovers causal structure through a temporal model:

$$x_t = \sum_{\tau=0}^{p} B_\tau x_{t-\tau} + e_t$$

where \(B_0\) captures instantaneous effects and \(B_\tau\) captures time-lagged dependencies.
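To make the roles of \(B_0\) and \(B_\tau\) concrete, the following is a minimal sketch that generates data from the model above with a single lag (\(p = 1\)). It is not the authors' implementation and does not perform causal discovery; it only shows the data-generating side. The matrices, the uniform noise, and the function names are illustrative assumptions. \(B_0\) is taken to be strictly lower triangular, so the instantaneous graph is acyclic and \(x_t\) can be solved by forward substitution.

```python
import random


def simulate_var_lingam(B0, B1, n_steps, seed=0):
    """Simulate x_t = B0 x_t + B1 x_{t-1} + e_t (the p = 1 case).

    B0 must be strictly lower triangular (acyclic instantaneous effects,
    variables listed in causal order), so x_t is solved row by row.
    """
    rng = random.Random(seed)
    d = len(B0)
    x_prev = [0.0] * d
    xs, es = [], []
    for _ in range(n_steps):
        # Non-Gaussian noise (uniform here), as LiNGAM-type models assume.
        e = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        x = [0.0] * d
        for i in range(d):
            lagged = sum(B1[i][j] * x_prev[j] for j in range(d))
            instant = sum(B0[i][j] * x[j] for j in range(i))  # earlier variables only
            x[i] = instant + lagged + e[i]
        xs.append(x)
        es.append(e)
        x_prev = x
    return xs, es


def structural_residual(xs, es, B0, B1, t):
    """Largest violation of the structural equation at time t (should be ~0)."""
    d = len(B0)
    return max(
        abs(
            xs[t][i]
            - sum(B0[i][j] * xs[t][j] for j in range(d))
            - sum(B1[i][j] * xs[t - 1][j] for j in range(d))
            - es[t][i]
        )
        for i in range(d)
    )


# Hypothetical 3-variable system: two state dimensions, then the reward (last row).
B0 = [[0.0, 0.0, 0.0],
      [0.5, 0.0, 0.0],   # x1 depends on x0 within the same step
      [0.0, 0.8, 0.0]]   # reward depends on x1 within the same step
B1 = [[0.3, 0.0, 0.0],
      [0.0, 0.2, 0.0],
      [0.4, 0.0, 0.0]]   # reward also depends on x0 from the previous step

xs, es = simulate_var_lingam(B0, B1, n_steps=200)
```

In this toy system the last rows of `B0` and `B1` play the role of the reward's incoming coefficients: a zero entry means the corresponding dimension has no direct effect on the reward at that lag.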

2. Causal Representation Learning with CausalVAE

CausalVAE learns disentangled representations guided by the discovered causal structure:

$$\mathcal{L}_{CausalVAE} = \mathcal{L}_{recon} + \beta \cdot D_{KL}(q(z|x) || p(z)) + \lambda \cdot \mathcal{L}_{causal}$$
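As a rough illustration of how the three terms combine, here is a minimal sketch under stated assumptions: a diagonal Gaussian posterior \(q(z|x)\), a standard normal prior \(p(z)\), and a squared-error reconstruction term. The page does not define \(\mathcal{L}_{causal}\), so it is passed in as a precomputed number (`causal_penalty`, a hypothetical name) rather than guessed at. The function names and default weights are also illustrative.

```python
import math


def kl_diag_gaussian_to_standard_normal(mu, log_var):
    """Closed-form D_KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, log_var)
    )


def causal_vae_loss(x, x_hat, mu, log_var, causal_penalty, beta=1.0, lam=1.0):
    """L_recon + beta * KL + lambda * L_causal.

    Assumptions: squared-error reconstruction; causal_penalty is supplied by
    the caller because the causal term is not specified in the source.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    kl = kl_diag_gaussian_to_standard_normal(mu, log_var)
    return recon + beta * kl + lam * causal_penalty


# Example: one-dimensional input and latent, with made-up numbers.
loss = causal_vae_loss(
    x=[1.0], x_hat=[0.0], mu=[1.0], log_var=[0.0],
    causal_penalty=2.0, beta=2.0, lam=0.5,
)
```

Here \(\beta\) trades reconstruction quality against how closely the posterior matches the prior, and \(\lambda\) controls how strongly the latent space is pushed toward the discovered structure.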

3. Adaptive Exploration with ADCR

The causal importance \(c = |b^{(r)}| / ||b^{(r)}||_1\) guides exploration through ADCR:

$$H_{ADCR} = \sum_{i=1}^{d_a} w_i(z,c) \cdot H(a_i|s,z)$$

This enables the agent to focus exploration on causally important dimensions, achieving faster convergence.
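The normalization and the weighted entropy can be sketched as follows. This is a simplified illustration, not the authors' method: it assumes \(b^{(r)}\) holds one estimated coefficient into the reward per action dimension, that the policy is a diagonal Gaussian, and that the weight is simply \(w_i = c_i\) (the dependence on \(z\) is ignored because the page does not specify the form of \(w_i(z,c)\)). The uniform fallback for an all-zero \(b^{(r)}\) is likewise an assumption.

```python
import math


def causal_importance(b_r):
    """c = |b^(r)| / ||b^(r)||_1, element-wise; the entries sum to 1."""
    l1 = sum(abs(v) for v in b_r)
    if l1 == 0.0:
        # Assumed fallback: no causal signal, so treat all dimensions equally.
        return [1.0 / len(b_r)] * len(b_r)
    return [abs(v) / l1 for v in b_r]


def gaussian_entropy(sigma):
    """Differential entropy of a 1-D Gaussian with standard deviation sigma."""
    return 0.5 * math.log(2.0 * math.pi * math.e * sigma * sigma)


def adcr_entropy(sigmas, c):
    """H_ADCR = sum_i w_i * H(a_i | s, z), with the assumed weight w_i = c_i."""
    return sum(w * gaussian_entropy(s) for w, s in zip(c, sigmas))


# Hypothetical reward coefficients for a 4-dimensional action space.
b_r = [0.8, -0.2, 0.0, 0.0]
c = causal_importance(b_r)

# Per-dimension policy standard deviations (made-up values).
sigmas = [1.0, 1.0, 0.5, 0.5]
h_adcr = adcr_entropy(sigmas, c)
```

With these weights, an entropy bonus built on `h_adcr` rewards exploration only in the first two action dimensions, since the last two have zero causal importance.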

Algorithm

Algorithm: C2A Meta-Training

  1. Collect context trajectories from task τ
  2. Discover causal structure via VarLiNGAM → {B_τ}
  3. Encode with CausalVAE guided by structure → z
  4. Execute policy π(a|s,z) with ADCR exploration
  5. Update with bidirectional consistency loss
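The five steps above can be sketched as a training-loop skeleton. This only shows the control flow; every component is passed in as a callable, and all names and signatures here are hypothetical placeholders rather than the authors' API.

```python
from typing import Any, Callable, Iterable, List


def c2a_meta_train(
    tasks: Iterable[Any],
    collect_context: Callable[[Any], Any],
    discover_structure: Callable[[Any], Any],
    encode: Callable[[Any, Any], Any],
    rollout: Callable[[Any, Any, Any], Any],
    update: Callable[[Any, Any, Any], float],
) -> List[float]:
    """Run one pass over the tasks and return the per-task loss values."""
    losses = []
    for task in tasks:
        context = collect_context(task)             # 1. context trajectories
        structure = discover_structure(context)     # 2. VarLiNGAM -> {B_tau}
        z = encode(context, structure)              # 3. CausalVAE latent z
        trajectories = rollout(task, z, structure)  # 4. policy with ADCR exploration
        # 5. update; assumed to return a scalar loss for logging
        losses.append(update(trajectories, z, structure))
    return losses
```

Passing the discovered structure to both the encoder and the update step reflects the ordering of the steps above; how the bidirectional consistency loss uses it is not specified on this page.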

Experimental Results

MuJoCo Benchmarks

| Method | Ant-Dir | Cheetah-Dir | Walker-Params |
|---|---|---|---|
| PEARL | 1119.9±152.3 | 1737.0±211.6 | 742.6±70.0 |
| CoRRO | 1142.6±158.6 | 1646.0±210.5 | 784.7±87.0 |
| C2A (Ours) | 1243.3±221.5 | 1727.6±416.9 | 841.0±120.5 |

Figure 2: MuJoCo test performance across different locomotion tasks

Meta-Highway (Autonomous Driving)

Sample Efficiency

Figure 3: C2A achieves 2× faster convergence on Meta-Highway ML10

Discovered Causal Structures


Figure 4: Discovered causal structures reveal task-specific patterns

BibTeX

Paper submitted to IEEE Robotics and Automation Letters (RA-L), 2025

@article{lee2025c2a,
  author = {Jisu Lee and Myoung Hoon Lee and Jun Moon},
  title  = {C2A: Causality-aware Context Adaptation for Meta-Reinforcement Learning},
  journal = {IEEE Robotics and Automation Letters (RA-L)},
  note   = {submitted},
  year   = {2025},
}