Meta-reinforcement learning (Meta-RL) enables agents to rapidly adapt to new tasks by leveraging prior experience. However, existing approaches struggle in high-dimensional environments because they treat all information uniformly, failing to identify which dimensions causally influence rewards.
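We propose C2A (Causality-aware Context Adaptation), a novel Meta-RL framework that discovers and exploits temporal causal structures in trajectories. Our approach consists of three key components: temporal causal discovery with VarLiNGAM, disentangled representation learning with CausalVAE, and causally guided exploration through ADCR.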
Experiments on MuJoCo locomotion tasks and Meta-Highway autonomous driving demonstrate that C2A achieves 2× faster convergence while maintaining competitive final performance compared to state-of-the-art methods.
In high-dimensional environments, only a small subset of dimensions causally influences rewards. For example, in autonomous driving with 45 observable vehicles, typically only two or three actually affect lane-changing decisions. Traditional Meta-RL approaches fail to identify these causal relationships, which leads to inefficient exploration and slower adaptation.
Figure 1: C2A identifies causal dimensions (green), whereas black-box approaches (red) process all dimensions uniformly
VarLiNGAM discovers causal structure through a temporal model:
$$x_t = \sum_{\tau=0}^{p} B_\tau x_{t-\tau} + e_t$$
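where \(B_0\) captures instantaneous effects, \(B_\tau\) for \(\tau \geq 1\) captures time-lagged dependencies up to the maximum lag \(p\), and \(e_t\) is the noise term, which LiNGAM-type models assume to be non-Gaussian.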
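To make the temporal model concrete, the following is a minimal sketch that simulates data from it with a single lag (\(p = 1\)). It illustrates only the generative equation, not the VarLiNGAM discovery procedure itself. The matrices `B0` and `B1`, the uniform noise, and the function name `simulate_svar` are assumptions chosen for illustration; `B0` is taken to be strictly lower triangular so that the instantaneous effects are acyclic and each \(x_t\) can be solved by forward substitution.

```python
import random

def simulate_svar(B0, B1, steps, seed=0):
    """Simulate x_t = B0 x_t + B1 x_{t-1} + e_t (lag p = 1).

    Assumes B0 is strictly lower triangular, so variable i depends
    instantaneously only on variables j < i.
    """
    rng = random.Random(seed)
    d = len(B0)
    x_prev = [0.0] * d
    xs, es = [], []
    for _ in range(steps):
        # Uniform noise is one simple non-Gaussian choice (an assumption).
        e = [rng.uniform(-1.0, 1.0) for _ in range(d)]
        x = [0.0] * d
        for i in range(d):
            lagged = sum(B1[i][j] * x_prev[j] for j in range(d))
            # Only already-computed entries x[j], j < i, enter here.
            instant = sum(B0[i][j] * x[j] for j in range(i))
            x[i] = instant + lagged + e[i]
        xs.append(x)
        es.append(e)
        x_prev = x
    return xs, es

# Hypothetical 3-dimensional example: x0 -> x1 -> x2 instantaneously.
B0 = [[0.0, 0.0, 0.0],
      [0.8, 0.0, 0.0],
      [0.0, -0.5, 0.0]]
B1 = [[0.5, 0.0, 0.0],
      [0.0, 0.3, 0.0],
      [0.2, 0.0, 0.4]]
xs, es = simulate_svar(B0, B1, steps=200)
```

In this toy setup the nonzero entries of `B0` and `B1` play the role of the instantaneous and time-lagged causal edges that a discovery method would aim to recover from `xs`.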
CausalVAE learns disentangled representations guided by the discovered causal structure:
$$\mathcal{L}_{\text{CausalVAE}} = \mathcal{L}_{\text{recon}} + \beta \cdot D_{\mathrm{KL}}\big(q(z \mid x) \,\|\, p(z)\big) + \lambda \cdot \mathcal{L}_{\text{causal}}$$
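The three-term objective above can be sketched as a small function. This is an illustration under stated assumptions rather than the authors' implementation: the reconstruction term is taken to be a squared error, the posterior \(q(z|x)\) is assumed to be a diagonal Gaussian with a standard normal prior (so the KL term has a closed form), and the causal term is passed in as a precomputed number because its exact form is not given here. The names `gaussian_kl` and `causal_vae_loss` are hypothetical.

```python
import math

def gaussian_kl(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) in closed form."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0
                     for m, lv in zip(mu, logvar))

def causal_vae_loss(x, x_hat, mu, logvar, causal_term, beta=1.0, lam=1.0):
    """L_recon + beta * KL + lambda * L_causal.

    causal_term is supplied by the caller; its definition is not
    specified in the text, so it is treated as an opaque scalar.
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))  # assumed squared error
    return recon + beta * gaussian_kl(mu, logvar) + lam * causal_term

# Hypothetical values for a 2-dimensional input and latent.
loss = causal_vae_loss(x=[1.0, 2.0], x_hat=[0.5, 2.0],
                       mu=[1.0, 0.0], logvar=[0.0, 0.0],
                       causal_term=2.0, beta=1.0, lam=0.5)
```

Here \(\beta\) trades reconstruction quality against closeness to the prior, while \(\lambda\) controls how strongly the latent space is pushed toward the discovered causal structure.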
The causal importance \(c = |b^{(r)}| / \|b^{(r)}\|_1\) guides exploration through ADCR:
$$H_{ADCR} = \sum_{i=1}^{d_a} w_i(z,c) \cdot H(a_i|s,z)$$
This enables the agent to focus exploration on causally important dimensions, achieving faster convergence.
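As a rough illustration of the two formulas above, the sketch below normalizes a coefficient vector into causal importance scores and evaluates a weighted sum of per-dimension action entropies. Several pieces are assumptions: the values of `b` are made up, the policy is taken to be an independent Gaussian per action dimension, and the weights are passed in directly because the form of \(w_i(z, c)\) is not specified here. The function names are hypothetical.

```python
import math

def causal_importance(b):
    """c = |b| / ||b||_1, so the entries are non-negative and sum to 1."""
    l1 = sum(abs(v) for v in b)
    if l1 == 0.0:
        raise ValueError("b must contain at least one nonzero coefficient")
    return [abs(v) / l1 for v in b]

def gaussian_entropy(std):
    """Differential entropy of a 1-D Gaussian with standard deviation std."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std * std)

def weighted_entropy(weights, stds):
    """H = sum_i w_i * H(a_i | s, z) for an assumed diagonal Gaussian policy."""
    return sum(w * gaussian_entropy(s) for w, s in zip(weights, stds))

# Hypothetical coefficients linking three dimensions to the reward.
c = causal_importance([2.0, -1.0, 1.0])

# Hypothetical weights and policy standard deviations for two action dimensions.
h = weighted_entropy(weights=[1.0, 1.0], stds=[1.0, 1.0])
```

With uniform weights the expression reduces to the ordinary policy entropy; larger weights on selected action dimensions would make an entropy bonus favor exploration along those dimensions.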
Algorithm: C2A Meta-Training
| Method | Ant-Dir | Cheetah-Dir | Walker-Params |
|---|---|---|---|
| PEARL | 1119.9±152.3 | 1737.0±211.6 | 742.6±70.0 |
| CoRRO | 1142.6±158.6 | 1646.0±210.5 | 784.7±87.0 |
| C2A (Ours) | 1243.3±221.5 | 1727.6±416.9 | 841.0±120.5 |
Figure 2: MuJoCo test performance across different locomotion tasks
Figure 3: C2A achieves 2× faster convergence on Meta-Highway ML10
Figure 4: Discovered causal structures reveal task-specific patterns
Paper submitted to IEEE Robotics and Automation Letters (RA-L), 2025
@article{lee2025c2a,
author = {Jisu Lee and Myoung Hoon Lee and Jun Moon},
title = {C2A: Causality-aware Context Adaptation for Meta-Reinforcement Learning},
journal = {IEEE Robotics and Automation Letters (RA-L)},
note = {submitted},
year = {2025},
}