(03) Model-based Meta Learning
- NTM (Neural Turing Machines)
- MANN (Memory-Augmented Neural Networks)
- SNAIL (Simple Neural Attentive Meta-Learner)
Main Concept of Model-based Meta-Learning
In Model-based Meta Learning, learning is mainly performed using a recurrent neural network such as Long Short-Term Memory (LSTM).
How does Model-based Meta-Learning work?
It works by updating internal dynamics, such as the hidden state of a recurrent neural network, and the goal of training is to make these dynamics perform task adaptation well.
The core idea of Model-based Meta Learning is to train on the $D_{\textnormal{meta-train}}$ data as we would typically do in conventional supervised learning, using recurrent neural networks like LSTM.
\[\begin{equation} D_{\textnormal{meta-train}} = \{(D^{\textnormal{train}}_1, D^{\textnormal{test}}_1), (D^{\textnormal{train}}_2, D^{\textnormal{test}}_2), \cdots, (D^{\textnormal{train}}_n, D^{\textnormal{test}}_n)\} \end{equation}\]
LSTM Architecture of Meta Learning
Figure: Meta-Learning LSTM architecture.
The goal of meta-learning is to achieve high accuracy on $D^{\textnormal{test}}_{i}$ for each task $T_i$.
\[\begin{equation} \max_{\theta}\sum_{T_{i}}\sum_{(x,y)\sim D^{\textnormal{test}}_i}\log g_{\phi_i} (y|x)\end{equation}\]
\[\begin{equation} \min_{\theta}\sum_{T_i}L(f_\theta(D^{\textnormal{train}}_i), D^{\textnormal{test}}_i) \end{equation}\]
Learning Flow of Model-based Meta Learning
- Sample a task $T_i$ from the task distribution $p(T)$.
- Split the data of task $T_i$ into $D_{i}^{\textnormal{train}}$ and $D_{i}^{\textnormal{test}}$.
- Compute the task parameters $\phi_i$: \[\begin{equation} \phi_i \leftarrow f_{\theta}(D_i^{\textnormal{train}}) \end{equation}\]
- Update $\theta$ using $\nabla_{\theta}L(\phi_i, D_i^{\textnormal{test}})$ (a code sketch follows below).
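To make this loop concrete, here is a minimal PyTorch sketch (an assumed framework; the dimensions, the `sample_task` placeholder, and the choice of taking $\phi_i$ from the final LSTM hidden state are illustrative, not a prescribed architecture):

```python
# Minimal sketch of a black-box / model-based meta-learner (hypothetical shapes and names).
import torch
import torch.nn as nn
import torch.nn.functional as F

x_dim, n_way, hidden = 32, 5, 128

# f_theta: an LSTM that reads the support set (x, y) pairs and summarizes the task in phi_i.
f_theta = nn.LSTM(input_size=x_dim + n_way, hidden_size=hidden, batch_first=True)
# g: a predictor conditioned on the task summary phi_i and a query input x.
g = nn.Linear(hidden + x_dim, n_way)
opt = torch.optim.Adam(list(f_theta.parameters()) + list(g.parameters()), lr=1e-3)

def sample_task():
    """Placeholder task sampler: returns random support/query tensors of the right shape."""
    x_tr, y_tr = torch.randn(10, x_dim), torch.randint(0, n_way, (10,))
    x_te, y_te = torch.randn(10, x_dim), torch.randint(0, n_way, (10,))
    return x_tr, y_tr, x_te, y_te

for step in range(1000):
    x_tr, y_tr, x_te, y_te = sample_task()                       # T_i ~ p(T), split into D_train / D_test
    support = torch.cat([x_tr, F.one_hot(y_tr, n_way).float()], dim=-1)
    _, (h, _) = f_theta(support.unsqueeze(0))                    # phi_i <- f_theta(D_i^train)
    phi_i = h[-1].expand(x_te.size(0), -1)
    logits = g(torch.cat([phi_i, x_te], dim=-1))                 # g_{phi_i}(y | x) on D_i^test
    loss = F.cross_entropy(logits, y_te)                         # L(phi_i, D_i^test)
    opt.zero_grad(); loss.backward(); opt.step()                 # update theta
```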
NTM (Neural Turing Machines)
- As sequences get longer, recurrent neural networks face the problem of gradient vanishing, where past memories are diluted.
- The authors proposed to address the vanishing-gradient and memory-capacity problems of recurrent neural networks by introducing an addressable external memory.
Figure: Architecture of the Neural Turing Machine.
External Memory
- The external memory of the NTM is an $N\times M$ matrix, where each row is a memory location.
- $N$ is the number of memory locations (the memory capacity).
- $M$ is the dimension of the vector stored at each memory location.
- Whenever new input arrives, the controller's read and write heads make incremental changes to the external memory.
Memory Read
\[\begin{equation} \sum_i w_t (i) = 1, \quad 0 \leq w_t (i) \leq 1, \quad \forall i \end{equation}\]
\[\begin{equation} r_t \leftarrow \sum^{N}_{i=1}w_t (i)M_t(i) \end{equation}\]
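A small NumPy sketch of the read operation, assuming a random memory and weighting purely for illustration:

```python
# Minimal sketch of an NTM-style memory read (N, M, and the weighting are placeholders).
import numpy as np

N, M = 128, 20                      # N memory locations, each an M-dimensional vector
memory = np.random.randn(N, M)      # M_t: the external memory matrix

w = np.random.rand(N)
w /= w.sum()                        # read weighting: non-negative, sums to 1

r = w @ memory                      # r_t = sum_i w_t(i) * M_t(i), an M-dimensional read vector
```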
Memory Write
\[\begin{equation} \tilde{M}_t(i) \leftarrow M_{t-1}(i) \odot[\mathbf{1}-w_t(i)e_t] \end{equation}\]
\[\begin{equation} M_t(i) \leftarrow \tilde{M}_{t}(i) + w_t (i) a_t \end{equation}\]
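A corresponding sketch of the write step, with placeholder erase and add vectors (the controller would normally emit $w_t$, $e_t$, and $a_t$):

```python
# Minimal sketch of an NTM-style memory write: erase, then add (values are placeholders).
import numpy as np

N, M = 128, 20
memory = np.random.randn(N, M)                   # M_{t-1}

w = np.random.rand(N); w /= w.sum()              # write weighting over the N locations
e = np.random.rand(M)                            # erase vector e_t, entries in [0, 1]
a = np.random.randn(M)                           # add vector a_t

memory_tilde = memory * (1.0 - np.outer(w, e))   # M~_t(i) = M_{t-1}(i) * (1 - w_t(i) e_t)
memory = memory_tilde + np.outer(w, a)           # M_t(i)  = M~_t(i) + w_t(i) a_t
```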
Calculate Weight
Content-based Weights
\[\begin{equation} K[k_t, M_t(i)] = \frac{k_t \cdot M_t(i)}{\left\| k_t \right\| \, \left\| M_t (i) \right\|} \end{equation}\]
\[\begin{equation} w^c_t(i) \leftarrow \frac{\exp(\beta_t K[k_t, M_t (i)])}{\sum^N_{j=1}\exp(\beta_t K[k_t, M_t (j)])} \end{equation}\]
Linear Interpolation
\[\begin{equation} w^g_t \leftarrow g_t w^c_t + (1-g_t)w_{t-1} \end{equation}\]
Rotational Shift
\[\begin{equation} \tilde{w}_t(i) \leftarrow \sum^{N-1}_{j=0}w^g_t (j)s_t (i-j) \end{equation}\]
Sharpening
\[\begin{equation} w_t(i) \leftarrow \frac{\tilde{w}_t(i)^{\gamma_t}}{\sum_j\tilde{w}_t(j)^{\gamma_t}} \end{equation}\]
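Putting the four addressing steps together, here is a NumPy sketch of the weight computation; the key $k_t$, strength $\beta_t$, gate $g_t$, shift kernel $s_t$, and sharpening exponent $\gamma_t$ are placeholder values that the controller would normally produce:

```python
# Minimal sketch of the NTM addressing pipeline:
# content-based weighting -> linear interpolation -> rotational shift -> sharpening.
import numpy as np

N, M = 128, 20
memory = np.random.randn(N, M)          # M_t
k = np.random.randn(M)                  # key vector k_t
beta, g, gamma = 5.0, 0.9, 2.0          # key strength beta_t, gate g_t, sharpening gamma_t
w_prev = np.full(N, 1.0 / N)            # previous weighting w_{t-1}
s = np.array([0.1, 0.8, 0.1])           # shift kernel s_t over shifts {-1, 0, +1}

# 1) Content-based weights: softmax of scaled cosine similarity K[k_t, M_t(i)]
sim = memory @ k / (np.linalg.norm(memory, axis=1) * np.linalg.norm(k) + 1e-8)
w_c = np.exp(beta * sim); w_c /= w_c.sum()

# 2) Linear interpolation with the previous weighting
w_g = g * w_c + (1 - g) * w_prev

# 3) Rotational shift: circular convolution of w_g with the shift kernel
w_tilde = np.zeros(N)
for offset, s_val in zip((-1, 0, 1), s):
    w_tilde += s_val * np.roll(w_g, offset)

# 4) Sharpening
w = w_tilde ** gamma
w /= w.sum()
```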