GOME-NGU: visual navigation under sparse reward via Goal Oriented Memory Encoder with Never Give Up

Ji Sue Lee, Jun Moon
Hanyang University

Abstract

In this paper, we propose the Goal Oriented Memory Encoder (GOME) with Never Give Up (NGU) model to enhance visual navigation in sparse-reward environments. Our approach addresses the critical need for efficient exploration and exploitation under high-dimensional visual data and limited spatial information. In the proposed GOME-NGU, we first utilize NGU for sufficient exploration, and then apply GOME to effectively leverage the information acquired during the exploration phase for exploitation. In GOME, the goals and their associated rewards obtained during the exploration phase are stored as pairs. During the exploitation phase, priorities are assigned to the stored goals based on the agent's current location, arranging them in order of proximity so that the agent can act optimally. To demonstrate the efficiency of the proposed GOME-NGU, we consider two experiment categories: (i) to validate the sufficiency of exploration using NGU, we discretized the environment's states and recorded the number of times the agent visited each state, and (ii) to confirm that GOME optimizes the path to the goals nearest to the agent's location during the exploitation phase, we measured how often the agent consecutively reached nearby goals at least twice. The training for (i) and (ii) was conducted using NVIDIA Isaac Gym, and post-training validation was carried out by migrating to Isaac Sim. Additionally, for (ii), the efficiency of the proposed GOME-NGU was validated with a Husky robot in a real-world setting. In the experimental results, the proposed GOME-NGU demonstrated enhanced performance in both exploration and exploitation in sparse-reward environments.

Method

  1. Save the goal image and goal reward
     Within our goal-oriented network, we input the goal image and generate the goal reward \(r^g_t\). In GOME, each pair \((g_t, r^g_t)\) is stored. The value of \(r^g_t\) changes according to the proximity of the goal to the agent's current location. If the agent takes a specific action (such as reaching a goal) during exploration, all \(r^g_t\) are set to 1.

  2. Goal image feature extraction
     Through feature extraction, we obtain meaningful information from the goal image. We utilize ResNet-50 as the backbone for this process.

  3. Priority goal selection based on the location of the agent
     Next, we select a priority goal based on the location of the agent by assessing the similarity between the current image and the goal images stored in GOME. If a goal \(g_n\) is close to the agent, it is selected as the priority goal. The proximity of a goal to the agent corresponds to its similarity score: a high score indicates that the goal is close to the agent. We utilize the cosine similarity, calculated as follows: $$ (v_t^O, v_t^G) = \frac{v_t^O \cdot v_t^G}{\|v_t^O\| \, \|v_t^G\|}, $$ where \(v^O_t\) is the feature vector of the current image, \(v^G_t\) is the feature vector of a goal image in GOME, and \(\|v\|\) denotes the norm of a vector \(v\).

  4. Goal reward re-parameterization and GOME update
     Once the priority goal is selected, we re-parameterize the goal reward to prioritize targets near the agent. This re-parameterization scales the reward by the similarity score to the goal. Specifically, \(r^g_t\) is re-parameterized according to the following formula to encourage the agent to visit closer goals first: $$ r^g_t = \frac{1}{(v^O_t, v^G_t)^\xi}, $$ where \(\xi\) is a parameter that adjusts the influence of distance. With this formula, goals that are farther from the agent receive smaller rewards in the given episode. A minimal code sketch of these steps is given after this list.
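The sketch below is a minimal, illustrative rendering of the four steps above, assuming PyTorch and torchvision (ResNet-50 backbone, torchvision ≥ 0.13). The class and method names (GOMEMemorySketch, store_goal, select_priority_goal) and the exponent value are our own placeholders, not the authors' implementation.

import torch
import torch.nn.functional as F
import torchvision.models as models

class GOMEMemorySketch:
    def __init__(self, xi: float = 2.0, device: str = "cpu"):
        self.xi = xi  # exponent that adjusts the influence of distance (illustrative value)
        self.device = device
        # ResNet-50 backbone with the classification head removed -> 2048-d features;
        # pretrained weights could be loaded here instead of weights=None.
        backbone = models.resnet50(weights=None)
        backbone.fc = torch.nn.Identity()
        self.encoder = backbone.eval().to(device)
        self.goal_features = []  # stored goal feature vectors v^G
        self.goal_rewards = []   # associated goal rewards r^g

    @torch.no_grad()
    def _encode(self, image):
        # image: (3, H, W) float tensor, already preprocessed for the backbone
        return self.encoder(image.unsqueeze(0).to(self.device)).squeeze(0)

    @torch.no_grad()
    def store_goal(self, goal_image, goal_reward=1.0):
        # Step 1: store the (goal, reward) pair; the reward is 1 when the goal was reached.
        self.goal_features.append(self._encode(goal_image))
        self.goal_rewards.append(goal_reward)

    @torch.no_grad()
    def select_priority_goal(self, current_image):
        # Steps 2-4: encode the current observation, score every stored goal by cosine
        # similarity, pick the most similar (closest) goal as the priority goal, and
        # re-parameterize its reward as r = 1 / sim**xi.
        v_o = self._encode(current_image)                      # v^O_t
        v_g = torch.stack(self.goal_features)                  # all v^G_t, shape (N, 2048)
        sims = F.cosine_similarity(v_o.unsqueeze(0), v_g, dim=1)
        priority = int(torch.argmax(sims))
        sim = float(sims[priority].clamp(min=1e-6))            # guard against division by zero
        self.goal_rewards[priority] = 1.0 / (sim ** self.xi)
        return priority, self.goal_rewards[priority]

In practice, the backbone would be pretrained and the stored rewards would be refreshed as the agent moves; this sketch only illustrates the data flow from goal storage to priority selection and reward re-parameterization.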

Experiment

Experiment 01 - State Visitation Count


  • We discretized the states of a simple environment (empty warehouse) and measured, for three exploration algorithms, the number of times each state was visited (a minimal counting sketch is given after this list).
  • For ICM and RND, the intrinsic reward occasionally vanishes when states are revisited, which prevented sufficient exploration, whereas NGU was able to explore the environment sufficiently.
  • Figure: state visitation counts for ICM, RND, and NGU.

  • We compared the exploration algorithms in a complex environment (hospital) and validated that NGU enables long-horizon exploration.
  • The experimental results showed that ICM and RND were unable to explore deep locations, whereas NGU demonstrated the capability for long-horizon exploration.
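As a concrete illustration of how a state visitation count can be computed, the sketch below discretizes the agent's planar position into grid cells and counts visits per cell. The 2-D position input and the cell size are assumptions for illustration, not the exact measurement used in the experiments.

from collections import Counter

def discretize(position, cell_size=1.0):
    # Map a continuous (x, y) position (in meters) to a grid-cell index.
    x, y = position
    return (int(x // cell_size), int(y // cell_size))

def state_visitation_counts(trajectory, cell_size=1.0):
    # trajectory: iterable of (x, y) positions logged during exploration.
    return Counter(discretize(p, cell_size) for p in trajectory)

# Example: state_visitation_counts([(0.2, 0.4), (0.7, 0.1), (3.5, 2.2)])
# -> Counter({(0, 0): 2, (3, 2): 1})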
Experiment 02 - Sequential Goal Visitation Count

  • We conducted validation on the proposed GOME-NGU in a complex environment (hospital).
  • ICM and RND were unable to perform deep exploration, whereas NGU enabled long-horizon exploration. In terms of the sequential goal count, none of ICM, RND, or NGU demonstrated good performance individually; however, when combined with GOME, all of GOME-ICM, GOME-RND, and GOME-NGU exhibited strong performance, and the proposed GOME-NGU achieved the highest performance among them (a sketch of this metric is given after this list).
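One possible way to compute the sequential goal visitation count, i.e., how often the agent reaches two nearby goals consecutively, is sketched below. The distance threshold and the assumption that reached goal positions are logged in order are illustrative, not taken from the paper.

import math

def consecutive_nearby_goal_count(reached_goals, radius=2.0):
    # reached_goals: list of (x, y) goal positions in the order they were reached.
    # Count pairs of consecutively reached goals that lie within `radius` of each
    # other, i.e., occurrences of the agent reaching nearby goals back-to-back.
    count = 0
    for prev, curr in zip(reached_goals, reached_goals[1:]):
        if math.dist(prev, curr) <= radius:
            count += 1
    return count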
Video

BibTeX

@unpublished{gome-ngu,
  author = {Ji Sue Lee and Jun Moon},
  title  = {{GOME-NGU: visual navigation under sparse reward via Goal-Oriented Memory Encoder with Never Give Up}},
  year   = {2025},
}