Motivating Reinforcement Learning Agents to Control their Environment

Name
Youssef Sherif Mansour Mohamed
Abstract
Exploration lies at the heart of every reinforcement learning problem. Sparse environments rarely reward agents, making these environments extremely hard to explore. Behavioral biases attempt to address this problem by intrinsically motivating the agent to exhibit certain behaviors. Understanding the controllable aspects of an environment is a popular behavioral bias implemented through intrinsic motivators, and it has helped many models achieve state-of-the-art results. However, current methods rely on inverse dynamics learning to identify controllable aspects, which has drawbacks that limit the agent's ability to model controllable objects. We highlight some of these drawbacks and propose an alternative approach to learning the controllable aspects of the environment.

This thesis introduces the Controlled Effects Network (CEN), a semi-supervised method for learning the controllable aspects of a reinforcement learning environment. CEN uses the causal concept of blame to identify controllable objects. We integrate CEN into an intrinsic motivation module that improves the exploration behavior of reinforcement learning agents. Agents using CEN outperform inverse dynamics agents in both learning efficiency and the maximum score achieved in sparse environments. The CEN-based motivator encourages the agent to interact more with the controllable objects in an environment; hence, the agent is more likely to reach events that trigger an extrinsic reward.
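As a rough sketch of how such an intrinsic motivation module can be wired into an agent, the snippet below adds a bonus proportional to how much of a transition a controllability model attributes to the agent. This is an illustrative assumption, not the thesis implementation: the effect_model interface, the beta coefficient, and the bonus formula are hypothetical placeholders standing in for CEN and its reward shaping.

    import torch
    import torch.nn as nn

    class ControllabilityMotivator(nn.Module):
        """Illustrative intrinsic motivator built on a controllability model.

        effect_model stands in for a network such as CEN that scores which
        parts of a transition the agent is to blame for; its (obs, next_obs)
        interface is an assumption made for this sketch.
        """

        def __init__(self, effect_model: nn.Module, beta: float = 0.3):
            super().__init__()
            self.effect_model = effect_model
            self.beta = beta  # hypothetical scale of the intrinsic bonus

        def forward(self, obs: torch.Tensor, next_obs: torch.Tensor,
                    r_ext: torch.Tensor) -> torch.Tensor:
            # Per-transition score of effects attributed to the agent.
            controlled = self.effect_model(obs, next_obs)
            # Average the score per batch element; a larger bonus nudges the
            # policy toward interactions with controllable objects.
            r_int = controlled.abs().flatten(start_dim=1).mean(dim=1)
            # Dense intrinsic bonus added on top of the sparse extrinsic reward.
            return r_ext + self.beta * r_int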

We compare agents using CEN-based intrinsic motivators against agents using inverse dynamics-based motivators. To this end, we create multiple sparse environments to test the exploration behavior of both agents. In an empty grid, CEN agents explore uniformly, visiting numerous grid cells, while inverse dynamics agents tend to stick to corners and walls. In the sparse Clusters environment, CEN agents achieve a maximum score of 5 while inverse dynamics agents manage only 1. Moreover, CEN agents learn to solve the Clusters environment more efficiently, requiring fewer environment steps. We open-source our implementation of CEN, the sparse environments, and the Never Give Up (NGU) reinforcement learning agent to ease future research on controllability and exploration.
Graduation Thesis language
English
Graduation Thesis type
Master - Computer Science
Supervisor(s)
Oriol Corcoll, Raul Vicente
Defence year
2022