This thesis builds on deep learning techniques that have recently achieved success on problems involving data as diverse as images, text, and DNA in large-scale settings (i.e. with large amounts of high-dimensional data), thanks to improvements in computational resources. Reinforcement learning (RL) has leveraged these successes for continuous control tasks in environments that are only partially known, i.e. without knowledge of the transition model, and more recently without knowledge of the underlying compact state space that Markov decision processes (MDPs) usually assume. Despite many improvements, this last class of methods, called end-to-end deep RL, remains computationally and memory intensive.
In order to make RL algorithms more applicable to these large-scale problems, this thesis builds on the burgeoning field of state representation learning (SRL). From a global perspective, it takes advantage of the popularity of deep unsupervised pretraining of input representations, which offers improvements over input embeddings learned from scratch. Its solutions exploit the accumulated experience of agents in the manner of transfer learning, i.e. where automatic prediction for new tasks is performed on examples drawn from a different training distribution.
In this research, we have proposed two new SRL algorithms that address the general question of how to learn state embeddings that improve the performance of RL algorithms (i.e. better convergence towards the optimal policy in terms of computational cost, sample efficiency, and final performance) in large-scale settings and without access to the reward signal that is generally available. These algorithms respectively answer the questions: (i) how to represent states so that an agent can perform tasks by imitation learning? (ii) how to represent states so that an agent can predict its near future (the next state and the next observation) by exploring unknown transitions whose uncertainty it estimates?
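The second question can be illustrated with a minimal toy sketch (not the thesis's actual algorithms): learn a linear encoder z = W o jointly with a latent forward model M by gradient descent on the next-state prediction error ||M W o_t − W o_{t+1}||². All names, dimensions, dynamics, and hyperparameters below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative hidden linear dynamics s' = A s + noise, observations o = C s
# (A, C, d_obs, d_state are arbitrary choices for this sketch).
d_obs, d_state = 10, 3
A = rng.normal(size=(d_state, d_state))
A /= 1.1 * np.linalg.norm(A, 2)            # keep the dynamics stable
C = rng.normal(size=(d_obs, d_state))

# Roll out a trajectory of observations.
s = rng.normal(size=d_state)
obs = []
for _ in range(500):
    obs.append(C @ s)
    s = A @ s + 0.3 * rng.normal(size=d_state)
obs = np.asarray(obs)
o_t, o_next = obs[:-1], obs[1:]

W = rng.normal(scale=0.1, size=(d_state, d_obs))  # linear encoder: z = W o
M = np.eye(d_state)                               # latent forward model

def pred_loss(W, M):
    # Mean squared next-state prediction error in the latent space.
    return np.mean((o_t @ W.T @ M.T - o_next @ W.T) ** 2)

loss_before = pred_loss(W, M)
lr = 0.01
for _ in range(500):
    z, z_next = o_t @ W.T, o_next @ W.T
    err = z @ M.T - z_next                        # latent prediction error
    M -= lr * err.T @ z / len(z)                  # gradient step on M
    W -= lr * (M.T @ err.T @ o_t - err.T @ o_next) / len(z)  # gradient step on W
loss_after = pred_loss(W, M)
```

Note that this objective alone admits the trivial solution W = 0; practical SRL methods therefore add auxiliary terms (e.g. reconstruction or contrastive losses) to prevent the representation from collapsing.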
Finally, this work offers a way to improve all types of inputs to RL algorithms (not just image observations) through state representations that satisfy properties beneficial to the efficiency of bootstrapping and other automatic decision-making mechanisms also common in supervised learning.