Jakub Łyskawa
Supervisor: Paweł Wawrzyński
In this presentation I describe my past, current, and planned work as a PhD student. I focus on research on improving the Actor-Critic with Experience Replay (ACER) reinforcement learning algorithm.
I start with a short introduction to reinforcement learning and an overview of the Actor-Critic with Experience Replay algorithm. Then I present four primary research directions.
The first is the introduction of autocorrelated actions as a means to structure exploration and to make reinforcement learning applicable in fine time discretization settings.
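As a minimal illustration of this direction (not the exact formulation used in the algorithm), the Python sketch below replaces independent Gaussian exploration noise with a stationary first-order autoregressive process; the parameters alpha and sigma are illustrative assumptions.

```python
import numpy as np

def ar1_exploration_noise(n_steps, action_dim, alpha=0.9, sigma=0.3, rng=None):
    """Stationary AR(1) exploration noise: xi_t = alpha * xi_{t-1} + eps_t.

    Consecutive noise values are correlated, so an exploratory deviation
    from the policy persists over several time steps instead of averaging
    out when the time discretization is fine. The innovation eps_t is
    scaled so the marginal standard deviation stays at sigma for any
    alpha in [0, 1).
    """
    rng = np.random.default_rng() if rng is None else rng
    noise = np.empty((n_steps, action_dim))
    noise[0] = sigma * rng.standard_normal(action_dim)  # stationary start
    eps_scale = sigma * np.sqrt(1.0 - alpha ** 2)
    for t in range(1, n_steps):
        noise[t] = alpha * noise[t - 1] + eps_scale * rng.standard_normal(action_dim)
    return noise

# alpha = 0 recovers the usual uncorrelated (white) Gaussian exploration.
```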
The second topic focuses on prioritized experience replay. Existing prioritization methods are designed for algorithms based on the action-value function and appear ill-suited to algorithms based on the state-value function. I present results of ongoing research on a prioritization scheme that may improve the sample efficiency of such value function-based algorithms.
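The sketch below shows what such a buffer could look like, assuming, purely as an illustration and not as the investigated solution, that priorities are derived from the state-value TD error |r + gamma * V(s') - V(s)|; the class name and parameter values are hypothetical.

```python
import numpy as np

class ValuePrioritizedReplay:
    """Replay buffer with sampling skewed toward large state-value TD errors."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha  # 0 = uniform sampling, 1 = fully priority-driven
        self.eps = eps      # keeps every priority strictly positive
        self.data, self.priorities = [], []

    def add(self, transition, td_error):
        if len(self.data) >= self.capacity:  # drop the oldest transition
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(transition)
        self.priorities.append((abs(td_error) + self.eps) ** self.alpha)

    def sample(self, batch_size, rng=None):
        rng = np.random.default_rng() if rng is None else rng
        p = np.asarray(self.priorities)
        p = p / p.sum()
        idx = rng.choice(len(self.data), size=batch_size, p=p)
        # Importance weights correct the bias of non-uniform sampling.
        weights = (len(self.data) * p[idx]) ** -1.0
        return idx, [self.data[i] for i in idx], weights / weights.max()
```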
I also present two topics to be investigated in future research. One of them is the use of quantile regression to estimate the distribution of future rewards. The other is the adaptation of time discretization, which would allow the algorithm to first learn the basic task on a coarse discretization and then refine the policy on a finer one.
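For the first topic, the sketch below shows the quantile regression (pinball) loss that underlies this kind of distributional estimate; the number of quantiles is an arbitrary choice for illustration.

```python
import numpy as np

def quantile_loss(predictions, target, taus):
    """Pinball loss for quantile estimates of the future-reward distribution.

    predictions: shape (n_quantiles,) estimates at quantile levels taus.
    target: a sampled (discounted) return.
    The asymmetric penalty drives each prediction toward its quantile of
    the target distribution, so together the predictions approximate the
    whole distribution rather than only its mean.
    """
    diff = target - predictions  # positive where the estimate is too low
    return np.mean(np.where(diff > 0, taus * diff, (taus - 1.0) * diff))

taus = (np.arange(5) + 0.5) / 5  # 5 evenly spaced quantile levels
loss = quantile_loss(np.zeros(5), target=1.0, taus=taus)
```

For the second topic, one conceivable mechanism (a hypothetical sketch, not the method to be developed) is an action-repeat schedule that starts coarse and is refined at training milestones:

```python
class ActionRepeatSchedule:
    """Halve the action repeat, i.e. refine the time discretization,
    each time a training milestone is passed."""

    def __init__(self, initial_repeat=8, milestones=(100_000, 300_000)):
        self.repeat = initial_repeat
        self.milestones = sorted(milestones)  # environment-step thresholds

    def update(self, env_steps):
        while self.milestones and env_steps >= self.milestones[0]:
            self.milestones.pop(0)
            self.repeat = max(1, self.repeat // 2)
        return self.repeat  # how many base steps each action is held for
```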