Improving the Actor-Critic with Experience Replay

Jakub Łyskawa

supervisor: Paweł Wawrzyński



Actor-Critic with Experience Replay (ACER) is a Reinforcement Learning algorithm based on the Actor-Critic algorithm. It adapts the Actor-Critic scheme to deep neural networks and increases sample efficiency by storing collected experiences and replaying them during training. In my work, I investigate ways to improve this algorithm, both in general and with a focus on robotic control.
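
To illustrate the experience replay mechanism, below is a minimal sketch of a replay buffer in Python; the names and the uniform sampling scheme are illustrative assumptions, not details of the ACER implementation.

    import random
    from collections import deque

    class ReplayBuffer:
        """Stores past transitions and samples minibatches for off-policy updates."""

        def __init__(self, capacity=100_000):
            self.buffer = deque(maxlen=capacity)  # the oldest experiences are discarded first

        def store(self, state, action, reward, next_state, done):
            self.buffer.append((state, action, reward, next_state, done))

        def sample(self, batch_size=32):
            # replaying stored experiences lets each interaction with the environment
            # contribute to many updates, which improves sample efficiency
            return random.sample(self.buffer, batch_size)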


In my previous work, carried out in collaboration with other researchers, a novel algorithm was introduced: the Actor-Critic with Experience Replay and Autocorrelated Actions. It is based on the ACER algorithm and uses autocorrelated actions to improve the smoothness of the control signal without decreasing exploration intensity. It yields better results in robotic control environments than both ACER and selected state-of-the-art algorithms, and its relative advantage is even larger in environments with finer time discretization.
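
As an illustration of the idea, below is a minimal sketch of autocorrelated exploration noise generated by a first-order autoregressive (AR(1)) process; the parameter names are illustrative and the exact noise model used in the algorithm may differ.

    import numpy as np

    def autocorrelated_noise(steps, dim, alpha=0.9, sigma=0.2, rng=None):
        """AR(1) noise: consecutive samples are correlated, so the resulting
        control signal varies smoothly over time."""
        rng = rng or np.random.default_rng()
        noise = np.zeros((steps, dim))
        for t in range(1, steps):
            # scaling the innovation by sqrt(1 - alpha**2) keeps the stationary
            # variance equal to sigma**2, so exploration intensity is preserved
            noise[t] = alpha * noise[t - 1] + sigma * np.sqrt(1.0 - alpha**2) * rng.standard_normal(dim)
        return noise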


In another work, I explore ways to improve the performance of the ACER algorithm by making the estimation of future rewards more informative. The method that I apply is quantile regression. It improves the stability of learning and has considerable potential for further improvements because, instead of only the expected value estimated in the most common approach, it allows the whole return distribution to be estimated.
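
As an illustration, below is a minimal sketch of the quantile (pinball) loss that underlies quantile regression; the variable names are illustrative and the integration with the ACER critic is not shown.

    import numpy as np

    def quantile_loss(predicted, target, taus):
        """Pinball loss: for each quantile level tau, under-estimates are weighted
        by tau and over-estimates by (1 - tau), so minimizing the loss drives each
        output towards the corresponding quantile of the return distribution."""
        errors = target - predicted
        return np.mean(np.maximum(taus * errors, (taus - 1.0) * errors))

    # example: three outputs approximating the 0.25, 0.5 and 0.75 quantiles of the return
    taus = np.array([0.25, 0.5, 0.75])
    predicted = np.array([1.0, 2.0, 3.0])
    loss = quantile_loss(predicted, target=2.5, taus=taus)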