Michał Bortkiewicz
supervisor: Tomasz Trzciński
Hierarchical division of control and sequential decision-making is unavoidable in large systems. In reinforcement learning, hierarchy is usually introduced through handcrafted subtasks or through learned but inflexible subgoals specified in the original state space. In this paper, we reconsider the assumptions that underlie those approaches and propose a new set of assumptions: 1) the goals of lower-level control are defined by projections returned by the high-level controller, 2) the higher-level control constantly verifies whether lower-level goals are still valid, and 3) if there is a better lower-level goal to pursue, our agent switches the target of the lower-level policy even if the previous goal has not been achieved. Based on these assumptions, we propose a novel hierarchical reinforcement learning algorithm that solves navigation tasks in dynamic environments. The assumptions lead to the fast learning typical of hierarchical architectures and to smooth, dexterous control, but above all they enable fast reactions to unanticipated changes in the environment, i.e., dynamic situations for which the agent is not prepared. Moreover, the lower-level policies learned within our approach are versatile enough to be useful across different tasks, which creates a natural framework for knowledge transfer in reinforcement learning.
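The abstract does not specify an implementation, but a minimal sketch of the control loop implied by assumptions 1-3 might look as follows. All names (ToyNavEnv, HighLevelPolicy, LowLevelPolicy, goal_value) and their interfaces are illustrative assumptions: the hand-coded proportional controller and distance-based value stand in for the learned policies and critics the paper would use.

```python
# Minimal, self-contained sketch of the hierarchical control loop implied
# by assumptions 1-3. Every class and method here is an illustrative
# assumption, not the paper's actual interface or learned component.

import numpy as np


class ToyNavEnv:
    """Toy 2-D point-mass navigation environment (assumed, for the demo)."""

    def __init__(self) -> None:
        self.pos = np.zeros(2)
        self.target = np.array([5.0, 5.0])

    def reset(self) -> np.ndarray:
        self.pos = np.zeros(2)
        return self.pos.copy()

    def step(self, action: np.ndarray):
        self.pos += 0.1 * np.clip(action, -1.0, 1.0)
        done = np.linalg.norm(self.pos - self.target) < 0.5
        return self.pos.copy(), -1.0, done, {}


class HighLevelPolicy:
    """Proposes lower-level goals as projections of the state (assumption 1)."""

    def __init__(self, target: np.ndarray) -> None:
        self.target = target

    def propose_goal(self, state: np.ndarray) -> np.ndarray:
        # The goal lives in a projection of the state space: here the
        # (x, y) position, one bounded step toward the task target.
        direction = self.target - state[:2]
        dist = float(np.linalg.norm(direction))
        if dist < 1e-8:
            return self.target.copy()
        return state[:2] + direction / dist * min(dist, 1.0)

    def goal_value(self, state: np.ndarray, goal: np.ndarray) -> float:
        # Stand-in for a learned critic: prefer goals nearer the target.
        return -float(np.linalg.norm(goal - self.target))


class LowLevelPolicy:
    """Goal-conditioned controller acting in the original action space."""

    def act(self, state: np.ndarray, goal: np.ndarray) -> np.ndarray:
        # Simple proportional controller toward the current goal projection.
        return np.clip(goal - state[:2], -1.0, 1.0)


def hierarchical_episode(env, high, low, max_steps: int = 500) -> None:
    state = env.reset()
    goal = high.propose_goal(state)
    for _ in range(max_steps):
        # Assumption 2: the high level re-checks goal validity every step.
        candidate = high.propose_goal(state)
        # Assumption 3: switch to a better goal immediately, even if the
        # previous goal has not been achieved yet.
        if high.goal_value(state, candidate) > high.goal_value(state, goal):
            goal = candidate
        state, _, done, _ = env.step(low.act(state, goal))
        if done:
            break


if __name__ == "__main__":
    env = ToyNavEnv()
    hierarchical_episode(env, HighLevelPolicy(env.target), LowLevelPolicy())
```

The design point the sketch tries to capture is that the high level never commits to a subgoal for a fixed number of steps: validity is re-evaluated at every transition, which is what allows a fast reaction when the environment changes under the agent.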