Prioritized experience deep deterministic policy gradient method for dynamic systems
Cebeci, Serhat Emre (2019) Prioritized experience deep deterministic policy gradient method for dynamic systems. [Thesis]
This thesis addresses the problem of learning to control a dynamic system through reinforcement learning. Two important problems arise when learning to control dynamic systems under this framework: a correlated sample space and the curse of dimensionality. The first problem means that samples taken sequentially from the plant are correlated and fail to provide a rich data set to learn from. The second problem means that plants with a large state dimension become intractable if the states are quantized for the learning algorithm. Recently, these problems have been addressed by a state-of-the-art algorithm, the Deep Deterministic Policy Gradient (DDPG) method. In this thesis, we propose a new algorithm, Prioritized Experience DDPG (PE-DDPG), which improves the sample efficiency of DDPG by integrating a Prioritized Experience Replay mechanism into the original algorithm. This mechanism allows the agent to replay some samples more frequently than others, depending on their novelty. The PE-DDPG algorithm is tested on OpenAI Gym's Inverted Pendulum task. The experimental results show that the proposed algorithm can reduce training time and has lower variance, which implies a more stable learning process.
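The abstract does not state how novelty is measured or how samples are drawn, so the following is only a minimal sketch of one common form of prioritized replay (proportional sampling), not the thesis implementation. It assumes that the absolute temporal-difference (TD) error of a transition serves as its novelty score. The class name `PrioritizedReplayBuffer`, the parameters `alpha`, `beta`, and `eps`, and their default values are all hypothetical choices made for illustration.

```python
import random


class PrioritizedReplayBuffer:
    """Proportional prioritized replay (illustrative sketch, not the thesis code)."""

    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # assumed: 0 = uniform sampling, 1 = fully proportional
        self.eps = eps          # assumed: keeps every priority strictly positive
        self.data = []          # stored transitions
        self.priorities = []    # one priority per stored transition
        self.pos = 0            # next slot to overwrite once the buffer is full

    def add(self, transition):
        # New transitions receive the current maximum priority, so they are
        # likely to be replayed before their TD error is known.
        p = max(self.priorities, default=1.0)
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(p)
        else:
            self.data[self.pos] = transition
            self.priorities[self.pos] = p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        # Sampling probability is proportional to priority ** alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        n = len(self.data)
        idx = random.choices(range(n), weights=probs, k=batch_size)
        # Importance-sampling weights offset the bias of non-uniform sampling;
        # here they are normalized by the largest weight in the batch.
        weights = [(n * probs[i]) ** (-beta) for i in idx]
        max_w = max(weights)
        weights = [w / max_w for w in weights]
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, indices, td_errors):
        # Assumption: novelty is approximated by the absolute TD error.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a DDPG-style training loop, one would typically multiply each sample's critic loss by its returned weight and then call `update_priorities` with the TD errors computed for that batch. Sampling here costs time linear in the buffer size; a sum-tree is the usual choice when the buffer is large.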