Prioritized experience deep deterministic policy gradient method for dynamic systems
Cebeci, Serhat Emre (2019) Prioritized experience deep deterministic policy gradient method for dynamic systems. [Thesis]
Official URL: http://risc01.sabanciuniv.edu/record=b2325810 (Table of contents)
In this thesis, the problem of learning to control a dynamic system through reinforcement learning is taken up. Two important problems arise in learning to control dynamic systems under this framework: a correlated sample space and the curse of dimensionality. The first means that samples taken sequentially from the plant are correlated and fail to provide a rich data set to learn from. The second means that plants with a large state dimension become intractable if states are quantized for the learning algorithm. Recently, these problems have been attacked by a state-of-the-art algorithm called the Deep Deterministic Policy Gradient (DDPG) method. In this thesis, we propose a new algorithm, Prioritized Experience DDPG (PE-DDPG), which improves the sample efficiency of DDPG through a Prioritized Experience Replay mechanism integrated into the original DDPG. It allows the agent to experience some samples more frequently depending on their novelty. The PE-DDPG algorithm is tested on OpenAI Gym's Inverted Pendulum task. The experimental results show that the proposed algorithm reduces training time and has lower variance, which implies a more stable learning process.
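The core idea of the prioritized replay mechanism described above, replaying "novel" (high temporal-difference-error) transitions more often than a uniform buffer would, can be sketched as follows. This is a minimal, illustrative proportional-priority buffer, not the thesis implementation: the class name, the `alpha`/`beta` hyperparameters, and the list-based storage (a production version would use a sum-tree for efficiency) are all assumptions for the sketch.

```python
import random


class PrioritizedReplayBuffer:
    """Minimal proportional prioritized experience replay (illustrative sketch).

    New transitions receive maximal priority so they are replayed at least
    once; thereafter, each transition i is sampled with probability
    p_i^alpha / sum_j p_j^alpha, and importance-sampling weights
    (N * P(i))^(-beta) correct the bias this non-uniform sampling introduces.
    """

    def __init__(self, capacity, alpha=0.6, beta=0.4, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha      # how strongly priorities shape sampling (0 = uniform)
        self.beta = beta        # strength of the importance-sampling correction
        self.eps = eps          # keeps priorities strictly positive
        self.buffer = []
        self.priorities = []
        self.pos = 0            # next slot to overwrite once full (ring buffer)

    def add(self, transition):
        # Unseen transitions are treated as novel: assign the current
        # maximum priority so they are sampled before being down-weighted.
        max_p = max(self.priorities, default=1.0)
        if len(self.buffer) < self.capacity:
            self.buffer.append(transition)
            self.priorities.append(max_p)
        else:
            self.buffer[self.pos] = transition
            self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # Sampling probabilities proportional to priority^alpha.
        scaled = [p ** self.alpha for p in self.priorities]
        total = sum(scaled)
        probs = [s / total for s in scaled]
        idxs = random.choices(range(len(self.buffer)), weights=probs, k=batch_size)
        n = len(self.buffer)
        # Importance-sampling weights, normalized by their max for stability.
        weights = [(n * probs[i]) ** (-self.beta) for i in idxs]
        w_max = max(weights)
        weights = [w / w_max for w in weights]
        batch = [self.buffer[i] for i in idxs]
        return batch, idxs, weights

    def update_priorities(self, idxs, td_errors):
        # After a learning step, priority tracks the magnitude of the TD error:
        # surprising transitions are replayed more often.
        for i, err in zip(idxs, td_errors):
            self.priorities[i] = abs(err) + self.eps
```

In a DDPG-style training loop, the critic's per-sample TD errors from each minibatch would be fed back via `update_priorities`, and the returned `weights` would scale the critic loss to offset the sampling bias.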
|Uncontrolled Keywords:||Deep reinforcement learning. -- Neural networks. -- Reinforcement learning. -- Dynamic systems. -- Deep learning.|
|Subjects:||T Technology > TJ Mechanical engineering and machinery > TJ163.12 Mechatronics|
|Deposited On:||21 Oct 2019 13:53|
|Last Modified:||21 Oct 2019 13:53|