Tavakol Aghaei, Vahid
(2019)
*Markov chain Monte Carlo algorithm for Bayesian policy search.*
[Thesis]

PDF

10291483_VahidTavakolAghaei.pdf

Download (11MB)


## Abstract

The fundamental aim in Reinforcement Learning (RL) is to seek optimal parameters of a given parameterized policy. Policy search algorithms have paved the way for making RL applicable to complex dynamical systems, such as the robotics domain, where the environment comprises high-dimensional state and action spaces. Although many policy search techniques are based on the widespread policy gradient methods, thanks to their suitability for such complex environments, their performance can suffer from slow convergence or local optima. The reason is the need to compute the gradient components of the parameterized policy. In this study, we take a Bayesian approach to the policy search problem in the RL framework. The problem of interest is to control a discrete-time Markov decision process (MDP) with continuous state and action spaces. We contribute to the field by proposing a Particle Markov Chain Monte Carlo (P-MCMC) algorithm as a method of generating samples for the policy parameters from a posterior distribution, instead of performing gradient approximations. To do so, we adopt a prior density over the policy parameters and target the posterior distribution in which the `likelihood' is taken to be the expected total reward. In risk-sensitive scenarios, where a multiplicative expected total reward is employed to measure the performance of the policy, rather than its cumulative counterpart, our methodology is fit for purpose: by utilizing a reward function in multiplicative form, one can fully exploit sequential Monte Carlo (SMC), also known as the particle filter, within the iterations of the P-MCMC. It is worth mentioning that these methods have been widely used in statistical and engineering applications in recent years.
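To make the sampling idea concrete, here is a minimal toy sketch (not the thesis' algorithm): a random-walk Metropolis sampler over a scalar policy parameter, where a Monte Carlo rollout estimate of the multiplicative total reward stands in for the likelihood, in the pseudo-marginal spirit of P-MCMC. The linear-Gaussian MDP, the linear policy `u = theta * x`, the per-step reward `exp(-x^2)`, the prior, and all constants are illustrative assumptions.

```python
import numpy as np

def expected_multiplicative_reward(theta, rng, n_particles=200, horizon=20):
    """Monte Carlo estimate of the multiplicative expected total reward,
    used here as the 'likelihood' of the policy parameter theta."""
    x = rng.normal(1.0, 0.1, size=n_particles)        # initial states
    w = np.ones(n_particles)                          # running reward products
    for _ in range(horizon):
        # toy linear dynamics with a linear policy u = theta * x
        x = 0.9 * x + theta * x + rng.normal(0.0, 0.05, size=n_particles)
        w *= np.exp(-x**2)                            # per-step reward in (0, 1]
    return w.mean()

def mcmc_policy_search(n_iters=500, step=0.2, seed=0):
    """Random-walk Metropolis over theta, plugging the noisy reward
    estimate in place of an exact likelihood (pseudo-marginal MCMC)."""
    rng = np.random.default_rng(seed)
    theta = 0.0
    ll = np.log(expected_multiplicative_reward(theta, rng) + 1e-300)
    lp = -0.5 * theta**2                              # N(0, 1) log prior
    samples = []
    for _ in range(n_iters):
        prop = theta + step * rng.normal()
        ll_p = np.log(expected_multiplicative_reward(prop, rng) + 1e-300)
        lp_p = -0.5 * prop**2
        # accept/reject on the (estimated) unnormalized log posterior
        if np.log(rng.uniform()) < (ll_p + lp_p) - (ll + lp):
            theta, ll, lp = prop, ll_p, lp_p
        samples.append(theta)
    return np.array(samples)
```

In this toy problem the reward favors parameters that drive the state toward zero (roughly `theta ≈ -0.9`, cancelling the `0.9 x` drift), so the chain should concentrate on negative values; the thesis replaces the naive rollout average with a full SMC (particle filter) estimate inside the P-MCMC iterations.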
Furthermore, in order to deal with the challenging problem of policy search in high-dimensional state spaces, an adaptive MCMC algorithm is proposed. This research is organized as follows: In Chapter 1, we begin with a general introduction and motivation for the current work and highlight the topics to be covered. In Chapter 2, a literature review pertinent to the context of the thesis is conducted. In Chapter 3, a brief review of some popular policy-gradient-based RL methods is provided. We proceed with the notion of Bayesian inference and present Markov Chain Monte Carlo methods in Chapter 4. The original work of the thesis is formulated in this chapter, where a novel SMC algorithm for policy search in the RL setting is advocated. In order to exhibit the effectiveness of the proposed algorithm in learning a parameterized policy, numerical simulations are presented in Chapter 5. To validate the applicability of the proposed method in real time, it is implemented on a control problem on a physical setup of a two-degree-of-freedom (2-DoF) robotic manipulator, with the corresponding results appearing in Chapter 6. Finally, concluding remarks and future work are given in Chapter 7.

Item Type: | Thesis |
---|---|

Uncontrolled Keywords: | Reinforcement Learning. -- Markov Chain Monte Carlo. -- Particle filtering. -- Risk sensitive reward. -- Policy search. -- Control. -- Takviyeli öğrenme. -- Markov zinciri Monte Carlo. -- Parçacık filtre. -- Riske duyarlı ödül. -- Politika araması. -- Kontrol. |

Subjects: | T Technology > TJ Mechanical engineering and machinery > TJ163.12 Mechatronics |

Divisions: | Faculty of Engineering and Natural Sciences > Academic programs > Mechatronics; Faculty of Engineering and Natural Sciences |

Depositing User: | IC-Cataloging |

Date Deposited: | 21 Oct 2019 10:12 |

Last Modified: | 26 Apr 2022 10:32 |

URI: | https://research.sabanciuniv.edu/id/eprint/39348 |