{"id":1229,"date":"2021-09-23T14:26:49","date_gmt":"2021-09-23T13:26:49","guid":{"rendered":"https:\/\/babel.isa.uma.es\/kipr\/?p=1229"},"modified":"2021-09-23T14:27:01","modified_gmt":"2021-09-23T13:27:01","slug":"bayesian-estimation-of-the-model-in-model-based-rl-for-robots","status":"publish","type":"post","link":"https:\/\/babel.isa.uma.es\/kipr\/?p=1229","title":{"rendered":"Bayesian estimation of the model in model-based RL for robots"},"content":{"rendered":"<h4>Senda, Kei,  Hishinuma, Toru, Tani, Yurika, <strong>Approximate Bayesian reinforcement learning based on estimation of plant,<\/strong> Autonomous Robots 44(5), <a href=\"https:\/\/doi.org\/10.1007\/s10514-020-09901-4\" target=\"_blank\">DOI: 10.1007\/s10514-020-09901-4<\/a>.<\/h4>\n<blockquote><p>This study proposes an approximate parametric model-based Bayesian reinforcement learning approach for robots, based on online Bayesian estimation and online planning for an estimated model. The proposed approach is designed to learn a robotic task with a few real-world samples and to be robust against model uncertainty, within feasible computational resources. The proposed approach employs two-stage modeling, which is composed of (1) a parametric differential equation model with a few parameters based on prior knowledge such as equations of motion, and (2) a parametric model that interpolates a finite number of transition probability models for online estimation and planning. The proposed approach modifies the online Bayesian estimation to be robust against approximation errors of the parametric model to a real plant. The policy planned for the interpolating model is proven to have a form of theoretical robustness. Numerical simulation and hardware experiments of a planar peg-in-hole task demonstrate the effectiveness of the proposed approach.<\/p><\/blockquote>\n","protected":false},"excerpt":{"rendered":"<p>Senda, Kei, Hishinuma, Toru, Tani, Yurika, Approximate Bayesian reinforcement learning based on estimation of plant, Autonomous Robots 44(5), DOI: 10.1007\/s10514-020-09901-4. <span class=\"ellipsis\">&hellip;<\/span> <span class=\"more-link-wrap\"><a href=\"https:\/\/babel.isa.uma.es\/kipr\/?p=1229\" class=\"more-link\"><span>Read More &rarr;<\/span><\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[14],"tags":[29,432,205],"class_list":["post-1229","post","type-post","status-publish","format-standard","hentry","category-applications-of-reinforcement-learning-to-robots","tag-bayesian-estimation","tag-model-learning","tag-model-based-reinforcement-learning"],"_links":{"self":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1229"}],"collection":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=1229"}],"version-history":[{"count":1,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1229\/revisions"}],"predecessor-version":[{"id":1230,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=\/wp\/v2\/posts\/1229\/revisions\/1230"}],"wp:attachment":[{"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=1229"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=1229"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/babel.isa.uma.es\/kipr\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=1229"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}