本文主要研究内容
作者(2019)在《Feature-Based Aggregation and Deep Reinforcement Learning:A Survey and Some New Implementations》一文中研究指出:In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by neural networkbased reinforcement learning, thereby potentially leading to more effective policy improvement.
Abstract
In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation, than by the linear function of the features provided by neural networkbased reinforcement learning, thereby potentially leading to more effective policy improvement.
论文参考文献
论文详细介绍
论文作者分别是来自IEEE/CAA Journal of Automatica Sinica的,发表于刊物IEEE/CAA Journal of Automatica Sinica2019年01期论文,是一篇关于,IEEE/CAA Journal of Automatica Sinica2019年01期论文的文章。本文可供学术参考使用,各位学者可以免费参考阅读下载,文章观点不代表本站观点,资料来自IEEE/CAA Journal of Automatica Sinica2019年01期论文网站,若本站收录的文献无意侵犯了您的著作版权,请联系我们删除。