Paper: Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations

Main Research Content

In "Feature-Based Aggregation and Deep Reinforcement Learning: A Survey and Some New Implementations" (2019), the author states: In this paper we discuss policy iteration methods for approximate solution of a finite-state discounted Markov decision problem, with a focus on feature-based aggregation methods and their connection with deep reinforcement learning schemes. We introduce features of the states of the original problem, and we formulate a smaller "aggregate" Markov decision problem, whose states relate to the features. We discuss properties and possible implementations of this type of aggregation, including a new approach to approximate policy iteration. In this approach the policy improvement operation combines feature-based aggregation with feature construction using deep neural networks or other calculations. We argue that the cost function of a policy may be approximated much more accurately by the nonlinear function of the features provided by aggregation than by the linear function of the features provided by neural network-based reinforcement learning, thereby potentially leading to more effective policy improvement.
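To make the idea of a smaller "aggregate" problem concrete, the following is a minimal sketch of one special case, often called hard aggregation: states that share a feature value are grouped into one aggregate state, and the aggregate costs are found by a value-iteration-style fixed-point loop. It is an illustration under stated assumptions, not the paper's implementation. The function name `solve_aggregate_problem`, the uniform weighting of states within each group, the hand-written feature map, and the random test problem are all hypothetical choices; in particular, no neural network is used to construct the features here.

```python
import numpy as np

def solve_aggregate_problem(P, G, feature, num_features, alpha=0.9,
                            tol=1e-10, max_iter=10_000):
    """Sketch of hard aggregation for a discounted finite-state MDP.

    P[u, i, j]  : probability of moving from state i to j under action u
    G[u, i, j]  : cost of that transition
    feature[i]  : index of the aggregate state that original state i maps to
    Assumes every index 0..num_features-1 occurs at least once in `feature`.
    """
    n = P.shape[1]
    # Disaggregation matrix D (num_features x n): uniform weights over the
    # original states in each group (an assumption made for illustration).
    D = np.zeros((num_features, n))
    D[feature, np.arange(n)] = 1.0
    D /= D.sum(axis=1, keepdims=True)

    expected_cost = (P * G).sum(axis=2)      # shape (num_actions, n)
    r = np.zeros(num_features)               # one cost per aggregate state
    for _ in range(max_iter):
        J1 = r[feature]                      # piecewise-constant cost on original states
        Q = expected_cost + alpha * (P @ J1) # shape (num_actions, n)
        J0 = Q.min(axis=0)                   # one-step lookahead at each original state
        r_new = D @ J0                       # average back onto aggregate states
        converged = np.max(np.abs(r_new - r)) < tol
        r = r_new
        if converged:
            break
    return r

# Hypothetical test problem: 6 states, 2 actions, 3 feature values.
rng = np.random.default_rng(0)
n, num_actions = 6, 2
P = rng.random((num_actions, n, n))
P /= P.sum(axis=2, keepdims=True)            # normalize rows into probabilities
G = rng.random((num_actions, n, n))
feature = np.array([0, 0, 1, 1, 2, 2])

r = solve_aggregate_problem(P, G, feature, num_features=3)
J_approx = r[feature]                        # approximate cost of each original state
```

The resulting approximation `J_approx` is constant over each group of states, so it is a nonlinear (piecewise-constant) function of the feature rather than a linear combination of features. When every state is given its own feature value, the loop reduces to ordinary value iteration on the original problem.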



Paper Details

This paper was published in IEEE/CAA Journal of Automatica Sinica, 2019, Issue 01.

