
The Role of Each Reference Cited in the NFQ Paper

[BM95] J. A. Boyan and A. W. Moore. Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems 7. Morgan Kaufmann, 1995.

Discusses the problems that can arise when a multi-layer perceptron is used to represent the value function.

[EPG05] D. Ernst, P. Geurts, and L. Wehenkel. Tree-based batch mode reinforcement learning. Journal of Machine Learning Research, 6:503–556, 2005.

NFQ is a special realisation of the 'Fitted Q Iteration' framework introduced here.
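
For orientation, the fitted Q iteration loop that NFQ instantiates looks roughly as follows. This is a minimal sketch, not the paper's implementation: the scikit-learn tree regressor stands in for NFQ's multi-layer perceptron, and the discount factor, iteration count, and discrete-action encoding are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor  # stand-in for NFQ's MLP


def fitted_q_iteration(states, actions, rewards, next_states,
                       n_actions, gamma=0.95, n_iters=50):
    """Batch fitted Q iteration over a fixed set of (s, a, r, s') transitions."""
    X = np.hstack([states, actions.reshape(-1, 1)])  # regressor input: (s, a)
    q = None
    for _ in range(n_iters):
        if q is None:
            targets = rewards  # first sweep: Q is just the immediate reward
        else:
            # evaluate the previous Q estimate at s' for every discrete action
            next_q = np.column_stack([
                q.predict(np.hstack([next_states,
                                     np.full((len(next_states), 1), a)]))
                for a in range(n_actions)
            ])
            targets = rewards + gamma * next_q.max(axis=1)
        # each iteration is an ordinary supervised regression problem
        q = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return q
```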

[Gor95] G. J. Gordon. Stable function approximation in dynamic programming. In A. Prieditis and S. Russell, editors, Proceedings of the ICML, San Francisco, CA, 1995.

Introduces the fitted value iteration algorithm, on which NFQ is based.

[Lin92] L.-J. Lin. Self-improving reactive agents based on reinforcement learning, planning and teaching. Machine Learning, 8:293–321, 1992.

A successful example of using a multi-layer perceptron to represent the value function; also the source of the 'experience replay' technique.
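
A minimal sketch of the 'experience replay' idea from [Lin92]: store observed transitions and repeatedly re-present random batches of them to the learner. The buffer capacity and uniform sampling here are illustrative assumptions, not details from the paper.

```python
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of (s, a, r, s', done) transitions for re-use in training."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest experiences evicted first

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size):
        # replay a random mini-batch instead of only the most recent experience
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```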

[LP03] M. Lagoudakis and R. Parr. Least-squares policy iteration. Journal of Machine Learning Research, 4:1107–1149, 2003.

Source of the samples, system equations, and parameters for the pole balancing task (Section 5.1); also provides the LSPI method and its results for comparison.
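
For context, the policy evaluation core of LSPI (LSTDQ) solves a small linear system for the Q-function weights under a fixed feature map. A minimal sketch assuming precomputed features; the ridge term is an illustrative numerical safeguard, not part of [LP03].

```python
import numpy as np


def lstdq(phi_sa, phi_next, rewards, gamma=0.95, ridge=1e-6):
    """Least-squares TD-Q: fit w so that phi(s, a)^T w approximates Q^pi(s, a).

    phi_sa:   features of the sampled (s, a) pairs, shape (N, k)
    phi_next: features of (s', pi(s')) for the policy being evaluated, shape (N, k)
    """
    A = phi_sa.T @ (phi_sa - gamma * phi_next)
    b = phi_sa.T @ rewards
    # small ridge term keeps A invertible for poorly conditioned sample sets
    return np.linalg.solve(A + ridge * np.eye(A.shape[1]), b)
```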

[RB93] M. Riedmiller and H. Braun. A direct adaptive method for faster backpropagation learning: The RPROP algorithm. In H. Ruspini, editor, Proceedings of the IEEE International Conference on Neural Networks (ICNN), pages 586–591, San Francisco, 1993.

The Rprop algorithm, a supervised learning method for batch training, used in NFQ to train the Q-function.
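
The essence of Rprop is a per-weight step size that grows while the gradient keeps its sign and shrinks when it flips. Below is a minimal sketch of a simplified variant (iRprop⁻) rather than the paper's exact update with weight backtracking; the scaling factors (1.2, 0.5) and step-size bounds follow the values recommended in [RB93].

```python
import numpy as np


def rprop_step(w, grad, prev_grad, step, eta_plus=1.2, eta_minus=0.5,
               step_min=1e-6, step_max=50.0):
    """One Rprop update for a weight vector w.

    Grow each step size while the gradient sign is stable, shrink it on a
    sign flip, and move each weight by -sign(grad) * step.
    """
    sign_change = grad * prev_grad
    step = np.where(sign_change > 0, np.minimum(step * eta_plus, step_max), step)
    step = np.where(sign_change < 0, np.maximum(step * eta_minus, step_min), step)
    grad = np.where(sign_change < 0, 0.0, grad)  # skip the update after a flip
    w = w - np.sign(grad) * step
    return w, grad, step  # pass grad back in as prev_grad on the next call
```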

[Rie00] M. Riedmiller. Concepts and facilities of a neural reinforcement learning control architecture for technical process control. Journal of Neural Computing and Application, 8:323–338, 2000.

A successful example of using a multi-layer perceptron to represent the value function.

[SB98] R. S. Sutton and A. G. Barto. Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998.

Source of the mountain car model and the cartpole model.

[Tes92] G. Tesauro. Practical issues in temporal difference learning. Machine Learning, 8:257–277, 1992.

A successful example of using a multi-layer perceptron to represent the value function.