Sponsored by: Shaanxi Society of Automotive Engineers
ISSN 1671-7988  CN 61-1394/TH
First published: 1976

Automobile Applied Technology ›› 2022, Vol. 47 ›› Issue (1): 28-31. DOI: 10.16638/j.cnki.1671-7988.2022.001.007

• Intelligent and Connected Vehicles •

Research on Improved Decision Algorithm of Deep Deterministic Policy Gradient

CHEN Jianwen, ZHANG Xiaojun, ZHANG Minglu

  1. School of Mechanical Engineering, Hebei University of Technology
  • Online: 2022-01-15  Published: 2022-01-15
  • Corresponding author: CHEN Jianwen
  • About the author: CHEN Jianwen, master's student, School of Mechanical Engineering, Hebei University of Technology; research interest: path planning.


Abstract: To address decision-making and control in autonomous-driving path planning, this paper targets the shortcomings of the deep deterministic policy gradient (DDPG) algorithm in unknown environments, where training efficiency drops and convergence becomes unstable as the search space grows, and proposes an improved algorithm based on reward guidance. First, reward-based prioritized experience replay is applied within each episode, reducing the blindness of DDPG's random exploration and improving the intelligent vehicle's learning efficiency. Then, high-reward trajectories are selected between episodes to guide the vehicle's exploration of complex spaces and obtain a stable control policy. Finally, simulations are carried out in an open-source intelligent-driving simulation environment. Experimental results show that the improved DDPG algorithm outperforms the original, with clear gains in both training efficiency and convergence stability.

Key words: path planning; decision control; deep deterministic policy gradient; reward guidance; prioritized experience replay

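The two reward-guided mechanisms described in the abstract — prioritized experience replay within an episode, and selection of high-return trajectories between episodes — can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the class name `RewardPrioritizedReplay`, the priority formula, and the parameters `alpha` and `elite_fraction` are all assumptions made for the example.

```python
import random
import numpy as np

class RewardPrioritizedReplay:
    """Sketch of a reward-guided replay buffer for DDPG (illustrative only).

    Within an episode, transitions are sampled with probability proportional
    to a reward-based priority. Between episodes, complete high-return
    trajectories are kept in a separate 'elite' buffer that is mixed into
    each training batch to guide exploration.
    """

    def __init__(self, capacity=10000, alpha=0.6, elite_fraction=0.25):
        self.capacity = capacity
        self.alpha = alpha                  # how strongly reward shapes priority
        self.elite_fraction = elite_fraction
        self.buffer, self.priorities = [], []
        self.elite = []                     # transitions from high-return episodes

    def add(self, transition, reward):
        # Priority grows with |reward|; a small epsilon keeps every
        # transition reachable even when its reward is zero.
        priority = (abs(reward) + 1e-3) ** self.alpha
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(priority)

    def add_elite_episode(self, episode):
        # Called between episodes for trajectories whose total return
        # exceeds some threshold chosen by the caller.
        self.elite.extend(episode)

    def sample(self, batch_size):
        # Reserve part of the batch for elite transitions, fill the rest
        # by reward-prioritized sampling from the main buffer.
        n_elite = min(int(batch_size * self.elite_fraction), len(self.elite))
        probs = np.array(self.priorities) / sum(self.priorities)
        idx = np.random.choice(len(self.buffer), batch_size - n_elite, p=probs)
        batch = [self.buffer[i] for i in idx]
        if n_elite:
            batch += random.sample(self.elite, n_elite)
        return batch
```

In a DDPG training loop, `add` would be called after every environment step, `add_elite_episode` at the end of each episode whose return clears a threshold, and `sample` to draw each minibatch for the critic/actor update.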