Sponsored by: Shaanxi Society of Automotive Engineers
ISSN 1671-7988  CN 61-1394/TH
First published: 1976

Automobile Applied Technology ›› 2022, Vol. 47 ›› Issue (1): 28-31. DOI: 10.16638/j.cnki.1671-7988.2022.001.007

• Intelligent and Connected Vehicles •

Research on Improved Decision Algorithm of Deep Deterministic Policy Gradient

CHEN Jianwen, ZHANG Xiaojun, ZHANG Minglu

  1. School of Mechanical Engineering, Hebei University of Technology
  • Online: 2022-01-15  Published: 2022-01-15
  • Corresponding author: CHEN Jianwen
  • About the author: CHEN Jianwen, master's student, School of Mechanical Engineering, Hebei University of Technology; research interest: path planning.


Abstract: To address decision-making and control in autonomous-driving path planning, this paper targets the shortcomings of the deep deterministic policy gradient (DDPG) algorithm in unknown environments, where training efficiency drops and convergence becomes unstable as the search space grows, and proposes an improved algorithm based on reward guidance. First, reward-based prioritized experience replay is applied within each episode, reducing the blindness of DDPG's random exploration and improving the intelligent vehicle's learning efficiency. Then, high-reward trajectories are selected between episodes to guide the vehicle's exploration of complex spaces and obtain a stable control policy. Finally, simulations are carried out in an open-source intelligent-driving simulation environment. Experimental results show that the improved DDPG algorithm outperforms the original, with clear gains in both training efficiency and convergence stability.

Key words: path planning; decision control; deep deterministic policy gradient; reward guidance; prioritized experience replay

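The two reward-guided mechanisms described in the abstract — prioritized experience replay within an episode, and selection of high-return trajectories between episodes — can be sketched roughly as follows. This is an illustrative reconstruction, not the authors' implementation: the class name `RewardPrioritizedReplay`, the priority formula, and the parameters `alpha` and `elite_fraction` are all assumptions made for the example.

```python
import random
import numpy as np

class RewardPrioritizedReplay:
    """Sketch of a reward-guided replay buffer for DDPG (illustrative only).

    Within an episode, transitions are sampled with probability proportional
    to a reward-based priority. Between episodes, complete high-return
    trajectories are kept in a separate 'elite' buffer that is mixed into
    each training batch to guide exploration.
    """

    def __init__(self, capacity=10000, alpha=0.6, elite_fraction=0.25):
        self.capacity = capacity
        self.alpha = alpha                  # how strongly reward shapes priority
        self.elite_fraction = elite_fraction
        self.buffer, self.priorities = [], []
        self.elite = []                     # transitions from high-return episodes

    def add(self, transition, reward):
        # Priority grows with |reward|; a small epsilon keeps every
        # transition reachable even when its reward is zero.
        priority = (abs(reward) + 1e-3) ** self.alpha
        if len(self.buffer) >= self.capacity:
            self.buffer.pop(0)
            self.priorities.pop(0)
        self.buffer.append(transition)
        self.priorities.append(priority)

    def add_elite_episode(self, episode):
        # Called between episodes for trajectories whose total return
        # exceeds some threshold chosen by the caller.
        self.elite.extend(episode)

    def sample(self, batch_size):
        # Reserve part of the batch for elite transitions, fill the rest
        # by reward-prioritized sampling from the main buffer.
        n_elite = min(int(batch_size * self.elite_fraction), len(self.elite))
        probs = np.array(self.priorities) / sum(self.priorities)
        idx = np.random.choice(len(self.buffer), batch_size - n_elite, p=probs)
        batch = [self.buffer[i] for i in idx]
        if n_elite:
            batch += random.sample(self.elite, n_elite)
        return batch
```

In a DDPG training loop, `add` would be called after every environment step, `add_elite_episode` at the end of each episode whose return clears a threshold, and `sample` to draw each minibatch for the critic/actor update.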