Sponsored by: Shaanxi Society of Automotive Engineering
ISSN 1671-7988  CN 61-1394/TH
Founded: 1976

Automobile Applied Technology ›› 2025, Vol. 50 ›› Issue (1): 25-30. DOI: 10.16638/j.cnki.1671-7988.2025.001.005

• Intelligent Connected Vehicle •

A Behavioral Decision-Making Method Based on Improved Deep Reinforcement Learning Algorithms

JIA Ruihao   

  1. School of Automobile, Chang'an University
  • Published: 2025-01-09
  • Contact: JIA Ruihao

  • About the author: JIA Ruihao (1999-), male, master's degree candidate; research interest: vehicle operation engineering. E-mail: jia_rh@chd.edu.cn

Abstract: To address the problems of low driving efficiency, slow convergence, and low decision success rate that traditional deep reinforcement learning algorithms exhibit simultaneously in autonomous driving decision-making tasks, owing to poor exploration strategies during training, a decision-making method based on a dueling double deep Q-network combined with expert evaluation is proposed. An offline expert model and an online model are constructed, with an adaptive balance factor introduced between them; a prioritized experience replay mechanism with an adaptive importance-sampling coefficient is introduced, and the online model is built on the basis of the dueling deep Q-network; a reward function that accounts for driving efficiency, safety, and comfort is designed. The results show that, compared with D3QN and PERD3QN, the proposed algorithm improves convergence speed by 25.93% and 20.00%, raises the decision success rate by 3.19% and 2.77%, reduces the average number of steps by 6.40% and 0.14%, and increases the average speed by 7.46% and 0.42%, respectively.
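To make two ideas named in the abstract concrete, the following is a minimal PyTorch sketch (not the paper's code) of (1) a dueling Q-network head, as used in D3QN-style models, and (2) selecting actions from a convex combination of an offline expert's action values and the online model's, weighted by an adaptive balance factor. Names such as `blended_action` and the linear decay schedule are illustrative assumptions, not details from the paper.

```python
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    """Dueling architecture: Q(s, a) = V(s) + A(s, a) - mean_a A(s, a)."""
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        self.feature = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # state-value stream V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # advantage stream A(s, a)

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        h = self.feature(state)
        v, a = self.value(h), self.advantage(h)
        return v + a - a.mean(dim=-1, keepdim=True)

def blended_action(online_q: torch.Tensor,
                   expert_q: torch.Tensor,
                   balance_factor: float) -> torch.Tensor:
    """Pick actions from a mix of expert and online action values.

    `balance_factor` in [0, 1] is assumed to be annealed from ~1 (trust the
    offline expert early in training) toward 0 (trust the online model) --
    one plausible reading of the paper's "adaptive balance factor".
    """
    q = balance_factor * expert_q + (1.0 - balance_factor) * online_q
    return q.argmax(dim=-1)

# Usage sketch: the balance factor decays linearly with the training step.
if __name__ == "__main__":
    net = DuelingQNet(state_dim=8, n_actions=5)
    state = torch.randn(4, 8)             # batch of 4 states
    expert_q = torch.randn(4, 5)          # stand-in for expert action scores
    step, decay = 1000, 1e-4
    beta = max(0.0, 1.0 - decay * step)   # hypothetical linear schedule
    actions = blended_action(net(state), expert_q, beta)
    print(actions)
```

The same kind of schedule could plausibly drive the adaptive importance-sampling coefficient of the prioritized replay mechanism the abstract mentions, annealing it toward 1 to debias the prioritized sampling as training progresses; the paper's exact formulation is not reproduced here.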

Key words: autonomous driving; behavioral decision; deep reinforcement learning; imitation learning; improved DQN algorithm
