Sponsored by: Shaanxi Society of Automotive Engineers
ISSN 1671-7988  CN 61-1394/TH
Founded: 1976

Automobile Applied Technology ›› 2025, Vol. 50 ›› Issue (15): 48-55. DOI: 10.16638/j.cnki.1671-7988.2025.015.009

• Intelligent Connected Vehicles •

Personalized Driving Behavior Modeling and Trajectory Planning Based on Maximum Entropy Inverse Reinforcement Learning

WU Feng   

  1. Sichuan Key Laboratory of Vehicle Measurement, Control and Safety, Xihua University
  • Published: 2025-08-08
  • Corresponding author: WU Feng
  • About the author: WU Feng (1995-), female, master's degree candidate; research interests cover decision-making and planning for autonomous driving.

Abstract: To narrow the gap between the decision-making and planning behaviors of autonomous vehicles and those of human drivers, this paper proposes a personalized decision-making and planning approach based on maximum entropy inverse reinforcement learning (MaxEnt IRL). First, a convolutional-pooling long short-term memory (LSTM) network captures the interactions among surrounding vehicles and predicts their trajectories. Second, in modeling driving behavior, continuous human actions are discretized to reduce the computational complexity of MaxEnt IRL, and a personalized reward function is introduced to capture the preferences and decision-making processes of individual drivers. A quintic polynomial planning method is then employed to generate the trajectory, and the effectiveness of the proposed method is finally validated in a simulation environment. Experimental results show that, compared with the generic MaxEnt IRL algorithm, the proposed personalized algorithm significantly reduces the error between planned and expert trajectories, lowering the root mean squared error (RMSE) from 3 m to 0.8 m, while markedly improving ride comfort for vehicle occupants.
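
The trajectory-prediction step can be made concrete with a short sketch. The following minimal PyTorch model is a sketch rather than the authors' implementation of the convolutional-pooling LSTM idea: an LSTM encodes the target vehicle's motion history, neighbour encodings placed on a spatial grid are pooled by convolution and max-pooling, and an LSTM decoder rolls out future positions. The 13 x 3 grid, hidden sizes, and 25-step horizon are illustrative assumptions.

import torch
import torch.nn as nn

class ConvPoolingLSTMPredictor(nn.Module):
    def __init__(self, enc_dim=64, dec_dim=128, horizon=25):
        super().__init__()
        self.horizon = horizon
        # Encode the (x, y) motion history of the target vehicle.
        self.encoder = nn.LSTM(input_size=2, hidden_size=enc_dim, batch_first=True)
        # Convolve and pool over the social grid of neighbour encodings.
        self.social_conv = nn.Sequential(
            nn.Conv2d(enc_dim, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=(2, 1)),
        )
        # 32 conv channels over the pooled 6 x 3 grid, plus the ego encoding.
        self.decoder = nn.LSTM(input_size=enc_dim + 32 * 6 * 3,
                               hidden_size=dec_dim, batch_first=True)
        self.out = nn.Linear(dec_dim, 2)  # predicted (x, y) per future step

    def forward(self, ego_hist, grid):
        # ego_hist: (B, T, 2) past positions; grid: (B, enc_dim, 13, 3)
        _, (h_ego, _) = self.encoder(ego_hist)       # final hidden state (1, B, enc_dim)
        social = self.social_conv(grid).flatten(1)   # pooled interaction features
        ctx = torch.cat([h_ego[-1], social], dim=1)  # joint context vector
        dec_in = ctx.unsqueeze(1).repeat(1, self.horizon, 1)
        dec_out, _ = self.decoder(dec_in)
        return self.out(dec_out)                     # (B, horizon, 2) future trajectory

# e.g. ConvPoolingLSTMPredictor()(torch.randn(8, 30, 2), torch.randn(8, 64, 13, 3))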
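
The behavior-modeling step follows the standard MaxEnt IRL recipe, sketched below under the assumption that continuous behavior has already been discretized into a finite set of candidate trajectories and that the reward is linear in hand-crafted features (speed, acceleration, jerk, gaps, and the like; the feature set and learning rate here are illustrative). Weights fitted on one driver's demonstrations play the role of that driver's personalized reward function.

import numpy as np

def maxent_irl(feat_candidates, feat_expert, lr=0.05, iters=500):
    # feat_candidates: (N, K) feature vector of each discretized candidate trajectory
    # feat_expert:     (K,)   mean feature vector of the driver's demonstrations
    w = np.zeros(feat_candidates.shape[1])
    for _ in range(iters):
        logits = feat_candidates @ w
        logits -= logits.max()                     # numerical stability
        p = np.exp(logits) / np.exp(logits).sum()  # MaxEnt distribution P(tau) ~ exp(w . f)
        grad = feat_expert - p @ feat_candidates   # expert minus model feature expectations
        w += lr * grad                             # gradient ascent on the log-likelihood
    return w

# Planning can then score candidates with feat_candidates @ w and keep the best one.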
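
The planning step reduces to a closed-form two-point boundary-value problem: a quintic is the lowest-order polynomial that can match position, velocity, and acceleration at both ends of the horizon, which keeps jerk smooth and benefits ride comfort. A minimal sketch, with an assumed 5 s horizon and illustrative boundary states:

import numpy as np

def quintic_coeffs(x0, v0, a0, xT, vT, aT, T):
    # Boundary conditions on x, x', x'' at t = 0 and t = T give a 6x6 linear system
    # for the coefficients of x(t) = c0 + c1*t + ... + c5*t**5.
    A = np.array([
        [1, 0, 0,    0,       0,        0],
        [0, 1, 0,    0,       0,        0],
        [0, 0, 2,    0,       0,        0],
        [1, T, T**2, T**3,    T**4,     T**5],
        [0, 1, 2*T,  3*T**2,  4*T**3,   5*T**4],
        [0, 0, 2,    6*T,     12*T**2,  20*T**3],
    ])
    b = np.array([x0, v0, a0, xT, vT, aT])
    return np.linalg.solve(A, b)

# Example: plan 5 s ahead, easing from 10 m/s to 8 m/s over 45 m.
c = quintic_coeffs(0.0, 10.0, 0.0, 45.0, 8.0, 0.0, 5.0)
t = np.linspace(0.0, 5.0, 50)
x = np.polyval(c[::-1], t)  # np.polyval expects the highest-order coefficient first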

Key words: human-like decision-making; driving behavior modeling; inverse reinforcement learning; trajectory prediction; trajectory planning