Sponsor: 陕西省汽车工程学会 (Shaanxi Society of Automotive Engineering)
ISSN 1671-7988  CN 61-1394/TH
Founded: 1976

Automobile Applied Technology ›› 2025, Vol. 50 ›› Issue (21): 21-25,37.DOI: 10.16638/j.cnki.1671-7988.2025.021.004

• Intelligent Connected Vehicle •

Augmenting Small Sample Dataset for Autonomous Driving Based on Generative Models

HUANG Qiusheng   

  1. Chery Automobile Company Limited
  • Published: 2025-11-04
  • Contact: HUANG Qiusheng

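  • About the author: HUANG Qiusheng (born 1986), male, master's degree, senior engineer; research interests include new energy vehicles and intelligent connected vehicles.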

Abstract: Training deep-learning-based autonomous driving algorithms is hampered by limited data and uneven data distribution. Traditional methods augment data through image flipping, cropping and color transformation, but these operations do little to correct an uneven distribution. Other approaches train models on small sample data through complex parameter tuning and transfer learning; however, they often require a long time for hyperparameter adjustment and are prone to overfitting. In this study, a generative adversarial network (GAN) is trained on a small sample dataset, and its generator is used to produce new sample data. The generated samples keep the same style as the original samples while increasing their number. In addition, prompt engineering and diffusion models are used to further optimize the sample distribution. The newly generated samples are merged with the original samples by means of an automatic annotation method, and the combined data are used to train an object detection network. Performance is evaluated separately on object detection models built on different backbone networks. The experimental results show that a YOLO model can be trained on the small sample dataset and that, in a practical vehicle object recognition task, the detection success rate rises to 93%.
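The abstract's remark about traditional augmentation can be illustrated with a minimal sketch. It is not the authors' pipeline: the toy dataset, the class names "car" and "truck", and every function name below are hypothetical, and image-level labels stand in for the bounding-box annotations that a detection task would need. Under those assumptions, it applies flipping, cropping and a simple color transformation with NumPy and shows that the dataset grows while each class keeps the same share.

```python
from collections import Counter

import numpy as np


def hflip(img):
    """Mirror an H x W x C image left to right."""
    return img[:, ::-1, :]


def random_crop(img, out_h, out_w, rng):
    """Cut a random out_h x out_w window out of the image."""
    h, w = img.shape[:2]
    top = rng.integers(0, h - out_h + 1)
    left = rng.integers(0, w - out_w + 1)
    return img[top:top + out_h, left:left + out_w, :]


def scale_brightness(img, factor):
    """Simple color transformation: scale intensities and clip to uint8."""
    scaled = img.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)


def augment(dataset, rng):
    """Return each original image plus three transformed copies of it."""
    out = []
    for img, label in dataset:
        h, w = img.shape[:2]
        out.append((img, label))
        out.append((hflip(img), label))
        out.append((random_crop(img, h // 2, w // 2, rng), label))
        out.append((scale_brightness(img, 1.2), label))
    return out


def class_shares(dataset):
    """Fraction of the dataset that belongs to each label."""
    counts = Counter(label for _, label in dataset)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


rng = np.random.default_rng(0)

# Hypothetical imbalanced toy set: 9 "car" images and 1 "truck" image.
toy = [(rng.integers(0, 256, (64, 64, 3), dtype=np.uint8), "car")
       for _ in range(9)]
toy.append((rng.integers(0, 256, (64, 64, 3), dtype=np.uint8), "truck"))

augmented = augment(toy, rng)
print(len(toy), len(augmented))
print(class_shares(toy))
print(class_shares(augmented))
```

Because every image receives the same number of transformed copies, the minority class stays a minority. This is the gap that the generative samples described in the abstract are meant to close.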

Key words: small sample dataset; autonomous driving; data augmentation; GAN; diffusion model
