Augmenting Small Sample Dataset for Autonomous Driving Based on Generative Models

doi:10.16638/j.cnki.1671-7988.2025.021.004

Automobile Applied Technology, 2025, Vol. 50, Issue 21: 21-25, 37. DOI: 10.16638/j.cnki.1671-7988.2025.021.004

Section: Intelligent Connected Vehicle

HUANG Qiusheng

Chery Automobile Company Limited

Published: 2025-11-04
Contact: HUANG Qiusheng

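About the author: HUANG Qiusheng (born 1986), male, master's degree, senior engineer. Research interests: new energy vehicles and intelligent connected vehicles.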

Abstract

Abstract: Training deep-learning-based autonomous driving algorithms faces challenges such as limited data and uneven data distribution. Traditional methods augment data through image flipping, cropping and color transformation, but these techniques struggle to correct an uneven data distribution. Other approaches train models on small sample data through complex parameter tuning and transfer learning; however, this often requires a significant amount of time for hyperparameter adjustment and is prone to overfitting. In this study, a generative adversarial network (GAN) is trained on a small sample dataset, and its generator is used to produce new sample data. The generated samples keep the same style as the original samples while increasing their number. In addition, prompt engineering and diffusion models are used to further optimize the distribution of samples. The newly added samples and the original samples are merged through an automatic annotation method and used to train the object detection network. Performance is evaluated separately on object detection models with different backbone networks. The experimental results show that, with a YOLO model trained on the small sample dataset, the detection success rate in a practical vehicle object recognition task increases to 93%.

Key words: small sample dataset; autonomous driving; data augmentation; GAN; diffusion model
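
The abstract states that generated samples are used both to enlarge the dataset and to even out its distribution. As a rough illustration of the second point only, the sketch below computes how many synthetic samples each class would need to reach a common target count. It is not the author's method: the function name `synthetic_quota`, the class names and the counts are all hypothetical, and the choice of the largest class as the default target is an assumption made for this example.

```python
from collections import Counter


def synthetic_quota(labels, target=None):
    """Return, per class, how many synthetic samples are needed to reach `target`.

    `labels` is an iterable of class names, one per annotated object.
    If `target` is None, the size of the largest class is used (an assumption
    for this sketch, not something stated in the article).
    """
    counts = Counter(labels)
    if not counts:
        return {}
    if target is None:
        target = max(counts.values())
    # Classes already at or above the target need no extra samples.
    return {cls: max(0, target - n) for cls, n in counts.items()}


# Hypothetical, made-up label counts for a small driving dataset.
labels = ["car"] * 120 + ["truck"] * 30 + ["bus"] * 10
quota = synthetic_quota(labels)
print(quota)
```

In a pipeline like the one the abstract describes, such a quota could decide how many images to request from the generator or the diffusion model for each under-represented class before automatic annotation and merging.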

HUANG Qiusheng. Augmenting Small Sample Dataset for Autonomous Driving Based on Generative Models[J]. Automobile Applied Technology, 2025, 50(21): 21-25,37.
