Online Optimization Control Algorithm Based on Incremental Q-Learning

doi:10.16638/j.cnki.1671-7988.2023.015.029

Abstract

Reinforcement learning (RL) shows promise for the online optimization of controllers, but applying it in practice carries serious safety risks. To address this hazard, an incremental Q-learning (IQ) algorithm is proposed and applied to the online optimization of motor speed synchronization control. IQ divides a single round of optimization in classical Q-learning into multiple consecutive optimization rounds. Because the allowable change interval in each round is restricted to a very small range, the agent can reach the global optimum safely and stably. Simulation results show that IQ effectively avoids both performance degradation and, under strict stopping criteria, entrapment in local optima, and that it outperforms absolute Q-learning (AQ) in optimality and safety.

Keywords: reinforcement learning; incremental Q-learning; synchronous control; online optimization
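
The idea summarized in the abstract, many consecutive optimization rounds that each allow only a small change, can be illustrated with a minimal Python sketch. It is not the paper's implementation: the quadratic `cost`, the five-point grid, the reward definition, the hyperparameters, and the accept-only-if-not-worse check are all assumptions chosen for illustration, and the function names are hypothetical.

```python
import random


def cost(k):
    # Hypothetical performance index (lower is better), a stand-in for
    # something like a speed synchronization error; not from the paper.
    return (k - 3.0) ** 2


def q_learning_round(center, half_width, rng, n_states=5, episodes=200,
                     steps=10, alpha=0.5, gamma=0.9, eps=0.2):
    """One tabular Q-learning round confined to center +/- half_width."""
    step = 2 * half_width / (n_states - 1)
    values = [center - half_width + i * step for i in range(n_states)]
    moves = (-1, 0, 1)  # decrease, hold, or increase by one grid step
    q = [[0.0] * len(moves) for _ in range(n_states)]
    mid = n_states // 2  # grid index of the current parameter value

    def greedy(s):
        return max(range(len(moves)), key=lambda a: q[s][a])

    def shift(s, a):
        # Clamp to the grid so the parameter never leaves the small window.
        return min(max(s + moves[a], 0), n_states - 1)

    for _ in range(episodes):
        s = mid
        for _ in range(steps):
            # Epsilon-greedy action selection.
            a = rng.randrange(len(moves)) if rng.random() < eps else greedy(s)
            s2 = shift(s, a)
            reward = cost(values[s]) - cost(values[s2])  # assumed reward: cost reduction
            q[s][a] += alpha * (reward + gamma * max(q[s2]) - q[s][a])
            s = s2

    # Follow the learned greedy policy from the current parameter.
    s = mid
    for _ in range(n_states):
        s = shift(s, greedy(s))
    return values[s]


def incremental_q(k0, half_width=0.2, rounds=30, seed=0):
    """Chain many small rounds, re-centering the window after each one."""
    rng = random.Random(seed)
    k, history = k0, [k0]
    for _ in range(rounds):
        candidate = q_learning_round(k, half_width, rng)
        if cost(candidate) <= cost(k):  # assumed safeguard, not from the source
            k = candidate
        history.append(k)
    return history
```

Calling `incremental_q(0.0)` returns the parameter value after each round. By construction, consecutive entries differ by at most `half_width`, which is the property the abstract relies on for safety: no single round can move the controller far from a setting already known to work.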

LU Guoqiang. Online Optimization Control Algorithm Based on Incremental Q-Learning[J]. Automobile Applied Technology, 2023, 48(15): 165-171.
