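Multi-agent path planning based on double-layer DQN
Abstract
Objective With the development of virtual reality technology, multi-agent escape path planning in virtual scenes has become one of the key technologies. Compared with traditional fire drills, fire escape drills based on virtual reality offer many advantages, such as low cost, low risk, and high reliability, but they still have certain limitations. To address this, a path planning algorithm based on an improved double deep Q network (DQN) architecture is proposed. Method Based on two double Q networks with the same structure, the experience pool generation method and the exploration strategy are optimized, and the influence of environmental factors such as fire on the agents is added to the reward. In addition, to improve the safety and efficiency of evacuation, a multi-agent grouping strategy based on an improved K-medoids algorithm is proposed. Result Experiments show that the proposed improved double deep Q network architecture converges faster, learns more stably, and effectively improves model performance. Considering both the evacuation efficiency and the evacuation safety of agents in fire scenarios, the average health evacuation value (AHEP) is used to evaluate the evacuation effect. Compared with the traditional path planning methods A-STAR (A-star search algorithm) and DIJKSTRA (Dijkstra's algorithm), AHEP improves by 84% and 104%, respectively; compared with the extended A-STAR improved for fire scenarios and the Dijkstra-ACO (Dijkstra and ant colony optimization) hybrid algorithm, it improves by 30% and 21%, respectively; compared with a DQN algorithm that accounts for the influence of fire, it improves by 20%. Both evacuation efficiency and safety improve, and the planned paths give a better evacuation effect. A comparison of the evacuation effect under different grouping modes verifies that appropriate grouping of the agents can improve evacuation efficiency. Conclusion The proposed algorithm outperforms most commonly used methods and significantly improves the efficiency and safety of evacuation.
Keywords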
Multi-agent path planning based on improved double DQN
Zhang Chen1, Jiang Wenying1, Chen Siyuan1, Zhou Wen1, Yan Fengting2
(1. School of Computer and Information, Anhui Normal University, Wuhu 241000, China; 2. School of Electronic and Electrical Engineering, Shanghai University of Engineering and Technology, Shanghai 201620, China)
Abstract
Objective Rescue-oriented evacuation drills such as fire escape drills are designed to improve rehearsal training and firefighting awareness. Gaining sufficient evacuation experience requires repeated drills, which are costly for organizers. Such drills also depend on the emergency drill venue, the physical condition of the participants, and real-time position information. Emerging virtual reality technology makes it possible to conduct fire escape drills virtually, with lower cost, lower risk, and higher reliability. Multi-agent path planning has accordingly become a key technique for simulating emergency drills in virtual scenarios. Method We develop an improved double deep Q network (DQN) framework. Specifically, the virtual scenario is built from collected campus information, including multiple agents, obstacles, exits, fire-affected areas, and other related factors. Since all agents are assumed to be on the same plane, the scenario can be converted into a two-dimensional grid through gridding and coordinate transformation. Differently colored cells in the two-dimensional grid plane m represent obstacles, fire-affected areas, exits, and the locations of agents. According to the locations of the agents in the virtual scene, the grid plane m is layered into the grid planes m1 and m2, of sizes 64×100 and 48×100, respectively. The double deep Q network uses two double Q networks with the same structure, Q1 and Q2, each consisting of convolutional and fully connected layers. Their input sizes match those of the grid planes m1 and m2 obtained after environmental stratification.
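The layering of the grid plane m into m1 and m2 can be illustrated with a minimal sketch. It is not the authors' implementation: the cell codes, the helper names `make_plane`, `split_layers`, and `layer_of`, and the assumption that m is a 112×100 plane cut row-wise into a 64×100 part and a 48×100 part are all hypothetical, since the abstract gives only the two resulting sizes.

```python
# Hypothetical cell codes; the abstract only says cells are colored by type.
FREE, OBSTACLE, FIRE, EXIT, AGENT = 0, 1, 2, 3, 4

# Layer sizes reported in the abstract: m1 is 64x100, m2 is 48x100.
M1_ROWS, M2_ROWS, COLS = 64, 48, 100


def make_plane(rows, cols, cells):
    """Build a rows x cols grid; `cells` maps (row, col) to a cell code."""
    plane = [[FREE] * cols for _ in range(rows)]
    for (r, c), code in cells.items():
        plane[r][c] = code
    return plane


def split_layers(plane, m1_rows=M1_ROWS):
    """Split plane m into m1 (top rows) and m2 (remaining rows).

    Assumption: the layering is a simple row-wise cut.
    """
    return plane[:m1_rows], plane[m1_rows:]


def layer_of(position, m1_rows=M1_ROWS):
    """Return (layer index, local position) for an agent located in m."""
    r, c = position
    if r < m1_rows:
        return 0, (r, c)
    return 1, (r - m1_rows, c)


# Example plane with one obstacle, one fire cell, one exit, and one agent.
m = make_plane(
    M1_ROWS + M2_ROWS,
    COLS,
    {(10, 5): OBSTACLE, (70, 20): FIRE, (111, 99): EXIT, (80, 3): AGENT},
)
m1, m2 = split_layers(m)
agent_layer, agent_local = layer_of((80, 3))
```

Under these assumptions, an agent at row 80 falls into the second layer, so its state would be fed to the network whose input size matches m2.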
For grid planes of the same sizes as m1 and m2, trainable grid planes m'1t and m'2t are obtained by randomly placing the same number of black blocks of size 1×1 to represent obstacle locations, and by generating planes for all different starting positions to represent all states of the agent in the scene. These planes are used to initialize the experience pools D1 and D2 and to train the networks Q1 and Q2. In actual evacuation drills, the evacuation of the crowd is not completely independent and discrete. Because people are social, social relationships exist among the people being evacuated, and a“gathering and following”phenomenon is common in crowd evacuation. In addition, to make the evacuation run more smoothly in an actual drill, organizers often station a number of guides at different locations to help participants evacuate. We therefore add guides to the virtual scenario and implement a multi-agent grouping strategy based on an improved k-medoids algorithm. The agents are grouped according to their locations and relationships, the corresponding guiding agents are selected, and each guiding agent leads the other agents in its group along paths planned by the improved double deep Q network algorithm described above. This further improves the reliability and efficiency of evacuation. Result Extensive experiments are carried out to validate the proposed methods. During training, the network Q3 of the traditional DQN method converges after 24 000 batches, whereas the Q1 and Q2 networks converge after about 3 000 batches. The proposed method therefore converges significantly faster than the traditional DQN method and is more stable.
Additionally, to account for both the evacuation efficiency and the evacuation safety of agents in fire scenarios, the average health evacuation value (AHEP) is used to evaluate the evacuation effect. In terms of AHEP, the proposed method is about 84% and 104% higher than the traditional path planning methods A-STAR and DIJKSTRA, respectively. Compared with the extended A-STAR and the Dijkstra-ACO hybrid algorithm, both adapted to changeable fire scenes, it is 30% and 21% higher, respectively; compared with the DQN algorithm, it is 20% higher. Evacuation efficiency and safety both improve, and the planned paths give a better evacuation effect. Furthermore, to verify the evacuation effect under different groupings, we compare the AHEP values under the four grouping settings of 4, 5, 6, and 7. With a grouping of 6, the value is highest: 17%, 13%, and 6% higher than with 4, 5, and 7, respectively. These results show that appropriate grouping of the agents can improve their evacuation efficiency. Conclusion The proposed method has the potential to improve evacuation efficiency and safety to a certain extent.
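The grouping step described in the Method part can be illustrated with a plain k-medoids sketch. This is the standard alternating variant, not the authors' improved algorithm, whose modifications and social-relationship term are not given in the abstract. Grouping by grid position only, the Manhattan distance, the deterministic initialization, and the idea of treating each medoid as the group's guiding agent are assumptions made for illustration.

```python
def manhattan(a, b):
    """Manhattan distance between two grid cells given as (row, col)."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def assign(points, medoids):
    """Label each point with the index of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda j: manhattan(p, points[medoids[j]]))
        for p in points
    ]


def k_medoids(points, k, max_iter=100):
    """Plain alternating k-medoids; returns (medoid indices, labels)."""
    # Assumption: deterministic initialization with the first k points.
    medoids = list(range(k))
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for j in range(k):
            members = [i for i, lab in enumerate(labels) if lab == j]
            if not members:
                # Keep the old medoid if a group ends up empty.
                new_medoids.append(medoids[j])
                continue
            # The new medoid minimizes the total distance to its group.
            best = min(
                members,
                key=lambda i: sum(manhattan(points[i], points[q]) for q in members),
            )
            new_medoids.append(best)
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


# Six hypothetical agent positions forming two spatial clusters.
agents = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
guides, groups = k_medoids(agents, k=2)
print(guides)  # [0, 3]
print(groups)  # [0, 0, 0, 1, 1, 1]
```

Because a medoid is always one of the agents, it is a natural candidate for the guiding agent of its group; a centroid-based method such as k-means would instead return a point that may not correspond to any agent.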
Keywords