Survey of imitation learning: tradition and new advances
Abstract
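Imitation learning combines reinforcement learning with supervised learning; its goal is to learn an expert's policy by observing expert demonstrations, thereby accelerating reinforcement learning. By introducing additional task-related information, imitation learning can optimize a policy faster than reinforcement learning, offering a way to alleviate low sample efficiency. Imitation learning has become a popular framework for solving reinforcement learning problems, and a variety of algorithms and techniques for improving learning performance have emerged. Combined with the latest research in graphics and image processing, imitation learning has played an important role in game artificial intelligence (AI), robot control, autonomous driving, and other fields. This paper reviews the year's developments in imitation learning from several perspectives, including behavioral cloning, inverse reinforcement learning, adversarial imitation learning, imitation learning from observation, and cross-domain imitation learning. It also describes the latest practical applications of imitation learning, compares the state of research in China and abroad, and outlines future directions for the field. The aim is to give researchers and practitioners an up-to-date picture of imitation learning as a convenient reference for their work.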
Keywords
Survey of imitation learning: tradition and new advances
Zhang Chao1, Bai Wensong1, Du Xin2, Liu Weijie1, Zhou Chenhao1, Qian Hui1
(1. College of Computer Science and Technology, Zhejiang University, Hangzhou 310027, China; 2. College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou 310027, China)
Abstract
Imitation learning (IL) combines reinforcement learning and supervised learning: the agent observes expert demonstrations and learns the expert's policy. By exploiting additional task-related information, imitation learning can optimize a policy faster than reinforcement learning alone, which offers a way to alleviate the problem of low sample efficiency. In recent years, imitation learning has become a popular framework for solving reinforcement learning problems, and a variety of algorithms and techniques have emerged to improve learning performance. Combined with the latest research in image processing, imitation learning has played an important role in domains such as game artificial intelligence (AI), robot control, and autonomous driving. Traditional imitation learning methods mainly comprise behavioral cloning (BC), inverse reinforcement learning (IRL), and adversarial imitation learning (AIL). Thanks to advances in computing power and in upstream graphics and image tasks (such as object recognition and scene understanding), imitation learning methods can integrate a variety of emerging technologies to handle complex tasks. We further summarize and analyze two newer directions: imitation learning from observation (ILfO) and cross-domain imitation learning (CDIL). ILfO relaxes the requirements on expert demonstrations: the agent learns from observable information only, without access to the expert's specific actions. This setting makes imitation learning algorithms more practical and applicable to real-life scenarios. Depending on whether the environment transition dynamics are modeled, ILfO algorithms fall into two categories: model-based and model-free. Model-based methods can be further divided, according to how the model is constructed during the agent's interaction with the environment, into forward dynamics models and inverse dynamics models.
Model-free methods mainly comprise adversarial methods and reward-function engineering. Cross-domain imitation learning addresses settings in which the agent and the expert are in different domains, for example different Markov decision processes. Current CDIL research focuses mainly on three kinds of domain discrepancy: transition dynamics, morphology, and viewpoint. Technical solutions to CDIL problems can be divided mainly into direct, mapping-based, adversarial, and optimal-transport methods. Imitation learning is applied mainly in game AI, robot control, and autonomous driving. Advances in image processing, such as object detection, video understanding, video classification, and video recognition, further improve the recognition and perception capabilities of intelligent agents. Our analysis covers the year's developments in imitation learning from five perspectives: behavioral cloning, inverse reinforcement learning, adversarial imitation learning, imitation learning from observation, and cross-domain imitation learning.
Keywords
imitation learning (IL); reinforcement learning; imitation learning from observation (ILfO); cross-domain imitation learning (CDIL); application of imitation learning