Review of optimization methods for supervised deep learning
Jiang Lingyi1,2, Zheng Yifeng1,2, Chen Che1,2, Li Guohe3, Zhang Wenjie1,2 (1. College of Computer Science, Minnan Normal University, Zhangzhou 363000, China; 2. Key Laboratory of Data Science and Intelligence Application, Fujian Province University, Zhangzhou 363000, China; 3. College of Information Science and Engineering, China University of Petroleum, Beijing 102249, China) Abstract
Deep learning has developed rapidly in the big data era, but its performance still depends heavily on network architecture design and parameter settings; improving model performance while controlling model complexity therefore hinges on model optimization. In terms of learning paradigm, machine learning is commonly divided into five categories: 1) supervised learning, 2) unsupervised learning, 3) semi-supervised learning, 4) deep learning, and 5) reinforcement learning. To summarize and analyze the optimization methods that improve fitting ability and generalization ability, we take supervised deep learning as our entry point. First, the basic formulation of optimization is given and its core elements are illustrated. Then, from the perspective of fitting ability, the optimization problem is decomposed into three directions: 1) convergence, 2) convergence speed, and 3) global quality, and the specific methods and research results in each direction are summarized and analyzed. Convergence means that the algorithm reliably reaches a solution such as a stationary point; the gradient exploding/vanishing problem reflects the fact that small per-layer changes in a multi-layer network can be amplified until they explode or attenuated until they disappear. Convergence speed concerns helping the model converge faster: once convergence is ensured, acceleration techniques should be considered to improve training efficiency. The global quality problem is to ensure that the model converges to a low-loss solution, ideally the global minimum. The first two problems are local and the last one is global, and the boundaries among the three are fuzzy; for example, some methods that improve convergence also accelerate it. After the fitting ability of a model is optimized, the large number of parameters in a deep model must still be addressed, because overfitting leads to poor generalization; regularization is an effective remedy. To improve generalization ability, we categorize current regularization methods into two classes: 1) data processing and 2) model parameter constraints. Data processing refers to operations on the data during training, such as dataset augmentation, noise injection, and adversarial training, all of which can effectively improve generalization; model parameter constraints restrict the parameters of the network and likewise improve generalization. Taking the generative adversarial network (GAN), a widely used deep learning model, as the application background, we review the development of its variant models and analyze the application of the relevant optimization methods in the GAN domain from the two aspects of fitting and generalization ability. Taking WGAN with gradient penalty (WGAN-GP) as the base model, we design an experiment on the MNIST-10 dataset to study the applicability of six optimizers, namely stochastic gradient descent (SGD), momentum SGD, Adagrad, Adadelta, root mean square propagation (RMSProp), and Adam, in the GAN domain.
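As a minimal illustration of how such a six-optimizer comparison can be set up, the PyTorch-style sketch below builds each optimizer over the same small critic network. It is a hedged sketch, not the authors' experiment code: the critic architecture, learning rates, and momentum/beta settings are placeholder assumptions.

```python
# Hypothetical sketch of the six-optimizer comparison; the model,
# hyperparameters, and data are illustrative, not the paper's setup.
import torch
import torch.nn as nn

def make_critic() -> nn.Sequential:
    # A tiny fully connected critic for 28x28 MNIST-style inputs.
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(28 * 28, 256),
        nn.LeakyReLU(0.2),
        nn.Linear(256, 1),
    )

def build_optimizer(name: str, params, lr: float = 1e-4):
    # The six optimizers named in the abstract's experiment.
    factories = {
        "sgd": lambda: torch.optim.SGD(params, lr=lr),
        "momentum_sgd": lambda: torch.optim.SGD(params, lr=lr, momentum=0.9),
        "adagrad": lambda: torch.optim.Adagrad(params, lr=lr),
        "adadelta": lambda: torch.optim.Adadelta(params, lr=lr),
        "rmsprop": lambda: torch.optim.RMSprop(params, lr=lr),
        "adam": lambda: torch.optim.Adam(params, lr=lr, betas=(0.5, 0.9)),
    }
    return factories[name]()

if __name__ == "__main__":
    for name in ["sgd", "momentum_sgd", "adagrad", "adadelta", "rmsprop", "adam"]:
        critic = make_critic()
        opt = build_optimizer(name, critic.parameters())
        # One dummy step on random data, standing in for a critic update.
        loss = critic(torch.randn(64, 1, 28, 28)).mean()
        loss.backward()
        opt.step()
        opt.zero_grad()
        print(name, float(loss))
```

In a real WGAN-GP run, each optimizer would be applied to both generator and critic over the full training loop; this sketch only shows that all six share the same update interface, which is what makes such a controlled comparison straightforward.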
The optimization effects are then compared and analyzed against the experimental results of multiple optimization methods on GAN variants, and several optimization strategies that work well for GANs are distilled. At present, various optimization methods are widely used in deep learning models: methods that improve fitting ability raise model performance, while regularization methods alleviate overfitting and improve robustness. However, a mature systematic theory to guide the use of optimization methods is still lacking, and several problems remain open. First, because of the gap between theory and practice, a Lipschitz constraint on the global gradient cannot be guaranteed in deep neural networks. Second, in the GAN field, a theoretical breakthrough for finding a stable global optimum, that is, the optimal Nash equilibrium, is still missing. Third, many existing optimization methods are empirical, and their interpretability lacks rigorous theoretical proof. Since optimization methods in deep learning are numerous and intertwined, their use should focus on the combined effect of multiple techniques. This critical analysis can serve as a reference for selecting optimization methods when designing deep neural networks.
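For reference, the Lipschitz constraint discussed above, and the gradient penalty with which WGAN-GP softly enforces it on the critic D, can be written as follows; this is the standard formulation from the WGAN-GP paper (Gulrajani et al., 2017), not a result of this survey.

```latex
% A function f is K-Lipschitz if its output varies at most K times
% as fast as its input:
\| f(x_1) - f(x_2) \| \le K \, \| x_1 - x_2 \| \quad \forall x_1, x_2 .

% WGAN-GP relaxes the 1-Lipschitz constraint on the critic D into a
% penalty on the gradient norm at interpolates \hat{x} between real
% and generated samples (\lambda is the penalty weight):
L = \mathbb{E}_{\tilde{x} \sim \mathbb{P}_g}\,[D(\tilde{x})]
  - \mathbb{E}_{x \sim \mathbb{P}_r}\,[D(x)]
  + \lambda \, \mathbb{E}_{\hat{x} \sim \mathbb{P}_{\hat{x}}}
    \bigl[ (\| \nabla_{\hat{x}} D(\hat{x}) \|_2 - 1)^2 \bigr].
```

The penalty constrains the gradient norm only along sampled interpolates, which is precisely why it cannot guarantee the global Lipschitz property mentioned among the open problems above.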
Keywords
machine learning; deep learning; deep learning optimization; regularization; generative adversarial network (GAN)