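Few-shot image classification based on meta-cosine loss
Tao Peng1, Feng Lin1, Du Yandong1, Gong Xun2, Wang Jun3 (1. School of Computer Science, Sichuan Normal University, Chengdu 610101; 2. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 610031; 3. School of Business, Sichuan Normal University, Chengdu 610101)
Abstract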
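Objective Metric learning is a simple and effective approach to few-shot learning, and learning a rich, discriminative, and well-generalizing embedding space is the key to good classification performance with metric-learning methods. Starting from the features of the samples themselves and from the distribution of those features in the embedding space, and combining global and local data augmentation, this paper presents a few-shot image classification method based on a meta-cosine loss (a meta-cosine loss for few-shot image classification, AMCL-FSIC). Method First, starting from the features of the data themselves, global and local data augmentation are combined, so that local information supplies more discriminative and transferable cues and the trained model attends more to the image foreground. An attention mechanism is also used to fuse global and local features, yielding richer and more discriminative features. Second, starting from the distribution of sample features in the embedding space, a meta-cosine loss (MCL) function is proposed to optimize the few-shot image classification model. The difference in similarity between a sample and the class prototypes is used to adjust the prototypes of different classes and enlarge the inter-class distance, so that classes are more clearly separated when the model is tested on new tasks, which improves generalization. Result Comparative experiments were conducted on five classic few-shot datasets. On FC100 (Few-shot Cifar100) and CUB (Caltech-UCSD Birds-200-2011), the proposed method achieves the best classification results reported to date; on MiniImageNet, TieredImageNet, and Cifar100, its results are comparable to those of the compared models. In addition, comparative experiments on MiniImageNet, CUB, and Cifar100 verify the effectiveness of MCL: the results show that the proposed MCL improves the classification performance of the cosine classifier. Conclusion The proposed method fully extracts image features in few-shot image classification tasks and effectively improves the accuracy of metric learning in few-shot image classification.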
Keywords
Meta-cosine loss for few-shot image classification
Tao Peng1, Feng Lin1, Du Yandong1, Gong Xun2, Wang Jun3 (1. School of Computer Science, Sichuan Normal University, Chengdu 610101, China; 2. School of Computing and Artificial Intelligence, Southwest Jiaotong University, Chengdu 610031, China; 3. School of Business, Sichuan Normal University, Chengdu 610101, China)
Abstract
Objective Few-shot learning (FSL) is a popular and difficult problem in computer vision. It aims to achieve effective classification with only a few labeled samples. Recent few-shot learning methods can be divided into three major categories: metric-, transfer-, and gradient-based methods. Among them, metric-based learning methods have received considerable attention because of their simplicity and excellent performance on few-shot problems. A metric-based learning method consists of a feature extractor based on a convolutional neural network (CNN) and a classifier based on spatial distance. After the samples are mapped into the embedding space, a simple metric function calculates the similarity between each sample and the class prototypes, which quickly identifies samples of novel classes. Because classification relies on the metric function, it bypasses the optimization problem that arises in the few-shot setting when a classifier is learned by a network. Therefore, a richer, more discriminative, and better-generalizing embedding space is the key for metric-based learning methods. From the perspective of the features and their embedding space, and by combining the global and local features of a sample, we propose a meta-cosine loss for few-shot image classification method, called AMCL-FSIC, to improve the accuracy of metric-based learning methods. Method On the one hand, our primary objective is to obtain suitable features. An image is composed of foreground and background. The foreground is beneficial for few-shot classification, whereas the background is detrimental. Forcing the model to focus only on the foreground during training and evaluation and to disregard the background would help image classification. However, this is not easy to achieve, because it requires prior knowledge of the foreground object. As stated by previous researchers, an image can be roughly divided into global features and local features, the latter being randomly cropped portions of the image. Local features contain cross-category discriminative and transferable information, which is of considerable significance for few-shot image classification. First, we combine global and local data augmentation strategies. In particular, the local information of an image allows the model to give more attention to the distinctive and transferable characteristics of the sample, minimizing the effect of background information. Then, a self-attention mechanism is introduced to combine global and local features, yielding richer and more discriminative features. On the other hand, from the perspective of the feature distribution in the embedding space, we meta-train a cosine classifier and minimize the loss by calculating the cosine similarity between the sample and the prototypes. In the embedding space, features of the same category are gathered together, while features of different categories are far from one another. However, previous cosine classifiers only pull samples of the same class together during training and do not fully push samples of different classes apart. As a direct consequence, the generalization capacity of the model decreases when it faces new test tasks with similar categories. We propose the meta-cosine loss (MCL) on the basis of the cosine classifier. During meta-training, the difference in cosine similarity between the sample and the class prototypes is used to adjust the class prototypes in accordance with the parallelogram principle. MCL keeps the feature clusters of different classes in a task as far apart as possible, ensuring that the classes are more separable when the model faces a new test task and improving the generalization ability of the model. Result We conduct extensive experiments to verify the model's effectiveness. Experiments are performed on five classical few-shot datasets: MiniImageNet, TieredImageNet, Cifar100, Few-shot Cifar100 (FC100), and Caltech-UCSD Birds-200-2011 (CUB). The input images are resized to 84 × 84 pixels for training, the momentum parameter is set to 0.95, the learning rate is set to 0.0002, and the weight decay is 0.0001. Training is accelerated on an NVIDIA GeForce RTX 3090 GPU. To ensure a fair comparison, we adopt the 5-way 1-shot and 5-way 5-shot settings during the training and testing phases. The experimental results show that the image classification accuracy on the MiniImageNet, TieredImageNet, Cifar100, FC100, and CUB datasets is 68.92%/84.45%, 72.41%/87.36%, 76.79%/88.52%, 50.86%/67.19%, and 81.12%/91.43%, respectively, under the 5-way 1-shot/5-way 5-shot settings. Compared with recent few-shot image classification methods, our model achieves the best results on FC100 and CUB and comparable results on the other three datasets. We also perform comparative experiments on the MiniImageNet, CUB, and Cifar100 datasets to verify the effectiveness of MCL. These experiments show that introducing MCL improves image classification accuracy by nearly 4% and 2% under the 1-shot and 5-shot settings, respectively, a considerable improvement in the classification ability of the cosine classifier. Conclusion Our work proposes MCL and combines global and local data augmentation methods to improve the generalization ability of the model. The approach can be applied to any metric-based method.
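The baseline cosine classifier that MCL builds on can be illustrated with a minimal NumPy sketch. It covers only the baseline step described in the abstract (class prototypes, cosine similarity between samples and prototypes, and a cross-entropy loss over one episode); the MCL prototype adjustment itself is not specified in the abstract and is deliberately omitted. The function names, the `scale` temperature, and the synthetic features are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row of x to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def class_prototypes(support_feats, support_labels, n_way):
    """Mean support embedding per class, shape (n_way, dim)."""
    return np.stack(
        [support_feats[support_labels == c].mean(axis=0) for c in range(n_way)]
    )


def cosine_logits(query_feats, prototypes, scale=10.0):
    """Scaled cosine similarity between queries and prototypes.

    `scale` is a hypothetical temperature, not a value from the paper.
    """
    return scale * l2_normalize(query_feats) @ l2_normalize(prototypes).T


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy, computed in a numerically stable way."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def synthetic_episode(n_way=5, k_shot=5, n_query=15, dim=64, seed=0):
    """Build a toy episode: each class is a noisy cloud around a random center.

    Stands in for CNN embeddings, which are outside the scope of this sketch.
    """
    rng = np.random.default_rng(seed)
    centers = rng.normal(size=(n_way, dim))
    s_labels = np.repeat(np.arange(n_way), k_shot)
    q_labels = np.repeat(np.arange(n_way), n_query)
    s_feats = centers[s_labels] + 0.1 * rng.normal(size=(len(s_labels), dim))
    q_feats = centers[q_labels] + 0.1 * rng.normal(size=(len(q_labels), dim))
    return s_feats, s_labels, q_feats, q_labels


if __name__ == "__main__":
    s_feats, s_labels, q_feats, q_labels = synthetic_episode()
    protos = class_prototypes(s_feats, s_labels, n_way=5)
    logits = cosine_logits(q_feats, protos)
    print("episode loss:", cross_entropy(logits, q_labels))
    print("episode accuracy:", (logits.argmax(axis=1) == q_labels).mean())
```

In a 5-way 5-shot episode, `class_prototypes` averages the five support embeddings of each class, and each query is assigned to the prototype with the highest cosine similarity. A loss such as MCL would additionally act on how far apart the prototypes of different classes are, which this sketch does not attempt to reproduce.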
Keywords