BSGAN-GP: A Class-Balance-Driven Semi-Supervised Image Recognition Model

Hu Jing; Zhang Rumin; Lian Bingquan

Published: 2024-05-20
DOI: 10.11834/jig.230881

Hu Jing, Zhang Rumin, Lian Bingquan (Taiyuan University of Science and Technology)

Abstract

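Objective Existing deep learning image recognition models rely heavily on large amounts of data labeled manually by specialists; such expert labels are hard to obtain and costly to produce. Real-world datasets are also mostly imbalanced, and the severe skew between positive and negative samples biases the fitted model toward the majority classes, leaving minority classes poorly recognized. This seriously hinders the wide application of deep learning to practical image recognition. Method Building on the semi-supervised generative adversarial network (Semi-supervised GAN), this paper proposes a new balanced model architecture, BSGAN-GP, that lets the discriminator of a semi-supervised GAN judge every class fairly. The proposed class-balanced random selection (CBRS) algorithm addresses the low minority-class accuracy caused by class imbalance in the image samples. Labeled real data are randomly selected by class so that every class contributes the same number of labeled inputs; the generator NetG, with its trained parameters fixed, then generates the same number of fake samples for each class, which are fed to the discriminator, and the discriminator NetD is updated so that it can judge all classes fairly. BSGAN-GP also adds a gradient penalty term to the discriminator loss function, which makes training more stable. Result The model was compared on three mainstream datasets with nine image recognition methods (six semi-supervised and three fully supervised). To demonstrate the gain in minority-class accuracy, imbalanced versions of the three datasets were constructed. On Fashion-MNIST, overall accuracy rose by 3.281% and the minority-class recognition rate by 7.14% relative to the baseline; on MNIST, the recognition rates of the four minority classes rose by 2.68% to 7.40% relative to the baseline; on SVHN, overall accuracy rose by 3.515% relative to the baseline. Synthetic image quality was also compared on the three datasets to verify the effectiveness of the CBRS algorithm; the improved quality and quantity of synthesized minority-class images demonstrate its effect. Ablation experiments assessed the importance of the proposed CBRS module and the introduced module in the network. Conclusion The proposed BSGAN-GP model achieves fairer image recognition and higher-quality synthesized images.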
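
The class-balanced selection step described in this abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation. The function names `class_balanced_random_selection` and `balanced_fake_labels`, the `per_class` and `seed` parameters, the fallback to sampling with replacement when a class has too few labeled samples, and the idea that the generator is conditioned on a class label are all assumptions that the abstract does not specify.

```python
import random
from collections import defaultdict


def class_balanced_random_selection(labels, per_class, seed=None):
    """Return indices of labeled samples, exactly `per_class` picks per class.

    Assumption: a class with fewer than `per_class` labeled samples is
    sampled with replacement; the source does not describe this case.
    """
    rng = random.Random(seed)

    # Group sample indices by their class label.
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    selected = []
    for label in sorted(by_class):
        pool = by_class[label]
        if len(pool) >= per_class:
            picks = rng.sample(pool, per_class)     # without replacement
        else:
            picks = rng.choices(pool, k=per_class)  # with replacement (assumed)
        selected.extend(picks)

    rng.shuffle(selected)  # avoid feeding the discriminator class by class
    return selected


def balanced_fake_labels(num_classes, per_class):
    """Labels for generated samples: `per_class` copies of each class.

    Assumption: the generator accepts a class label as a condition.
    """
    return [c for c in range(num_classes) for _ in range(per_class)]


# Toy imbalanced label list: class 0 is the majority, class 2 the minority.
labels = [0] * 50 + [1] * 20 + [2] * 3
batch = class_balanced_random_selection(labels, per_class=5, seed=0)
counts = {c: sum(1 for i in batch if labels[i] == c) for c in (0, 1, 2)}
print(counts)  # {0: 5, 1: 5, 2: 5}
```

On the toy label list, every class contributes five indices regardless of how rare it is, so the discriminator sees the minority class as often as the majority class.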

Keywords

deep learning, semi-supervised learning (SSL), generative adversarial network, imbalanced image recognition, gradient penalty

BSGAN-GP: a semi-supervised image recognition model driven by class balance

Hu Jing, Zhang Rumin, Lian Bingquan (Taiyuan University of Science and Technology)

Abstract

Objective With improvements in algorithm performance and computer hardware, image classification technology now achieves high-precision automatic classification and screening of digital images. It uses a computer to analyze an image quantitatively and assign the image, or each region in it, to one of several categories, replacing human visual interpretation. In practice, however, high classification accuracy requires a large number of training samples with high-quality annotations. For large-scale image datasets, existing annotation methods, such as polygon annotation and key point annotation, require manual labeling by industry experts. Because expert annotation is costly and high-quality annotation is difficult, labeled image data are scarce, which seriously hinders the development of deep learning in computer vision. The semi-supervised learning (SSL) paradigm was proposed to address this: it uses a large amount of unlabeled data to capture the distribution of real samples in the feature space and so determines classification boundaries more accurately. Generative semi-supervised models, such as DCGAN and Semi-supervised GAN, can also create new samples and increase sample diversity, which has made them widely used in various fields. However, such models are often unstable in adversarial training; on an unbalanced dataset in particular, the gradient can easily fall into the trap of predicting the majority classes. Since image datasets in real-world industrial applications are often category-unbalanced, this imbalance lowers classifier accuracy. Several recent studies have shown that GANs can alleviate the imbalance problem, including DAGAN, BSSGAN, BAGAN, and improve-BAGAN.
Among them, BAGAN is an augmentation method that restores balance in unbalanced datasets: it learns useful features from the majority classes and uses them to generate images for the minority classes. However, experimental results show that its encoder loses much detail during image reconstruction, so similar categories are hard to tell apart in the reconstructed images. Improve-BAGAN refines BAGAN by adding a gradient penalty, which makes model training more stable. It is the state of the art among supervised approaches to the imbalance problem, but reaching its expected performance requires manually labeling enough samples, which greatly increases labor and time costs. Method This study establishes a new balanced image recognition model based on the semi-supervised generative adversarial network (Semi-supervised GAN), enabling the discriminator to identify every class of an unbalanced dataset fairly. The proposed model, BSGAN-GP, has two components: the class-balanced random selection (CBRS) algorithm and a discriminator with an added gradient penalty. In the new CBRS algorithm, labeled real data are randomly selected by category so that each class contributes the same number of labeled samples to the model, which keeps the real samples and the generator's synthetic samples in balance. During adversarial training, the generator NetG, with its parameters fixed, then generates the same number of fake samples for each class as input to the discriminator, and the discriminator NetD is updated so that it judges all classes fairly, improving recognition accuracy on the minority classes. BSGAN-GP also adds a gradient penalty term to the discriminator loss function to make model training more stable.
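
The abstract does not give the form of the gradient penalty term. One widely used form, from WGAN-GP, penalizes the discriminator when the norm of its gradient departs from 1 at points interpolated between real and generated samples. The sketch below assumes that form. The symbols are illustrative rather than taken from the paper: $D$ is the output of NetD, $x$ a real sample, $\tilde{x}$ a sample generated by NetG, $\lambda$ the penalty weight, and $L_D$ the semi-supervised discriminator loss before the penalty is added.

```latex
\begin{aligned}
\hat{x} &= \epsilon\, x + (1-\epsilon)\, \tilde{x}, \qquad \epsilon \sim U[0,1], \\
L_{\mathrm{GP}} &= \lambda\, \mathbb{E}_{\hat{x}}\!\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right], \\
L_{D}^{\mathrm{total}} &= L_{D} + L_{\mathrm{GP}}.
\end{aligned}
```

Under this assumption the penalty limits how sharply the discriminator output can change between real and generated samples, which is the usual explanation for its stabilizing effect on adversarial training.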
The optimizer was Adam, with the learning rate set to 0.0002 and the momentum parameters set to (0.5, 0.9). The batch size for all three datasets was 100; the sample count was set to 1000 (100 per class) for MNIST and Fashion-MNIST and to 5000 (500 per class) for SVHN. Experiments ran on an RTX 4090 GPU with 24 GB of memory, and most runs finished within 4500 seconds. MNIST and Fashion-MNIST were trained for 25 epochs, at 85 and 108 seconds per epoch respectively on our device; SVHN was trained for 30 epochs, at 110 seconds per epoch. Result The model is compared with six semi-supervised methods and three fully supervised methods on three mainstream datasets. To demonstrate the improved recognition accuracy on minority classes, unbalanced versions of the three datasets were constructed. The evaluation indicators are overall accuracy, per-category recognition rate, the confusion matrix, and the synthesized images. On the unbalanced Fashion-MNIST, overall accuracy increased by 3.281% and the minority-class recognition rate by 7.14% compared with Semi-supervised GAN; on the unbalanced MNIST, the recognition rates of the four minority classes increased by 2.68% to 7.40% compared with Semi-supervised GAN; on SVHN, overall accuracy increased by 3.515% compared with Semi-supervised GAN. The quality of synthetic images was also compared on the three datasets to verify the effectiveness of the CBRS algorithm, and the gains in the quantity and quality of synthesized minority-class images demonstrate its effect. Ablation experiments evaluate the importance of the proposed CBRS module and the introduced gradient penalty (GP) module in the network: the CBRS module improved overall accuracy by 2% to 3%, and the GP module improved it by 0.8% to 1.8%.
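
The indicators named above (overall accuracy, per-category recognition rate, confusion matrix) can all be derived from one table of counts. The following is a minimal sketch. It assumes that "category recognition rate" means per-class recall, which the abstract does not define, and the function names and toy labels are illustrative only.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def overall_accuracy(matrix):
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def per_class_recognition_rate(matrix):
    """Per-class recall: correct predictions divided by true samples of that class."""
    rates = []
    for i, row in enumerate(matrix):
        support = sum(row)
        rates.append(matrix[i][i] / support if support else float("nan"))
    return rates


# Toy imbalanced test set: six samples of class 0, three of class 1, one of class 2.
y_true = [0, 0, 0, 0, 0, 0, 1, 1, 1, 2]
y_pred = [0, 0, 0, 0, 0, 1, 1, 1, 0, 0]

cm = confusion_matrix(y_true, y_pred, num_classes=3)
print(cm)                              # [[5, 1, 0], [1, 2, 0], [1, 0, 0]]
print(overall_accuracy(cm))            # 0.7
print(per_class_recognition_rate(cm))  # class 2 has a rate of 0.0
```

In this toy case overall accuracy is 0.7 even though the minority class is never recognized, which is why minority-class rates are reported alongside overall accuracy on imbalanced data.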
Conclusion In this study, we propose a new algorithm called Class Balanced Random Selection (CBRS) to achieve fair recognition of all classes in unbalanced datasets, and we introduce a gradient penalty into the discriminator of the semi-supervised generative adversarial network for more stable training. The experimental results indicate that CBRS achieves fairer image recognition and higher-quality synthesized images.

Keywords

deep learning, semi-supervised learning (SSL), generative adversarial network, unbalanced image recognition, gradient penalty
