Lightweight facial expression recognition combining an improved convolutional neural network and channel weighting

Liang Huagang, Bo Ying, Lei Yixiong, Yu Zixin, Liu Lihua (School of Electronics and Control Engineering, Chang'an University, Xi'an 710064)

Abstract
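Objective Facial expression is an important means of conveying information in human-computer interaction, so facial expression recognition has significant research value. Current expression recognition methods suffer from strong background interference, complex network parameters and poor generalization. To address these problems, this paper proposes a lightweight expression recognition method that combines an improved convolutional neural network (CNN) with channel weighting. Method First, the network combines standard convolution with depthwise separable convolution and uses a global average pooling layer as the output layer, which simplifies the network and effectively reduces the number of parameters. Second, squeeze-and-excitation (SE) modules are introduced for channel weighting; setting different compression ratios after different convolutional layers strengthens expression feature extraction and improves model accuracy. Finally, a softmax classifier assigns each image to one of the expression classes. Result The proposed network has 6,108,519 parameters, 63% fewer than the Xception network, which achieves good recognition performance, and real-time tests show an average recognition speed of 128 frame/s. The recognition of seven expressions was evaluated on five public expression datasets. Compared with seven CNN methods, recognition accuracy improved by 5.72%, 0.51% and 0.28% on the FER2013 (Facial Expression Recognition 2013), CK+ (the extended Cohn-Kanade) and JAFFE (Japanese Female Facial Expression) datasets, and by 2.04% and 0.68% on the two in-the-wild databases RAF-DB (Real-world Affective Faces Database) and AffectNet. Conclusion The proposed lightweight method applies different weights to different channels, captures more key expression feature information and improves the generalization of the model. Experimental results show that the method recognizes facial expressions accurately while simplifying the network and reducing computation, effectively improving the recognition ability of the network.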
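To make the parameter savings from depthwise separable convolution and a global-average-pooling output concrete, the standard per-layer counting formulas can be sketched in Python. This is an illustration under stated assumptions, not the paper's architecture: the layer sizes (3 × 3 kernels, 128 → 256 channels, a 6 × 6 × 256 final feature map) are hypothetical, only the seven expression classes come from the abstract, and the GAP head assumes a fully connected layer after pooling, which may differ from the authors' output layer. The helper names are likewise hypothetical.

```python
def conv_params(k, c_in, c_out, bias=False):
    """Weights of a standard k x k convolution from c_in to c_out channels."""
    return k * k * c_in * c_out + (c_out if bias else 0)

def separable_params(k, c_in, c_out, bias=False):
    """Depthwise k x k convolution (one filter per input channel) plus a 1 x 1 pointwise convolution."""
    depthwise = k * k * c_in + (c_in if bias else 0)
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise

def flatten_dense_params(h, w, c, n_classes):
    """Flatten an h x w x c feature map into a fully connected classifier (with bias)."""
    return h * w * c * n_classes + n_classes

def gap_dense_params(c, n_classes):
    """Global average pooling (no weights) followed by a fully connected classifier (with bias)."""
    return c * n_classes + n_classes

# Hypothetical layer: 3 x 3 kernels, 128 -> 256 channels; 6 x 6 map before a 7-class output.
std = conv_params(3, 128, 256)
sep = separable_params(3, 128, 256)
print(f"standard: {std:,}  separable: {sep:,}  ratio: {sep / std:.3f}")
print(f"flatten+dense: {flatten_dense_params(6, 6, 256, 7):,}  GAP+dense: {gap_dense_params(256, 7):,}")
```

For these sizes the separable layer needs 33,920 weights instead of 294,912 (about 11.5%, i.e. 1/c_out + 1/k²), and pooling before the classifier shrinks the head from 64,519 to 1,799 parameters, because GAP discards the spatial dimensions without adding weights.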
Keywords
A CNN-improved and channel-weighted lightweight human facial expression recognition method

Liang Huagang, Bo Ying, Lei Yixiong, Yu Zixin, Liu Lihua(School of Electronics and Control Engineering,Chang'an University,Xi'an 710064,China)

Abstract
Objective Facial expressions carry human emotion and convey information in human-robot interaction. With the development of artificial intelligence (AI), facial expression recognition (FER) has been applied to emotion understanding, human-robot interaction, safe driving, medical treatment and communications. However, current FER methods suffer from strong background interference, complex network parameters and poor generalization. We develop a lightweight FER method that combines an improved convolutional neural network (CNN) with channel weighting, in order to improve recognition and classification and to mine key facial expression features. Method A facial expression recognition pipeline consists of face image acquisition, image preprocessing, feature extraction, and expression classification, of which feature extraction is the key step of the network design. Our method proceeds as follows: 1) expression datasets covering both indoor and outdoor (in-the-wild) scenes are collected. 2) The expression images are preprocessed with data augmentation to suppress distracting background information and to mitigate the over-fitting and poor robustness of deep learning algorithms. 3) A lightweight expression network built from depthwise separable convolution modules with enhanced channel features is designed and trained. Depthwise separable convolutions and a global average pooling layer are used to reduce the number of network parameters, and squeeze-and-excitation (SE) modules are embedded to optimize the model. Different compression ratios are set for the SE modules after different convolutional layers so that facial expression features are extracted more efficiently, which improves the recognition ability of the network.
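The SE channel weighting described above can be sketched as a NumPy forward pass. This is a minimal sketch following the standard squeeze-and-excitation design (global average pooling, two fully connected layers with a ReLU between them, sigmoid gating, channel-wise rescaling); the weights are random rather than trained, the functions `init_se`, `se_forward` and `se_param_count` are hypothetical names, and the (channels, ratio) pairs are assumptions because the abstract does not list the tuned compression ratios.

```python
import numpy as np

def init_se(channels, ratio, rng):
    """Random (untrained) weights for the two FC layers of an SE block."""
    hidden = max(1, channels // ratio)  # bottleneck width C / r
    return {
        "w1": rng.standard_normal((channels, hidden)) * 0.1,
        "b1": np.zeros(hidden),
        "w2": rng.standard_normal((hidden, channels)) * 0.1,
        "b2": np.zeros(channels),
    }

def se_forward(x, p):
    """Reweight the channels of an (H, W, C) feature map; returns (output, channel_weights)."""
    z = x.mean(axis=(0, 1))                             # squeeze: global average pooling -> (C,)
    s = np.maximum(z @ p["w1"] + p["b1"], 0.0)          # excitation FC 1 + ReLU -> (C/r,)
    w = 1.0 / (1.0 + np.exp(-(s @ p["w2"] + p["b2"])))  # FC 2 + sigmoid -> (C,), each in (0, 1)
    return x * w, w                                     # scale: broadcast weights over H and W

def se_param_count(channels, ratio):
    """Weights plus biases of the two FC layers."""
    hidden = max(1, channels // ratio)
    return channels * hidden + hidden + hidden * channels + channels

rng = np.random.default_rng(0)
# Hypothetical (channels, ratio) pairs, one per convolutional stage.
for channels, ratio in [(64, 4), (128, 8), (256, 16)]:
    x = rng.standard_normal((12, 12, channels))
    y, w = se_forward(x, init_se(channels, ratio, rng))
    print(channels, ratio, y.shape, se_param_count(channels, ratio))
```

A smaller ratio widens the bottleneck, giving the block more capacity and more parameters; choosing a different ratio per layer is one way to trade these off, which is the kind of per-layer setting the abstract says was tuned experimentally.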
Our main contributions are as follows: 1) Data preprocessing module: it relies mainly on data augmentation operations such as image size normalization, random rotation and cropping, and random noise injection, which remove interference information and improve the generalization of the model. 2) Network model: a CNN is adopted, and a depthwise separable convolution module with enhanced channel features (the basic block) is designed for channel weighting. Setting different compression ratios after different convolutional layers extends the spatial and channel information captured within the local receptive field. 3) Verification: the method is evaluated on several popular public datasets and achieves high recognition accuracy. Result The best combination of SE compression ratios is determined experimentally and embedded into the lightweight network, which is then evaluated on five commonly used expression datasets. On the three indoor datasets FER2013 (Facial Expression Recognition 2013), CK+ (the extended Cohn-Kanade) and JAFFE (Japanese Female Facial Expression), the recognition accuracies are 79.73%, 99.32% and 98.48%, improvements of 5.72%, 0.51% and 0.28% over the compared CNN methods. On the two outdoor datasets RAF-DB (Real-world Affective Faces Database) and AffectNet, the accuracies are 86.14% and 61.78%, improvements of 2.01% and 0.67%. Compared with the Xception network, the number of parameters is reduced by 63%, yielding a lightweight network. The average recognition speed reaches 128 frame/s, which meets real-time requirements. Conclusion Our lightweight expression recognition method applies different weights to different channels, so that key expression information is captured and the generalization of the model is enhanced.
Our method recognizes facial expressions accurately while simplifying the network and reducing computational cost, effectively improving the recognition ability of the network.
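The data preprocessing module listed among the contributions (size normalization, random rotation and cropping, random noise) can be sketched with plain NumPy on a grayscale face crop. This is an illustrative sketch, not the authors' pipeline: the parameter values (48 × 48 output, which matches FER2013's native grayscale resolution; a 44-pixel crop; ±15° rotation; noise σ = 0.02) are assumptions, and nearest-neighbour resampling is used only to keep the example dependency-free. All function names are hypothetical.

```python
import numpy as np

def resize_nearest(img, size):
    """Nearest-neighbour resize of a 2-D grayscale image to size x size."""
    h, w = img.shape
    rows = (np.arange(size) * h / size).astype(int)
    cols = (np.arange(size) * w / size).astype(int)
    return img[rows[:, None], cols[None, :]]

def random_rotate(img, max_deg, rng):
    """Rotate about the centre by a random angle in [-max_deg, max_deg] (inverse mapping, zero fill)."""
    theta = np.deg2rad(rng.uniform(-max_deg, max_deg))
    h, w = img.shape
    cy, cx = (h - 1) / 2, (w - 1) / 2
    ys, xs = np.indices((h, w))
    c, s = np.cos(theta), np.sin(theta)
    src_x = np.rint(c * (xs - cx) + s * (ys - cy) + cx).astype(int)
    src_y = np.rint(-s * (xs - cx) + c * (ys - cy) + cy).astype(int)
    ok = (src_x >= 0) & (src_x < w) & (src_y >= 0) & (src_y < h)
    out = np.zeros_like(img)
    out[ok] = img[src_y[ok], src_x[ok]]  # pixels whose source falls outside stay 0
    return out

def random_crop(img, crop, rng):
    """Cut a random crop x crop window out of the image."""
    h, w = img.shape
    top = rng.integers(0, h - crop + 1)
    left = rng.integers(0, w - crop + 1)
    return img[top:top + crop, left:left + crop]

def add_noise(img, sigma, rng):
    """Add Gaussian noise and clip back to the [0, 1] intensity range."""
    return np.clip(img + rng.normal(0.0, sigma, img.shape), 0.0, 1.0)

def augment(img, rng, size=48, crop=44, max_deg=15, sigma=0.02):
    img = resize_nearest(img.astype(float) / 255.0, size)    # size normalization + [0, 1] scaling
    img = random_rotate(img, max_deg, rng)
    img = resize_nearest(random_crop(img, crop, rng), size)  # crop, then restore the input size
    return add_noise(img, sigma, rng)

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(100, 80)).astype(np.uint8)  # stand-in for a detected face crop
sample = augment(face, rng)
print(sample.shape, sample.min(), sample.max())
```

Because each call draws a new angle, crop position and noise pattern, the same training image yields a different sample every epoch, which is the mechanism by which such augmentation is expected to reduce over-fitting.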
Keywords
