Salient object detection combining semantic assistance and edge features

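Dai Shengxuan, Xu Linfeng, Liu Fangyu, He Bin (School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China)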

Abstract
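Objective Existing salient object detection models locate salient objects well. However, they fall short at producing complete, uniform objects and at preserving sharp edges. To obtain salient objects that are uniform as a whole and have clear edges, this paper proposes a salient object detection model that combines semantic assistance and edge features. Method A purpose-designed semantic-assisted feature fusion module refines the side outputs of the backbone network. Guided by semantic assistance, the features of each layer selectively fuse the adjacent lower-level features. This supplies sufficient structural information and strengthens the features of salient regions, so that salient objects are detected as uniform wholes. A purpose-designed edge branch network, together with the salient object features, yields accurate edge features. These edge features are fused into the salient object features to make the edge regions of salient objects more distinguishable, so that clear edges can be detected. In addition, a bidirectional multi-scale module is designed to extract multi-scale information within the network. Result The model is compared with 12 popular saliency models on four commonly used datasets: ECSSD (extended complex scene saliency dataset), DUT-O (Dalian University of Technology and OMRON Corporation), HKU-IS, and DUTS. On the four datasets, its maximum F-measure (MaxF) is 0.940, 0.795, 0.929, and 0.870. Its mean absolute error (MAE) is 0.041, 0.057, 0.034, and 0.043, respectively. The experimental results show that the saliency maps produced by the proposed method are closer to the ground truth. The method also achieves the best MaxF and MAE scores more often than any of the other 12 methods. Conclusion The proposed salient object detection model combining semantic assistance and edge features is effective. Semantic-assisted feature fusion and edge features make the detected salient objects more complete and uniform and their edges more distinguishable. Multi-scale feature extraction further improves detection.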
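The abstract reports results as max F-measure (MaxF) and mean absolute error (MAE). The sketch below shows how these two metrics can be computed for a single saliency map. It is not the authors' evaluation code, and it rests on several assumptions:

- Maps are flat lists of values in [0, 1], and the ground truth is binary.
- The F-measure weight is β² = 0.3, a value common in the salient object detection literature but not stated in the abstract.
- Binarization uses 256 uniformly spaced thresholds.

Benchmark tools commonly average precision and recall over all images of a dataset at each threshold before taking the maximum. This per-image version only illustrates the formulas.

```python
# Minimal per-image sketch of MAE and max F-measure for saliency maps.
# Assumptions: flat lists in [0, 1], binary ground truth (>= 0.5 is
# foreground), beta^2 = 0.3, and 256 thresholds k/255 for k = 0..255.


def mae(pred, gt):
    """Mean absolute error between a saliency map and its ground truth."""
    if not pred or len(pred) != len(gt):
        raise ValueError("pred and gt must be non-empty and equally long")
    return sum(abs(p - g) for p, g in zip(pred, gt)) / len(pred)


def max_f_measure(pred, gt, beta2=0.3, levels=256):
    """Maximum F-measure over uniformly spaced binarization thresholds."""
    positives = sum(1 for g in gt if g >= 0.5)
    best = 0.0
    for k in range(levels):
        t = k / (levels - 1)
        # Pixels predicted salient at this threshold, and the true hits among them.
        predicted = sum(1 for p in pred if p >= t)
        tp = sum(1 for p, g in zip(pred, gt) if p >= t and g >= 0.5)
        if tp == 0 or predicted == 0 or positives == 0:
            continue  # precision or recall undefined or zero: F would be 0
        precision = tp / predicted
        recall = tp / positives
        f = (1 + beta2) * precision * recall / (beta2 * precision + recall)
        best = max(best, f)
    return best


if __name__ == "__main__":
    pred = [0.9, 0.8, 0.1, 0.2]
    gt = [1, 1, 0, 0]
    print("MAE:", mae(pred, gt))
    print("MaxF:", max_f_measure(pred, gt))
```

Lower MAE and higher MaxF are better, which matches how the abstract ranks the compared models.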
Semantic assistance and edge feature based salient object detection

Dai Shengxuan, Xu Linfeng, Liu Fangyu, He Bin(School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu 611731, China)

Abstract
Objective The human visual system readily extracts features from regions of interest in images and videos. Salient object detection, a computer vision task, aims to emulate this ability as an image preprocessing step. The quality of the generated saliency map directly affects the performance of subsequent vision tasks. Current deep-learning-based salient object detection methods locate salient objects well because they extract effective semantic features. Extracting objects with clear edges, however, is essential for these downstream tasks. Improving the edge accuracy of objects in complex scenes has recently attracted increasing attention. Some models obtain fine edges by supervising the edges of salient objects indirectly through multiple edge losses. Others improve edge details by simply fusing complementary object features and edge features. Because these models do not make full use of the edge features, the edges are not effectively enhanced. Furthermore, salient objects vary in position and scale across visual scenes, so multi-scale information is needed to extract object features. To produce uniform saliency maps with clear edges, we propose a salient object detection model based on semantic assistance and edge features. Method A semantic-assisted feature fusion module optimizes the lateral (side) output features of the backbone network. Under semantic guidance, the features of each layer selectively fuse the adjacent lower-level features. This supplies sufficient structural information and strengthens the features of salient regions, helping the model produce uniform saliency maps in which salient objects are detected as a whole. An edge branch network is designed to obtain accurate edge features. These edge features are then integrated into the object features to enhance the distinguishability of the edge regions of salient objects.
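The semantic-assisted fusion and edge fusion described above can be illustrated with a toy sketch. It is not the authors' module, because the abstract does not give the fusion equations. The following are assumptions:

- The gating form: a sigmoid of the higher-level semantic response weights the adjacent low-level feature before it is added to the current layer's feature.
- The additive edge fusion.
- The function names, which are hypothetical.

Features are reduced to one value per spatial position. In the actual network they would be multi-channel convolutional feature maps.

```python
import math


def sigmoid(x):
    """Logistic function mapping a semantic response to a gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def semantic_assisted_fusion(current, low, semantic):
    """Hypothetical selective fusion: the semantic response gates how much
    of the adjacent low-level feature is added at each position."""
    if not (len(current) == len(low) == len(semantic)):
        raise ValueError("all feature lists must have the same length")
    return [c + sigmoid(s) * l for c, l, s in zip(current, low, semantic)]


def fuse_edge(object_feat, edge_feat):
    """Hypothetical edge fusion: adding edge responses raises feature values
    at object boundaries so that they stand out from the interior."""
    if len(object_feat) != len(edge_feat):
        raise ValueError("feature lists must have the same length")
    return [o + e for o, e in zip(object_feat, edge_feat)]


if __name__ == "__main__":
    current = [0.2, 0.9]
    low = [0.5, 0.4]
    # Weak semantic response at position 0, strong at position 1: the
    # low-level detail is mostly suppressed at 0 and mostly kept at 1.
    semantic = [-4.0, 4.0]
    fused = semantic_assisted_fusion(current, low, semantic)
    edge = [0.0, 0.3]  # a boundary response at position 1 only
    print(fuse_edge(fused, edge))
```

The gate illustrates the selective idea in the abstract. Low-level structure is admitted mainly where higher-level semantics indicate a salient region, rather than being added uniformly everywhere.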
In addition, a bidirectional multi-scale module extracts multi-scale information. Through dense connections and feature fusion, the module progressively fuses the multi-scale features of adjacent layers, which helps detect objects of different scales in a scene. Training and testing are performed on a single NVIDIA GTX 1080 Ti graphics processing unit (GPU). The model is trained until convergence on the DUTS-train dataset, which contains 10 553 images, without a validation set. VGG16 (from the Visual Geometry Group) serves as the backbone network and is implemented in the PyTorch deep learning framework. Parameters of the backbone network are initialized from a model pre-trained on ImageNet. All newly added convolutional layers are randomly initialized with a variance of 0.01 and a bias of 0. The model is optimized with the Adam optimizer, with the learning rate, weight decay, and momentum set to 5E-5, 0.000 5, and 0.9, respectively. Back-propagation is performed once every ten images, i.e., with a batch size of 10. The input image size is 256×256 pixels, and random flipping is the only data augmentation. The model is trained for 100 epochs in total, and the learning rate is reduced by a factor of 10 after 60 epochs. Result Our model is compared with twelve popular saliency models on four commonly used datasets: the extended complex scene saliency dataset (ECSSD), Dalian University of Technology and OMRON Corporation (DUT-O), HKU-IS, and DUTS. On these four datasets, our model achieves maximum F-measure values of 0.940, 0.795, 0.929, and 0.870 and mean absolute error (MAE) values of 0.041, 0.057, 0.034, and 0.043, respectively. The saliency maps it produces are closer to the ground truth. Conclusion We propose a salient object detection model that combines semantic assistance and edge features.
Fusing semantic-assisted features with edge features helps the model generate uniform saliency maps with clear object edges. Multi-scale feature extraction further improves salient object detection performance.
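The training settings in the abstract imply some simple arithmetic about the optimization schedule. The sketch below assumes the following:

- Each of the 100 training rounds is one full pass (epoch) over the 10 553 DUTS-train images.
- The final partial batch of each epoch is kept rather than dropped.
- The tenfold decay is a single step applied from round 60 onward, counting rounds from 0.

Constant and function names are hypothetical.

```python
import math

# Values taken from the abstract.
NUM_TRAIN_IMAGES = 10553  # DUTS-train
BATCH_SIZE = 10           # one back-propagation step per ten images
BASE_LR = 5e-5
DECAY_EPOCH = 60          # assumed: decay applies from epoch index 60 onward
DECAY_FACTOR = 10
TOTAL_EPOCHS = 100


def steps_per_epoch(num_images=NUM_TRAIN_IMAGES, batch_size=BATCH_SIZE):
    """Parameter updates per epoch, assuming the last partial batch is kept."""
    return math.ceil(num_images / batch_size)


def learning_rate(epoch):
    """Single-step schedule: BASE_LR before DECAY_EPOCH, BASE_LR / 10 after."""
    return BASE_LR / DECAY_FACTOR if epoch >= DECAY_EPOCH else BASE_LR


if __name__ == "__main__":
    per_epoch = steps_per_epoch()
    print("updates per epoch:", per_epoch)
    print("total updates:", per_epoch * TOTAL_EPOCHS)
    print("lr at epochs 0 and 60:", learning_rate(0), learning_rate(60))
```

In PyTorch, an equivalent schedule could be written as `torch.optim.lr_scheduler.StepLR(optimizer, step_size=60, gamma=0.1)`, stepped once per epoch. The reported momentum of 0.9 would most naturally correspond to the first coefficient of `betas` in `torch.optim.Adam`. The abstract confirms neither detail.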
