Published: 2024-10-29
(GDC2024) Research on Semantic Segmentation with Light Field Angular Cue Representation

Cheng Xinyi, Jia Chen, Zhang Zixuan, Shi Fan (School of Computer Science and Engineering, Tianjin University of Technology)

Abstract
Objective Current light field semantic segmentation methods are limited to single objects, rely on hand-crafted features with poor robustness, and lack high-level angular semantic information. To address these shortcomings, this paper proposes an end-to-end semantic segmentation network for static images that fully exploits the capacity of deep convolutional neural networks to represent light field image features and explores spatial and angular structural relationships to resolve over-segmentation and under-segmentation. Method Starting from the construction of multi-scale light field macro-pixel images and building on several backbone network designs, a light field semantic segmentation model is proposed that combines an efficient AFE (Angular Feature Extractor) with ASPP (Atrous Spatial Pyramid Pooling). In the encoder, ASPP efficiently extracts and fuses multi-scale spatial features from the macro-pixel image, improving the model's adaptability to complex scenes; in the decoder, AFE extracts angular structural cues from the macro-pixel image, reducing the loss of angular information during successive downsampling. Result In experiments on the open LF Dataset against seven recent SOTA (state-of-the-art) light field methods, the proposed model with a ResNet101 backbone achieves an mIoU (mean Intersection over Union) of 88.80% on the test set, the best among all compared methods. Conclusion The proposed model is feasible and effective for improving semantic segmentation performance; it captures subtle changes in images more precisely and produces more accurate boundary segmentation, offering a new research direction for applying light field technology to scene understanding.
Keywords
(GDC2024) Semantic Segmentation with Light Field Angular Cue Representation

Cheng Xinyi, Jia Chen, Zhang Zixuan, Shi Fan (School of Computer Science and Engineering, Tianjin University of Technology)

Abstract
Objective Light field images are high-dimensional data capturing multi-view information of scenes, encompassing rich geometric and angular details. In light field semantic segmentation, the goal is to assign a semantic label to each pixel in the light field image, distinguishing different objects or parts of objects. Traditional 2D or 3D image segmentation methods often struggle with challenges such as variations in illumination, shadows, and occlusions when applied to light field images, leading to reduced segmentation accuracy and poor robustness. By leveraging the angular and geometric information inherent in light field images, light field semantic segmentation aims to overcome these challenges and improve segmentation performance. However, existing algorithms are typically designed for RGB or RGB-depth image inputs and do not effectively utilize the structural information of light fields for semantic segmentation. Moreover, previous studies mainly focus on handling redundant light field data or manually crafted features, and the highly coupled four-dimensional nature of light field data poses a barrier to conventional CNN (Convolutional Neural Network) modeling approaches. Additionally, prior works have primarily focused on object localization and segmentation in planar spatial positions, lacking detailed angular semantic information for each object. Therefore, we propose a CNN-based light field semantic segmentation network for processing macro-pixel light field images. Our approach incorporates an AFE (Angular Feature Extractor) to learn angular variations between different views within the light field image and employs dilated convolution operations to enhance semantic correlations across multiple channels. Method The article proposes an end-to-end semantic segmentation network for static images, starting from the construction of multiscale light field macro-pixel images and building on various backbone networks and dilated convolutions.
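As a rough illustration of the macro-pixel construction step (a hedged sketch, not the authors' code; the array layout and the function name `to_macro_pixel` are assumptions), a U×V grid of sub-aperture views can be interleaved so that each angular block of the output gathers all angular samples of one spatial position:

```python
import numpy as np

def to_macro_pixel(lf):
    """Interleave sub-aperture views into a macro-pixel image.

    lf: array of shape (U, V, H, W, C) -- a U x V grid of angular views,
        each of spatial size H x W with C channels.
    Returns an array of shape (H*U, W*V, C) in which
        out[h*U + u, w*V + v] == lf[u, v, h, w],
    i.e. each U x V block holds the full angular neighbourhood of one pixel.
    """
    U, V, H, W, C = lf.shape
    # Move spatial axes outside the angular axes, then flatten pairwise.
    return lf.transpose(2, 0, 3, 1, 4).reshape(H * U, W * V, C)
```

With square angular resolution K = U = V, each K×K block of the macro-pixel image holds all angular observations of a single scene point, which is precisely the layout that lets a strided K×K convolution read out angular cues directly.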
Efficient extraction of spatial features from macro-pixel images poses a challenge, addressed by employing ASPP (Atrous Spatial Pyramid Pooling) in the encoder module to extract high-level fused semantic features. In the experiments, the dilation rates for the ASPP module are set to r = [12, 24, 36], enriching spatial features at the same feature-map size and yielding better semantic segmentation results. Using different dilation rates in parallel convolutions efficiently extracts multiscale spatial features. In the decoder module, feature modeling is performed to enhance the nonlinear expression of low-level semantic features in macro-pixel images and their channel-correlation representation. Semantic features from the encoder are upsampled by a factor of four and concatenated with features generated by the angular model to enhance interaction between features in the network. These features are further refined through 3×3 convolution operations, combining angular and spatial features for stronger feature expression. Finally, segmentation results are output through another fourfold upsampling. To enhance the expression of light field features and fully extract the rich angular features in light field macro-pixel images, an AFE is introduced in the decoder stage. AFE operates as a special convolution with kernel size K×K, stride K, and dilation rate 1, where K equals the angular resolution of the light field. Input features for the angular model are taken from the Conv2_x layer of ResNet-101, preserving complete low-dimensional macro-pixel image features. This design is crucial for capturing angular relationships between pixels in sub-aperture images and avoids the loss of angular information during consecutive downsampling operations. Incorporating angular features enables the model to better distinguish boundaries between different categories and provides more accurate segmentation results.
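The two key operations above can be sketched in plain NumPy (a minimal single-channel illustration under assumed shapes; the actual model uses multi-channel learned convolutions): a 3×3 dilated convolution as used in each ASPP branch, and the AFE as a K×K convolution with stride K and dilation 1 applied to the macro-pixel image:

```python
import numpy as np

def dilated_conv3x3(x, kernel, rate):
    """'Same'-padded 3x3 dilated convolution (single channel), as in an ASPP branch.

    x: (H, W) feature map; kernel: (3, 3); rate: dilation (tap spacing).
    """
    H, W = x.shape
    xp = np.pad(x, rate)                     # pad by `rate` so output keeps (H, W)
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            # three taps per axis, spaced `rate` apart, centred on (i, j)
            patch = xp[i:i + 2 * rate + 1:rate, j:j + 2 * rate + 1:rate]
            out[i, j] = np.sum(patch * kernel)
    return out

def afe_conv(macpi, kernel, K):
    """AFE sketch: K x K convolution with stride K and dilation 1.

    macpi: (H*K, W*K) macro-pixel image; each output value fuses the K x K
    angular samples of exactly one macro-pixel, so the output is (H, W).
    """
    Hs, Ws = macpi.shape[0] // K, macpi.shape[1] // K
    out = np.empty((Hs, Ws))
    for i in range(Hs):
        for j in range(Ws):
            out[i, j] = np.sum(macpi[i * K:(i + 1) * K, j * K:(j + 1) * K] * kernel)
    return out
```

Because the AFE's stride equals its kernel size, its receptive fields tile the macro-pixel image without overlap: each output location sees all angular views of one spatial position and nothing else, which is why it recovers angular structure rather than spatial context.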
In complex scenarios such as uneven illumination, occlusion, or small objects, ASPP can extract broader context, while AFE can capture complementary angular information between macro-pixel views. Their synergistic effect significantly enhances the performance of semantic segmentation tasks. Result To assess the performance of the proposed model, quantitative and qualitative comparison experiments were conducted on the LF Dataset against the seven recent light field methods mentioned in the paper. To ensure fair comparison, the baseline parameters introduced in the article were used as benchmarks. Compared to the SOTA (state-of-the-art) methods, the model achieved 88.80% mIoU (mean Intersection over Union) on the test set, outperforming all compared methods. Furthermore, compared to all other baseline methods, the proposed approach achieved a performance improvement of over 2.15%, enabling more precise capture of subtle changes in images and thus more accurate segmentation boundaries. Compared to five other semantic segmentation methods, this approach demonstrated significant superiority in segmentation boundary accuracy. Meanwhile, a series of ablation experiments was conducted to investigate the advantages of the AFE and the multi-scale ASPP. Removing both ASPP and AFE caused mIoU to fall to 22.51%, a drop of 66.29 percentage points, demonstrating that the complete model integrating ASPP and AFE effectively utilizes multi-scale information and angular features to achieve optimal semantic segmentation performance. Specifically, removing the multi-scale ASPP led to a performance decrease of 6.58%, because semantic features could then be extracted only at a single scale; similarly, removing AFE caused a performance drop of 2.36%, due to the absence of the guided angular cue features needed to capture light-field-specific information.
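For reference, the mIoU metric reported throughout can be computed as the per-class intersection-over-union averaged over the classes present (a standard sketch of the metric, not tied to the paper's evaluation code):

```python
import numpy as np

def mean_iou(pred, gt, num_classes):
    """Mean Intersection over Union over classes appearing in pred or gt.

    pred, gt: integer label maps of the same shape.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:               # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))
```

Under this definition the reported ablation numbers are internally consistent: 88.80 − 66.29 = 22.51, the mIoU obtained with both ASPP and AFE removed.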
Therefore, it can be conclusively inferred that the synergistic effect of the AFE and the multi-scale ASPP significantly enhances the performance of semantic segmentation tasks. Four popular backbone networks (ResNet101, DRN, MobileNet, and Xception) were evaluated to determine the optimal backbone for the proposed algorithm; with ResNet101 as the backbone, the highest mIoU of 88.80% was obtained. Conclusion Existing methods are limited by their inability to utilize angular information from light field images, which hinders accurate delineation of object boundaries. The proposed approach demonstrates superior performance in overall image segmentation tasks, effectively mitigating over-segmentation and under-segmentation. Specifically, by introducing AFE and ASPP, the method can more accurately capture subtle changes in images, thereby achieving more precise segmentation boundaries. Compared to five other semantic segmentation methods, it demonstrates significant advantages in segmentation boundary accuracy. The paper introduces a novel light field image semantic segmentation method that takes macro-pixel light field images as input, achieving end-to-end semantic segmentation. To extract angular features of the light field and enhance the non-linearity of macro-pixel image features, a simple and efficient angular feature extraction model is designed and integrated into the network. Furthermore, the proposed model is evaluated against SOTA algorithms. Thanks to its efficient network architecture, which captures rich structural cues of light fields, the model achieves the highest mIoU score of 88.80% in semantic segmentation tasks.
Experimental results demonstrate the feasibility and effectiveness of the proposed model in enhancing semantic segmentation performance, offering new research directions for the application of light field technology in scene understanding.
Keywords
