Multi-scale feature fusion “h”-shaped network for ultrasound image segmentation of thyroid nodules

Yu Dian; Peng Yanjun; Guo Yanfei

Published: 2023-07-19
DOI: 10.11834/jig.220078
2023 | Volume 28 | Number 7

Yu Dian, Peng Yanjun, Guo Yanfei (School of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266590)

Abstract

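Objective Accurately locating thyroid nodules in ultrasound images is important for the early diagnosis of thyroid cancer, but the variability in the size, shape, and location of patients' nodules greatly affects segmentation accuracy and the generalization ability of models. To improve the accuracy of thyroid nodule segmentation in ultrasound images, enhance generalization, and reduce the number of model parameters, thereby assisting doctors in diagnosis and reducing misdiagnosis, a multi-scale feature fusion “h”-shaped network for thyroid nodule segmentation in ultrasound images is proposed. Method First, a network framework shaped like the letter h is proposed, consisting of one encoder and two decoders, and depthwise separable convolution is introduced to reduce the network size. The encoder extracts image features, and an enhanced down-sampling module is constructed to reduce the information loss caused by down-sampling and to strengthen the feature extraction capability of the decoder. The first decoder obtains preliminary segmentation information for the image; the second decoder fuses the information learned in advance by the first decoder to enhance the feature representation of the nodules and improve segmentation accuracy. A fusion convolutional pyramid pooling module is also designed to achieve multi-scale feature fusion and improve the generalization ability of the model. Result On the internal dataset, the network achieves a Dice similarity coefficient (DSC), Hausdorff distance (HD), sensitivity (SEN), and specificity (SPE) of 0.8721, 0.9356, 0.8797, and 0.9973, respectively. On the public DDTI (digital database thyroid image) dataset, the DSC and SPE are 0.7580 and 0.9773, respectively, and on the TN3K (thyroid nodule 3 thousand) dataset the key metrics DSC and HD are 0.7815 and 4.4726, respectively; all of these are better than the results of the other models. Conclusion The network improves thyroid nodule segmentation in ultrasound images with a relatively small number of parameters and enhances generalization.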
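Depthwise separable convolution, used above to reduce the network size, factors a standard k×k convolution into a per-channel k×k depthwise convolution followed by a 1×1 pointwise convolution that mixes channels. The sketch below compares weight counts (biases ignored) for the two forms; the channel sizes are hypothetical and not taken from the paper.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a standard k x k convolution (bias terms ignored)."""
    return c_in * c_out * k * k


def depthwise_separable_params(c_in: int, c_out: int, k: int) -> int:
    """Weights in a depthwise k x k conv (one filter per input channel)
    followed by a 1 x 1 pointwise conv that mixes channels (bias ignored)."""
    depthwise = c_in * k * k
    pointwise = c_in * c_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer widths, chosen only for illustration.
    for c in (64, 128, 256):
        std = standard_conv_params(c, c, 3)
        dws = depthwise_separable_params(c, c, 3)
        print(f"{c:>3} channels: standard={std}, separable={dws}, ratio={dws / std:.3f}")
```

The ratio equals 1/c_out + 1/k², so for 3×3 kernels the separable form needs a little over one-ninth of the weights once channel counts are large (for 64 channels: 4,672 versus 36,864).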

Keywords

deep learning; thyroid nodule; ultrasound segmentation; h-network; enhanced down-sampling; multi-scale

Ultrasonic image segmentation of thyroid nodules-relevant multi-scale feature based h-shape network

Yu Dian, Peng Yanjun, Guo Yanfei(School of Computer Science and Technology, Shandong University of Science and Technology, Qingdao 266590, China)

Abstract

Objective Accurate localization of thyroid nodules in ultrasound images is important for the early diagnosis of thyroid cancer. Ultrasound imaging is a promising technique for diagnosing thyroid disease because it is cost-effective and relatively simple to perform. The Thyroid Imaging Reporting and Data System (TI-RADS) has recently been used to evaluate whether nodules are benign or malignant; the higher the TI-RADS category, the higher the probability that a nodule is malignant. The first step of ultrasound evaluation is to segment the thyroid nodule. At present, nodules are commonly segmented manually, which is labor-intensive and depends on the examiner's experience. Computer-aided medical imaging aims to segment nodules in ultrasound images automatically and thereby improve the speed and accuracy of diagnosis. In recent years, deep learning has performed well in many visual recognition tasks and, compared with traditional contour-, shape-, and region-based methods, generally achieves higher accuracy. Many models based on fully convolutional networks (FCNs) and convolutional neural networks (CNNs) have been developed for specific segmentation tasks. However, speckle noise in ultrasound images and the variability in the size, shape, and location of patients' nodules greatly reduce segmentation accuracy. Method First, an h-shaped network framework consisting of one encoder and two decoders is proposed. The framework is shaped like the letter “h”, and depthwise separable convolution is introduced to reduce the network size: the second convolution in each layer of the network is replaced with a depthwise separable convolution to lower the number of model parameters. The encoder extracts image features, and an enhanced down-sampling module is constructed to alleviate the information loss caused by down-sampling.
The module connects max pooling and average pooling with batch normalization and average pooling, and is used to enhance the feature extraction capability of the decoder. The first decoder produces preliminary segmentation information for the image, and the second decoder fuses the information learned by the first decoder to enhance the feature representation of the nodules and improve segmentation accuracy. Finally, a fusion convolutional pyramid pooling module is designed that integrates the atrous spatial pyramid pooling (ASPP) module with depthwise separable convolution, achieving multi-scale feature fusion while keeping the network small and improving the generalization ability of the model. The output of each of the four decoder blocks of the second decoder is passed through its own fusion convolutional pyramid pooling module, and the results are concatenated to produce the final prediction. Three datasets are used to validate the model: an internal dataset of 3,622 ultrasound images, the digital database thyroid image (DDTI) dataset of 637 ultrasound images, and the public TN3K dataset of 3,493 ultrasound images. The internal and TN3K datasets are each divided into training, validation, and test sets in a ratio of 8:1:1. Because the DDTI dataset contains only a small number of thyroid nodule samples, training on a partition of it is prone to overfitting, so the weights trained on the internal dataset are used to test directly on DDTI. The experiments are implemented in PyTorch, and the model is trained on an NVIDIA RTX 2080 Ti. Adam is used as the optimizer with an initial learning rate of 0.0001. The model is trained for 200 epochs, the learning rate is halved every 20 epochs, and the batch size is 8.
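The abstract describes the enhanced down-sampling module only as combining max pooling, average pooling, and batch normalization. The pure-Python sketch below assumes one plausible reading, not the authors' published design: 2×2 max pooling and 2×2 average pooling are applied to the same single-channel feature map, both outputs are kept as two channels (a channel concatenation), and each channel is then standardized as a stand-in for batch normalization without learned scale and shift. The names `pool2x2`, `normalize`, and `enhanced_downsample` are hypothetical.

```python
from collections.abc import Callable
from statistics import fmean, pvariance

Matrix = list[list[float]]


def pool2x2(x: Matrix, reduce: Callable[[list[float]], float]) -> Matrix:
    """2 x 2 pooling with stride 2; `reduce` maps the 4 window values to one."""
    h, w = len(x) // 2 * 2, len(x[0]) // 2 * 2  # drop an odd last row/column
    return [
        [reduce([x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1]])
         for j in range(0, w, 2)]
        for i in range(0, h, 2)
    ]


def normalize(x: Matrix, eps: float = 1e-5) -> Matrix:
    """Standardize one channel to zero mean and unit variance
    (batch-norm-style, without learned scale/shift)."""
    flat = [v for row in x for v in row]
    mu, var = fmean(flat), pvariance(flat)
    scale = (var + eps) ** 0.5
    return [[(v - mu) / scale for v in row] for row in x]


def enhanced_downsample(x: Matrix) -> list[Matrix]:
    """Hypothetical enhanced down-sampling: keep both the max-pooled and the
    average-pooled maps as two channels, then normalize each channel."""
    max_map = pool2x2(x, max)
    avg_map = pool2x2(x, fmean)
    return [normalize(max_map), normalize(avg_map)]


if __name__ == "__main__":
    feature = [
        [1.0, 3.0, 0.0, 0.0],
        [2.0, 4.0, 0.0, 8.0],
        [5.0, 5.0, 1.0, 1.0],
        [5.0, 5.0, 1.0, 1.0],
    ]
    print(pool2x2(feature, max))    # [[4.0, 8.0], [5.0, 1.0]]
    print(pool2x2(feature, fmean))  # [[2.5, 2.0], [5.0, 1.0]]
    print(enhanced_downsample(feature))
```

In the example, the single strong response (8.0) in the top-right window survives max pooling but is diluted to 2.0 by average pooling, while average pooling keeps the overall level of each window; retaining both maps is one way to limit what a single pooling operator discards.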
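The training configuration described above can be sketched as a step learning-rate schedule plus an 8:1:1 split. This assumes that a training "round" is one epoch and that the split is a random shuffle; the seed and the function names `step_lr` and `split_8_1_1` are hypothetical. In PyTorch, the same schedule corresponds to `torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.5)`.

```python
import random


def step_lr(epoch: int, base_lr: float = 1e-4, step: int = 20, gamma: float = 0.5) -> float:
    """Learning rate at a 0-based epoch when it is multiplied by `gamma`
    (halved by default) every `step` epochs."""
    return base_lr * gamma ** (epoch // step)


def split_8_1_1(items: list, seed: int = 0) -> tuple[list, list, list]:
    """Shuffle a copy of `items` and split it into training, validation,
    and test sets in an 8:1:1 ratio (the test set takes the remainder)."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * 0.8)
    n_val = int(len(shuffled) * 0.1)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


if __name__ == "__main__":
    # 3,622 is the size of the internal dataset; indices stand in for images.
    train, val, test = split_8_1_1(list(range(3622)))
    print(len(train), len(val), len(test))  # 2897 362 363
    for epoch in (0, 19, 20, 199):
        print(epoch, step_lr(epoch))
```

Over 200 epochs the rate is halved nine times, so the final epochs run at 0.0001 / 512.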
To segment small nodules well while keeping the overall segmentation stable, DiceBCELoss, which combines the binary cross-entropy (BCE) loss with the Dice loss, is used as the loss function. The segmentation results are evaluated quantitatively with the Dice similarity coefficient (DSC), Hausdorff distance (HD), sensitivity (SEN), and specificity (SPE). Result To validate the proposed method, it is compared with AttentionUNet, marker-guided U-Net (MG-UNet), fully convolutional dense dilated network (FCdDN), DeepLab V3+, segmentation network (SegNet), and context encoder network (CE-Net). On the internal dataset, the proposed model achieves DSC, HD, SEN, and SPE values of 0.8721, 0.9356, 0.8797, and 0.9973, respectively. Its DSC is 15.53% higher than that of the worst model and 1.2% higher than that of the second-best model; its HD is 2.5836 lower than that of the worst model and 0.0341 lower than that of the second-best model; and its SEN and SPE are 0.32% and 1.17% higher than those of the second-best model and 7.57% and 9.96% higher than those of the worst model. On the DDTI dataset, the DSC and SPE are 0.7580 and 0.9773, respectively, which are 9.83% and 15.25% higher than those of the worst model and 1.02% and 0.71% higher than those of the best competing model. On the TN3K dataset, the model obtains a DSC of 0.7815 and an HD of 4.4726, which are 1.27% higher and 0.6345 lower, respectively, than those of the second-best model. A series of ablation experiments further demonstrates the effectiveness of the different steps of the fusion algorithm. Conclusion The proposed network improves the segmentation accuracy of thyroid nodules with a small number of parameters and better generalization.
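DiceBCELoss is described only as combining the BCE loss with the Dice loss. A common formulation, assumed here, is the unweighted sum of the mean per-pixel BCE and a smoothed soft-Dice loss; the smoothing constant `smooth=1.0`, the equal weighting, and the function name `dice_bce_loss` are assumptions rather than details from the paper.

```python
import math


def dice_bce_loss(probs: list[float], targets: list[int],
                  smooth: float = 1.0, eps: float = 1e-7) -> float:
    """Mean binary cross-entropy plus soft Dice loss over flattened pixels.

    probs: predicted foreground probabilities in [0, 1].
    targets: ground-truth labels, 0 (background) or 1 (nodule).
    """
    n = len(probs)
    # Per-pixel BCE, with log arguments clamped to avoid log(0).
    bce = -sum(
        t * math.log(max(p, eps)) + (1 - t) * math.log(max(1 - p, eps))
        for p, t in zip(probs, targets)
    ) / n
    # Soft Dice: overlap normalized by the total predicted + true foreground.
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = (2 * intersection + smooth) / (sum(probs) + sum(targets) + smooth)
    return bce + (1 - dice)


if __name__ == "__main__":
    # Hypothetical 100-pixel image whose nodule covers only 4 pixels.
    target = [1] * 4 + [0] * 96
    miss = [0.01] * 100              # predicts background everywhere
    hit = [0.9] * 4 + [0.01] * 96    # finds the nodule
    print(dice_bce_loss(miss, target), dice_bce_loss(hit, target))
```

For the all-background prediction, the BCE term is only about 0.19 because the 96 background pixels dominate the average, whereas the Dice term is 0.82; the Dice term is what penalizes missing a small nodule, while BCE keeps per-pixel gradients stable.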
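The four evaluation metrics can be computed from binary masks as sketched below: DSC = 2TP/(2TP + FP + FN), SEN = TP/(TP + FN), SPE = TN/(TN + FP), and HD as the symmetric Hausdorff distance between the foreground pixel sets. The abstract does not state how HD is measured or normalized, so the pixel-unit Euclidean version here is an assumption; the brute-force search suits only small masks, and both masks are assumed to contain foreground and background pixels.

```python
import math

Mask = list[list[int]]


def confusion(pred: Mask, gt: Mask) -> tuple[int, int, int, int]:
    """Pixel-wise TP, FP, FN, TN for two binary masks of the same shape."""
    tp = fp = fn = tn = 0
    for prow, grow in zip(pred, gt):
        for p, g in zip(prow, grow):
            if p and g:
                tp += 1
            elif p:
                fp += 1
            elif g:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def dsc_sen_spe(pred: Mask, gt: Mask) -> tuple[float, float, float]:
    """Dice similarity coefficient, sensitivity, and specificity."""
    tp, fp, fn, tn = confusion(pred, gt)
    dsc = 2 * tp / (2 * tp + fp + fn)
    sen = tp / (tp + fn)
    spe = tn / (tn + fp)
    return dsc, sen, spe


def hausdorff(pred: Mask, gt: Mask) -> float:
    """Symmetric Hausdorff distance (in pixels) between foreground pixel sets."""
    a = [(i, j) for i, row in enumerate(pred) for j, v in enumerate(row) if v]
    b = [(i, j) for i, row in enumerate(gt) for j, v in enumerate(row) if v]

    def directed(src, dst):
        # Largest distance from a point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


if __name__ == "__main__":
    gt = [[0, 0, 0, 0],
          [0, 1, 1, 0],
          [0, 1, 1, 0],
          [0, 0, 0, 0]]
    pred = [[0, 0, 0, 0],
            [0, 1, 1, 1],
            [0, 1, 0, 0],
            [0, 0, 0, 0]]
    print(dsc_sen_spe(pred, gt))  # DSC = 0.75, SEN = 0.75, SPE = 11/12
    print(hausdorff(pred, gt))    # 1.0
```

Here 3 pixels are true positives, 1 is a false positive, 1 is a false negative, and 11 are true negatives; every mismatched pixel lies one pixel away from the other mask, so HD is 1.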

Keywords

deep learning; thyroid nodule; ultrasound segmentation; h-network; enhanced down-sampling; multi-scale
