Ultrasound Image Segmentation Using SAM Combined with Counterfactual Prompts and a Cascaded Dual Decoder
Huo Yiru1, Feng Jun1, Liu Na1, Shi Yichen2, Yin Mengying3 (1. Shijiazhuang Tiedao University; 2. Shanghai Jiao Tong University; 3. The First Hospital of Hebei Medical University) Abstract
Objective The Segment Anything Model (SAM) has achieved remarkable success in natural image segmentation, but when applied to medical imaging, especially ultrasound images with low contrast, blurred boundaries, and complex shapes, it often requires manual intervention and suffers degraded segmentation performance. To address these problems, an improved method, SAM Combined with Counterfactual Prompt and Cascaded Decoder (SAMCD), is proposed. Method SAMCD extends SAM with a bypass CNN image encoder, a cross-branch interaction adapter, a prompt generator, and a cascaded decoder. First, the bypass CNN encoder and the designed cross-branch interaction adapter supplement the local information that the ViT encoder lacks, improving the model's ability to capture details. Second, a counterfactual intervention mechanism from causal learning is introduced: by generating counterfactual prompts, the model is forced to focus on factual prompt generation, which improves segmentation accuracy. Third, the proposed cascaded decoder is used to obtain rich edge information: SAM's original decoder first creates a prior mask, which is then refined by a Transformer decoder with boundary attention and a pixel decoder. Finally, a two-stage training strategy is adopted, consisting of an interactive segmentation training stage followed by an automatic segmentation training stage. Result In experiments on the TN3K and BUSI datasets, SAMCD achieves DSC values of 83.66% and 84.29%, respectively, 0.73 and 0.90 percentage points higher than SAMCT, while being more lightweight than the compared SAM variants. Against nine state-of-the-art methods, SAMCD performs best on the DSC, mIoU, HD, sensitivity, and specificity metrics. Ablation studies and visual analyses confirm the clear improvements brought by SAMCD. Conclusion Building on SAM's strong feature representation ability, the proposed SAMCD method improves the encoder, prompt generator, decoder, and training strategy, so that it accurately captures the complex local details and small targets in ultrasound images and improves automatic ultrasound medical image segmentation.
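For illustration only, the following minimal PyTorch sketch mirrors the data flow described above. Every module here is a simplified stand-in of our own (plain convolutions in place of SAM's ViT encoder, the actual SCIA, prompt generator, and decoders); only the wiring, i.e. dual encoders, fused features, factual or counterfactual prompt, prior mask, then boundary-aware refinement, follows the abstract.

import torch
import torch.nn as nn

class SAMCDSketch(nn.Module):
    # Toy stand-in for SAMCD: the real ViT/SCIA/decoders are replaced by single convs.
    def __init__(self, dim=32):
        super().__init__()
        self.vit_branch = nn.Conv2d(1, dim, 3, padding=1)        # stand-in for SAM's ViT encoder
        self.cnn_branch = nn.Conv2d(1, dim, 3, padding=1)        # bypass CNN branch (local detail)
        self.scia = nn.Conv2d(2 * dim, dim, 1)                   # cross-branch interaction adapter
        self.prompt_gen = nn.Conv2d(dim, 1, 1)                   # dense prompt-map generator
        self.sam_decoder = nn.Conv2d(dim + 1, 1, 3, padding=1)   # stage 1: prior-mask decoder
        self.boundary_decoder = nn.Conv2d(dim + 1, dim, 3, padding=1)  # boundary-attention stage
        self.pixel_decoder = nn.Conv2d(dim, 1, 1)                # stage 2: pixel decoder

    def forward(self, x, counterfactual=False):
        feats = self.scia(torch.cat([self.vit_branch(x), self.cnn_branch(x)], 1))
        prompt = torch.sigmoid(self.prompt_gen(feats))
        if counterfactual:
            # Counterfactual intervention: swap in a random prompt so training
            # penalizes the model for ignoring prompt quality.
            prompt = torch.rand_like(prompt)
        prior = self.sam_decoder(torch.cat([feats, prompt], 1))
        refined = self.boundary_decoder(torch.cat([feats, torch.sigmoid(prior)], 1))
        return self.pixel_decoder(refined)

mask = SAMCDSketch()(torch.randn(1, 1, 256, 256))
print(mask.shape)  # torch.Size([1, 1, 256, 256])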
Keywords
Ultrasound Image Segmentation Using SAM Combined with Counterfactual Prompt and Cascaded Decoder
Huo Yiru1, Feng Jun1, Liu Na1, Shi Yichen2, Yin Mengying3 (1. Shijiazhuang Tiedao University; 2. Shanghai Jiao Tong University; 3. The First Hospital of Hebei Medical University) Abstract
Objective Ultrasound imaging is a fundamental tool in medical diagnosis due to its convenience, non-radiative nature, and cost-effectiveness, making it an indispensable component of clinical diagnostics. However, accurately localizing and extracting detailed features from ultrasound images, particularly in cases involving complex pathological boundaries such as nodules and cysts, remains a significant challenge. Traditional convolutional neural networks (CNNs) are proficient at feature extraction through convolutional layers, but their limited receptive fields often result in a loss of global information. Conversely, Transformer-based models are adept at capturing global features through self-attention mechanisms, yet they frequently fail to capture local details effectively, and their high computational requirements limit their practical use in real-time medical applications. The recent Segment Anything Model (SAM) has shown notable success in natural image segmentation. However, its performance declines when applied to medical imaging, particularly ultrasound image segmentation, often necessitating manual intervention. This decline arises primarily because SAM is trained exclusively on natural images, whose domain distribution differs vastly from that of medical images. To address this limitation, we propose an enhanced SAM model, i.e., SAM Combined with Counterfactual Prompt and Cascaded Decoder (SAMCD). Method SAMCD enhances the existing SAM framework by incorporating a bypass CNN image encoder, a Simple Cross-Branch Interaction Adapter (SCIA), a counterfactual intervention prompt generator, and a cascaded decoder. First, integrating the bypass CNN encoder with the novel SCIA module compensates for the ViT encoder's lack of local information, thereby enhancing the model's ability to capture fine details. Next, to adapt to the prompts produced by the prompt generator and to optimize its output, we introduce a counterfactual intervention mechanism based on causal learning. This mechanism forces the model to focus on factual prompt generation, enhancing the learning capability of the prompt generator, improving the model's segmentation precision, and reducing dependency on high-quality prompts. Additionally, to capture richer edge information, we design a cascaded decoder: SAM's original decoder first creates a prior mask, after which an edge-attention enhanced Transformer decoder and a pixel decoder further exploit rich edge information and refine the segmentation results. Finally, we employ a two-stage training strategy to enhance segmentation performance and accelerate convergence: the first stage trains the interactive segmentation model, while the second stage trains the automatic segmentation model that incorporates the prompt generator. In the experiments, the hardware platform is an NVIDIA GeForce RTX 3090, the programming language is Python 3.9, and the deep learning framework is PyTorch. The network is trained with a batch size of 4, a learning rate of 0.0001, and 200 epochs, using the Adam optimizer. Before training, SAMCD is initialized with SAM weights, and during training the images are scaled to 256×256 resolution using bilinear interpolation. Result Experiments were conducted on the TN3K and BUSI datasets to evaluate the SAMCD model using a range of metrics, including DSC, mIoU, HD, ACC, Sen, and Spe.
Notably, lower HD values indicate better segmentation performance, while higher values of the other metrics, such as DSC and mIoU, indicate better performance. In these evaluations, SAMCD achieves a DSC of 83.66% on the TN3K dataset and 84.29% on the BUSI dataset, higher than the original SAM, MedSAM, SAMed, and SAMCT. Compared with SAMCT on the TN3K dataset, SAMCD improves mIoU and ACC by 0.91% and 0.16%, respectively, and improves on the remaining SAM-related comparison models by 20.43% and 12.91% on average. In comparison with non-SAM approaches, its DSC on the TN3K dataset is 4.65%, 3.29%, 13.58%, 5.16%, and 2.22% higher than that of U-Net, CE-Net, SwinUnet, TransFuse, and TransFuse, respectively; its ACC is 0.79% and 0.29% higher than that of TransFuse and TransFuse; and its Sen and Spe are on average 4.95% and 0.46% higher than those of the five non-SAM methods. In addition, SAMCD requires fewer training parameters and consumes fewer computational resources than SAM-related models. Ablation experiments and visual analyses further validate the significant performance gains of the SAMCD method. Conclusion SAMCD leverages the strong feature extraction capabilities of SAM. By enhancing the encoder, prompt generator, decoder, and training strategy, SAMCD accurately captures the complex local details and small targets in ultrasound images and improves the automatic segmentation of ultrasound medical images.
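As a hedged reconstruction of the training setup stated above (Adam, learning rate 1e-4, batch size 4, 200 epochs, inputs resized to 256×256 with bilinear interpolation), the loop below reuses the SAMCDSketch toy model from the earlier sketch and a dummy loader in place of the real TN3K/BUSI data; the paper's two-stage strategy (interactive stage, then automatic stage with the prompt generator) is collapsed into a single stage here.

import torch
import torch.nn.functional as F

model = SAMCDSketch()  # toy model from the sketch above; the real SAMCD is initialized from SAM weights
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

# Dummy stand-in for a real TN3K/BUSI DataLoader yielding batches of 4.
loader = [(torch.randn(4, 1, 300, 300),
           torch.randint(0, 2, (4, 1, 300, 300)).float())]

for epoch in range(200):
    for image, target in loader:
        # Rescale inputs to 256x256 with bilinear interpolation, as stated above.
        image = F.interpolate(image, (256, 256), mode="bilinear", align_corners=False)
        target = F.interpolate(target, (256, 256), mode="nearest")
        loss = F.binary_cross_entropy_with_logits(model(image), target)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()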
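For reference, plain NumPy/SciPy versions of three of the reported metrics are sketched below; the paper does not specify its exact implementations (e.g., whether HD is computed on boundary pixels or full foreground point sets), so this is one common reading rather than the authors' code.

import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dsc(pred, gt):
    # Dice similarity coefficient for binary masks.
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + 1e-8)

def iou(pred, gt):
    # Intersection over union, the per-class building block of mIoU.
    inter = np.logical_and(pred, gt).sum()
    return inter / (np.logical_or(pred, gt).sum() + 1e-8)

def hausdorff(pred, gt):
    # Symmetric Hausdorff distance between the two foreground point sets.
    p, g = np.argwhere(pred), np.argwhere(gt)
    return max(directed_hausdorff(p, g)[0], directed_hausdorff(g, p)[0])

pred = np.zeros((256, 256), bool); pred[60:130, 60:130] = True
gt = np.zeros((256, 256), bool); gt[64:128, 64:128] = True
print(f"DSC={dsc(pred, gt):.3f}  IoU={iou(pred, gt):.3f}  HD={hausdorff(pred, gt):.1f}")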
Keywords