Multimodal MRI disease prognosis based on knowledge distillation and mutual information
Wei Ran1, Qi Xiaoming1, He Yuting1, Jiang Sheng1, Qian Wen2, Xu Yi2, Zhu Yinsu3, Pascal Haigron4, Shu Huazhong1, Yang Guanyu1 (1. Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education; 2. Department of Radiology, the First Affiliated Hospital of Nanjing Medical University; 3. CT Division, Imaging Center, the Affiliated Cancer Hospital of Nanjing Medical University (Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research), 42 Baiziting Road, Xuanwu District, Nanjing 210009, China; 4. Univ Rennes, Inserm, LTSI - UMR1099, Rennes, France) Abstract
Objective Prognosis prediction of Non-Ischemic Dilated Cardiomyopathy (NIDCM) from multi-modal Cardiac Magnetic Resonance (CMR) images plays an important role in clinical applications such as heart failure and sudden cardiac death. Because each CMR modality highlights different Regions Of Interest (ROIs) for the same disease, the complementarity and correlation between modalities are complex, which makes multi-modal CMR images difficult to represent for NIDCM prognosis. Moreover, prognosis labels are hard to obtain, so the labeled data available for training a prognosis model are limited, and the model easily falls into local optima. To address these two challenges, we propose a model based on hybrid matching distillation and contrastive mutual information estimation for NIDCM prognosis from multi-modal CMR images with small samples. Method The proposed prognosis model contains two designs, addressing the representation difficulty of multi-modal CMR images and the tendency of deep networks to fall into local optima. First, different CMR modalities are combined into modality pairs and the corresponding image features are extracted. Because the prognosis target is shared across modality pairs while their feature distributions differ, a hybrid matching distillation network is designed that associates and matches the feature distributions through logit distribution consistency, thereby constraining multi-modal feature extraction and guiding joint representation. Then, a mutual-information contrastive learning strategy is designed between modality pairs to estimate the latent classification boundary of the multi-modal distribution, which serves as a regularization term of the prognosis model and prevents it from falling into local optima on limited data. Result On a clinical NIDCM dataset, the method was compared with four state-of-the-art methods, achieving an F1 score of 81.25% and an accuracy of 85.61%. To verify generalization, it was also compared with four state-of-the-art methods on a public brain tumor dataset, achieving an F1 score of 85.07% and an accuracy of 87.72%. Conclusion The proposed prognosis network based on hybrid matching distillation and contrastive mutual information estimation represents multi-modal CMR images effectively, and exploits the latent mutual information between modalities to improve model optimization in small-sample scenarios, ultimately yielding more accurate NIDCM prognosis from multi-modal CMR images.
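As an illustrative sketch only (not the paper's implementation), the logit-distribution-consistency idea in the hybrid matching distillation can be expressed as a symmetric KL divergence between the temperature-softened prognosis distributions of two modality-pair branches; all function names and the toy data below are hypothetical:

```python
import numpy as np

def softmax(z, T=1.0):
    """Temperature-softened softmax over the last axis."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q), averaged over the batch."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.mean(np.sum(p * np.log(p / q), axis=-1)))

def logit_matching_loss(logits_a, logits_b, T=2.0):
    """Symmetric KL between the softened prognosis distributions of two
    modality-pair branches; since both branches share one prognosis
    target, minimizing this pushes their logit distributions together."""
    pa, pb = softmax(logits_a, T), softmax(logits_b, T)
    return 0.5 * (kl_divergence(pa, pb) + kl_divergence(pb, pa))

# Toy batch: 4 samples, 2 prognosis classes, two modality-pair branches.
rng = np.random.default_rng(0)
la = rng.normal(size=(4, 2))
lb = la + rng.normal(size=(4, 2))        # a second, disagreeing branch
loss_self = logit_matching_loss(la, la)  # identical branches -> 0
loss_diff = logit_matching_loss(la, lb)  # disagreement -> positive
```

In practice such a term would be added to the supervised prognosis loss with a weighting coefficient, so that branches with different feature distributions are still pulled toward a consistent output distribution.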
Keywords
Knowledge distillation and mutual information based disease prognosis of multimodal MRI
Wei Ran1, Qi Xiaoming1, He Yuting1, Jiang Sheng1, Qian Wen2, Xu Yi2, Zhu Yinsu3, Pascal Haigron4, Shu Huazhong1, Yang Guanyu1 (1.The Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education;2.Dept. of Radiology, the First Affiliated Hospital of Nanjing Medical University, Nanjing 210029, China;3.Department of Radiology, The Affiliated Cancer Hospital of Nanjing Medical University, Jiangsu Cancer Hospital, Jiangsu Institute of Cancer Research, 42 Baiziting, Nanjing, 210009, China;4.Univ Rennes, Inserm, LTSI - UMR1099, Rennes, France) Abstract
Objective Non-Ischemic Dilated Cardiomyopathy (NIDCM) is a heart condition that can lead to severe outcomes such as heart failure or sudden cardiac death. Accurate prognosis of NIDCM plays a crucial role in the early diagnosis and effective treatment of patients suffering from this disease. Multi-modal Cardiac Magnetic Resonance (CMR) imaging, which captures heart data from different perspectives, is essential for prognosis. Each CMR modality provides unique and complementary information about the heart’s structure and function. However, leveraging these multi-modal data sources to predict prognosis poses two main challenges. First, the Regions Of Interest (ROIs) differ among CMR modalities even when imaging the same disease, which makes combining them into a cohesive predictive model complex. The distinct characteristics and distributions of data from different modalities create difficulties in capturing comprehensive information about the prognosis of NIDCM. Second, the limited amount of labeled training data exacerbates the problem. Because such data are difficult to label, the available dataset is small, which increases the risk of a deep learning model falling into local optima. This can hinder the model’s ability to generalize well and achieve good predictive performance. Therefore, there is a need for a specialized approach that can address both the complexity of multi-modal data representation and the limitations of small sample sizes. Method To overcome these challenges, we propose a novel model based on hybrid matching distillation and contrastive mutual information estimation. The design of this model focuses on two aspects: improving the representation of multi-modal CMR images and preventing the model from falling into local optima due to the limited size of the training data. The first component of our method involves combining different CMR modalities into pairs.
Each pair is treated as a unique data source, and image features corresponding to these modality pairs are extracted. Since the prognosis objective is consistent across modalities but their feature distributions vary, a hybrid matching distillation network is employed. This network enforces logit distribution consistency between the modalities. It learns to associate and match different image feature distributions across modalities by leveraging the inherent consistency in prognosis objectives. This matching constrains the extraction of features from each modality, ensuring that the deep learning network can jointly represent multi-modal features. As a result, the network captures the complementary information from the various modalities more effectively, leading to better predictive performance. The second component is a mutual information contrastive learning strategy. This strategy is applied to estimate potential classification boundaries across the multi-modal feature distribution. Essentially, this step introduces a regularization term into the prognosis model, which prevents the model from falling into a local optimum during training with a small sample size. The contrastive learning strategy aims to maximize the mutual information between modalities while learning meaningful feature representations. By estimating the classification boundaries, the model can better discern subtle differences in the feature space, which enhances its ability to generalize from limited data. This strategy not only regularizes the learning process but also ensures that the model captures the most informative aspects of the multi-modal data. The hybrid matching distillation and contrastive mutual information estimation components work together to build a robust prognosis model for NIDCM.
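A minimal sketch of the contrastive mutual-information idea, assuming an InfoNCE-style estimator in which embeddings of the same sample from two modality pairs form positives and all other samples in the batch form negatives (the names and toy data are illustrative, not the authors' code); minimizing this loss maximizes a lower bound on the mutual information between the two embeddings:

```python
import numpy as np

def info_nce(feats_a, feats_b, tau=0.1):
    """InfoNCE loss: row i of feats_a is positive with row i of feats_b
    and negative with every other row. Lower loss means a higher
    estimated mutual information between the two embedding sets."""
    a = feats_a / np.linalg.norm(feats_a, axis=1, keepdims=True)
    b = feats_b / np.linalg.norm(feats_b, axis=1, keepdims=True)
    sim = a @ b.T / tau                        # (N, N) scaled cosine sims
    sim = sim - sim.max(axis=1, keepdims=True) # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))  # -log p(positive | row)

# Toy embeddings: 8 samples, 16-dim features from two modality pairs.
rng = np.random.default_rng(1)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.05 * rng.normal(size=z.shape))
shuffled = info_nce(z, rng.permutation(z))
# Aligned pairs give a lower loss (higher estimated MI) than mismatched ones.
```

Used as a regularizer alongside the supervised loss, such a term encourages the feature space to separate samples, which is where the estimated classification boundary comes from.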
By utilizing logit-level consistency across modalities and the mutual information between them, the model achieves improved feature representation and avoids overfitting to the small sample size. Result To evaluate the performance of the proposed model, experiments were conducted using a clinical dataset of NIDCM patients. The results of these experiments were compared to four state-of-the-art methods to assess the model’s effectiveness. The performance was evaluated using two key metrics: F1 score and accuracy (Acc). The F1 score assesses the balance between precision and recall, while Acc measures the overall correctness of predictions. The proposed model achieved an F1 score of 81.25% and an accuracy of 85.61% on the NIDCM dataset. These results demonstrated a significant improvement over the baseline models, highlighting the effectiveness of the hybrid matching distillation and contrastive mutual information estimation techniques in handling multi-modal CMR images for prognosis prediction. The use of these two complementary approaches allowed the model to better utilize the limited training data and capture the complex correlations between the different modalities. To further validate the generalization capability of the model, an additional experiment was conducted on a public dataset related to brain tumors. This dataset also featured multi-modal medical images, and it allowed us to verify whether the proposed method could be applied beyond the NIDCM domain. The model achieved an F1 score of 85.07% and an accuracy of 87.72% on this dataset, outperforming the four baseline methods once again. These results demonstrate that the proposed model can generalize well to other medical imaging tasks beyond NIDCM, making it a versatile and effective tool for prognosis prediction.
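For reference, the two reported metrics can be computed from binary predictions as follows (the labels below are a toy example, not the paper's data):

```python
import numpy as np

def f1_and_acc(y_true, y_pred):
    """Binary F1 score (positive class = 1) and overall accuracy."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    acc = float(np.mean(y_true == y_pred))
    return f1, acc

# Toy prognosis labels: 6 patients, binary outcome.
f1, acc = f1_and_acc([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1])
# Here tp=2, fp=1, fn=1 -> precision = recall = 2/3, so F1 = 2/3; acc = 4/6.
```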
Conclusion The proposed prognosis network model based on hybrid matching distillation and contrastive mutual information estimation effectively addresses the two major challenges in using multi-modal CMR images for NIDCM prognosis. The hybrid matching distillation component ensures that the model learns to represent multi-modal data by leveraging logit distribution consistency across different CMR modalities. This improves the model's ability to capture the complementarity between modalities. The contrastive mutual information estimation component provides regularization by estimating classification boundaries, preventing the model from overfitting to small sample sizes. As demonstrated by the experiments, this approach significantly improves the accuracy and F1 score of the prognosis model, outperforming several state-of-the-art methods. The model’s generalization capability was further confirmed through its successful application to a brain tumor dataset, proving its versatility across various medical imaging tasks. In addition to achieving high predictive performance, this method also demonstrates the potential of deep learning in handling complex medical imaging problems where data scarcity is a concern. By combining hybrid matching distillation with contrastive mutual information estimation, the model is capable of handling the intricate relationships between different modalities and producing robust, reliable predictions even with limited labeled data. Future work could explore the extension of this method to other forms of multi-modal medical imaging beyond CMR, such as combining MRI and CT scans for comprehensive diagnosis. Additionally, further research into improving the efficiency of training and reducing the computational complexity of the model would make it more accessible for widespread clinical use. This research opens up new avenues for multi-modal prognosis models and sets a foundation for future innovations in medical image analysis.
Keywords