Generative Techniques in Medical Imaging
Pan Yongsheng, Ma Haojie, Xia Yong, Zhang Yanning (Northwestern Polytechnical University)
Abstract
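Medical imaging is a medical diagnostic approach that uses various imaging techniques to capture the internal structure and function of the human body. These techniques provide visual information about the anatomical, physiological, and pathological states of the body and play an important role in disease diagnosis, treatment, and prognosis prediction. Because different types or subtypes of medical images reflect different information about the patient's body, clinical diagnosis often requires several types or subtypes of images to obtain more comprehensive information and thereby improve diagnostic accuracy. In practice, however, acquiring multi-modal image data faces difficulties such as long acquisition time, high cost, and a possible increase in radiation dose. It is therefore hoped that image processing techniques can be used for cross-modal medical image synthesis, that is, using medical images of one or several modalities to generate images of one or several other modalities. Although cross-modal medical image synthesis can facilitate multi-modal image-based diagnosis, it also faces technical challenges. For example, synthesized and real images differ markedly in diagnostic performance, which leads to clinical failure of the synthesized images, and privacy and ethical concerns make high-quality multi-modal medical image data costly to obtain. In addition, image data of different modalities differ in resolution, contrast, and image quality, and these differences affect the consistency of the generative model during generation; resolving the data inconsistency between modalities is therefore another challenge for cross-modal medical image synthesis. Most researchers start from the model itself, improving the quality of synthesized images by strengthening the model's representation ability or by designing task-specific constraints. The resulting cross-modal medical image synthesis techniques have been applied to image acquisition, reconstruction, registration, segmentation, detection, and diagnosis, and have brought new ideas and methods to many problems. This paper mainly introduces cross-modal image synthesis techniques in the medical imaging field and the applications of cross-modal medical image synthesis.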
Keywords
Application of Content Generation in Medical Images
Pan Yongsheng, Ma Haojie, Xia Yong, Zhang Yanning (Northwestern Polytechnical University)
Abstract
Medical imaging is a crucial tool in medical practice that uses a variety of imaging techniques to capture the internal structure and function of the human body. Common types of medical images include Magnetic Resonance Imaging (MRI), Computed Tomography (CT), Positron Emission Tomography (PET), plain X-rays, and optical imaging. The information obtained from these images varies because their imaging principles differ. For example, MRI uses a strong magnetic field and radio waves to image the inside of the body and provides good soft-tissue information. CT uses X-rays and computerized processing to create cross-sectional images of internal body structures; it is primarily used to image high-electron-density tissues (e.g., bone) but provides little soft-tissue contrast. PET uses tracers labeled with radioisotopes to observe biological processes and functional activity within the body, and thus images specific biological functions. Moreover, depending on the imaging parameters and tracers used, images of the same imaging type may belong to different subtypes, such as T1-weighted and T2-weighted MRI sequences, or FDG-PET and Aβ-PET. These medical imaging techniques provide visual information about the anatomical, physiological, and pathological states of the human body and play an important role in disease diagnosis, treatment, and prognosis prediction. Medical images of the same type or subtype are referred to as a single modality, and a set of images that spans different modalities is referred to as multi-modal. Since different types or subtypes of medical images reflect different information about the patient's body, multiple types or subtypes are often acquired to obtain more comprehensive information and improve diagnostic accuracy.
However, multi-modal image data acquisition faces difficulties such as long acquisition time, high cost, and a possible increase in radiation dose. It is therefore hoped that generative techniques can be used for cross-modal medical image synthesis, i.e., using medical images of one or several modalities to generate medical images of one or several other modalities. Although cross-modal medical image synthesis can facilitate multi-modal image-based diagnosis, it faces several technical challenges. First, because imaging modalities rely on different imaging principles, some information that can be captured in the target modality does not exist in the source modality. In this case, the synthesized target-modality images still lack that information, so synthesized and real images differ markedly in diagnostic performance, which leads to clinical failure. Second, privacy and ethical issues contribute to the high cost of acquiring high-quality multi-modal medical image data and to the problem of missing data in cross-modal medical image synthesis. Third, modalities differ in resolution, contrast, and image quality, and these differences affect the consistency of the image generation model during generation; resolving the data inconsistency between modalities is therefore also a challenge. Fourth, the computational complexity of the model must be taken into account, because cross-modal medical image synthesis often requires complex models and substantial computational resources, which may limit the usefulness and scalability of these methods. Finally, generalization ability matters: beyond the training data the model has already seen, it should also be considered whether the model performs well on new or different datasets.
Most researchers start from the model itself and improve the quality of synthesized images by strengthening the model's representation ability or by designing task-specific constraints. The resulting cross-modal medical image synthesis techniques have been applied to image acquisition, reconstruction, registration, segmentation, detection, and diagnosis, bringing new ideas and methods to many problems. This paper focuses on cross-modal image synthesis techniques and their applications in the field of medical imaging. We introduce existing cross-modal medical image synthesis techniques from three aspects: traditional synthesis methods, deep learning-based synthesis methods, and task-driven synthesis methods. Traditional synthesis methods usually divide the image into many small blocks (patches) and encode each block as a representation vector. They establish a mapping between the representation vectors of paired blocks from different modalities, and then generate the corresponding target-modality block from the encoding of the source-modality block. Random forest-based approaches treat image synthesis as a regression problem: they assume that the value of the target-modality block, or of its center point or central region, depends on the source-modality block, and that this relationship can be learned by a regression model. Dictionary learning-based methods assume that each modality has its own dictionary, that each image block can be represented as a sparse combination of dictionary elements, and that corresponding image blocks in different modalities share the same dictionary encoding.
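As a rough illustration of the dictionary learning idea described above (corresponding blocks in two modalities share one sparse code), the following minimal sketch encodes a source-modality patch over a source dictionary and decodes the same code with a target dictionary. It is not taken from any specific method in the literature: the names `omp`, `synthesize_patch`, `D_src`, and `D_tgt` are hypothetical, the dictionaries are random toy matrices rather than learned ones, and orthogonal matching pursuit is only one of several possible sparse coders.

```python
import numpy as np

def omp(D, x, n_nonzero):
    """Greedy orthogonal matching pursuit: find a sparse code a with D @ a ≈ x."""
    residual = x.astype(float).copy()
    support = []                      # indices of the selected dictionary atoms
    coef = np.zeros(0)
    for _ in range(n_nonzero):
        # Select the atom most correlated with the current residual.
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break
        support.append(k)
        # Refit all selected coefficients jointly by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ coef
    code = np.zeros(D.shape[1])
    if support:
        code[support] = coef
    return code

def synthesize_patch(D_src, D_tgt, src_patch, n_nonzero=3):
    """Encode a source patch over D_src, then decode the same code with D_tgt."""
    code = omp(D_src, src_patch, n_nonzero)
    return D_tgt @ code

# Toy example (assumption: random dictionaries stand in for learned ones).
rng = np.random.default_rng(0)
D_src, _ = np.linalg.qr(rng.normal(size=(8, 8)))   # orthonormal source atoms
D_tgt = rng.normal(size=(8, 8))                    # paired target atoms
true_code = np.zeros(8)
true_code[1], true_code[5] = 2.0, -1.0
src_patch = D_src @ true_code                      # flattened source patch
tgt_patch = synthesize_patch(D_src, D_tgt, src_patch, n_nonzero=2)
```

In a real pipeline the two dictionaries would be learned jointly from paired training patches, and the synthesized patches would be aggregated (for example by averaging overlaps) to form the full target image.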
Compared with traditional methods, deep learning-based cross-modal image synthesis methods use large-scale parametric models to build mappings from source-modality images to target-modality images directly, in an end-to-end manner, and extract the representation features of an image or image block automatically from data, without manually designed features. Owing to their ease of implementation and superior performance, deep learning-based techniques now dominate this field. In this paper, we introduce them in four groups: simple CNN-based approaches, encoder-decoder network-based approaches, generative adversarial network-based approaches, and diffusion model-based approaches. Task-driven cross-modal image synthesis methods recognize that a synthesis task is carried out for a specific downstream purpose. They add a task-related design on top of a general-purpose technique to form a task-specific bias, so that the synthesized image preserves more of the information that contributes to the task and performance on that task improves. These methods are presented in three categories: task-oriented biases, biases formed through network models, and image synthesis embedded in task models. Finally, we present the application scenarios of cross-modal medical image synthesis techniques and their use in the tasks where they are most advantageous.
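One simple way to picture the task-specific bias mentioned above is a combined training objective that adds a weighted downstream-task term to a generic synthesis term. The sketch below is only an assumed illustration, not the formulation of any particular method: the function names, the L1 synthesis loss, the cross-entropy task loss, and the weight of 0.1 are all hypothetical choices.

```python
import numpy as np

def synthesis_loss(pred_img, real_img):
    """Generic image-fidelity term: mean absolute (L1) error."""
    return float(np.mean(np.abs(pred_img - real_img)))

def task_loss(class_probs, label):
    """Downstream-task term: cross-entropy of a classifier on the synthesized image."""
    return float(-np.log(class_probs[label] + 1e-12))

def task_driven_loss(pred_img, real_img, class_probs, label, weight=0.1):
    """Synthesis objective biased toward the task by a weighted task term."""
    return synthesis_loss(pred_img, real_img) + weight * task_loss(class_probs, label)

# Toy values (assumptions for illustration only).
pred_img = np.array([1.0, 2.0, 3.0])
real_img = np.array([1.0, 2.0, 5.0])
class_probs = np.array([0.25, 0.75])   # classifier output on the synthesized image
total = task_driven_loss(pred_img, real_img, class_probs, label=1)
```

A larger weight pushes the generator to keep task-relevant detail at the possible expense of overall image fidelity.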
Keywords