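Multi-view attention fusion method for few-shot femoral fracture classification
Abstract
Objective Intertrochanteric fracture of the femur is the most common fracture in the elderly, and different fracture types require different treatments. Computer image recognition can help doctors improve diagnostic accuracy. Traditional image feature extraction and machine learning methods cannot achieve fine-grained, high-precision classification, and fracture classification methods for 3D images are rare. Deep learning methods usually need a large number of training samples to achieve good classification performance. To address these problems, this paper proposes a fracture classification method for small-sample, multi-class settings. Method The original CT (computed tomography) slice images are reconstructed in 3D, and 2D images are obtained from different viewpoints. A multi-view deep learning network with an attention mechanism fuses the combined features, and a rotation network is used jointly to obtain view-invariant features, yielding the final classification result. Result On a self-built training dataset (5 classes, 23 samples per class), four 3D deep learning network models are compared. The attention-based multi-view fusion deep learning method improves accuracy by 25% over the traditional deep learning model, and the rotation-network-based method improves on the multi-view deep learning method by 8%. Comparative experiments show that the proposed multi-view fusion deep learning method greatly outperforms the traditional voxel-based method and also helps the network converge quickly. Conclusion For fracture classification, the proposed multi-view fusion method with an attention mechanism outperforms traditional voxel-based deep learning methods, with higher accuracy and better performance.
Keywords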
Multi-view attention fusion method for few-shot femoral fracture classification
Zhang Yadong1, Wang Ling1, Lan Hai2, Zhai Yuqiao2, Cheng Hong1 (1. University of Electronic Science and Technology of China, Chengdu 611731, China; 2. Clinical Medical College and Affiliated Hospital of Chengdu University, Chengdu 610081, China)
Abstract
Objective Femoral intertrochanteric fracture is the most common fracture in the elderly, and each fracture type requires a specific treatment. Imaging techniques such as X-ray and computed tomography (CT) are used to help doctors in clinical diagnosis. Because fracture types are complex and patients are numerous, missed diagnoses and misdiagnoses occur. In recent years, computer image recognition technology has helped doctors improve diagnostic accuracy. Two classification systems are used for femoral fractures: the Arbeitsgemeinschaft für Osteosynthesefragen (AO)/Orthopaedic Trauma Association (OTA) system and a six-type system. Classification methods can be divided into traditional machine learning methods and deep learning methods. Traditional machine learning methods classify using hand-crafted features. However, they usually cannot achieve fine-grained, high-precision classification, and only a few fracture classification methods can handle three-dimensional images. Deep learning methods usually need a large number of training samples to perform well. To solve these problems, this paper proposes a fracture classification method for small-sample, multi-class settings. Method An attention-based multi-view fusion network is proposed, in which a data-fusion strategy is used to improve feature-fusion performance. First, the original CT slice images are reconstructed into a three-dimensional model, and two-dimensional images are then obtained from different viewpoints. Second, a multi-view deep learning network with an attention mechanism fuses the features from the different viewpoints. Max-pooling, fully connected (FC), and rectified linear unit (ReLU) layers learn the weight of each viewpoint, that is, the view attention. 
The max-pooling operator down-samples the H×W×M input tensor to 1×1×M, which the FC layer then reduces to 1×1×M/r. The weight of each viewpoint is obtained through the ReLU and sigmoid operations. Third, the multi-view images are multiplied by the view weights and fed into a convolutional neural network (CNN), which learns the probability that a sample belongs to each class. The attention mechanism helps the network learn distinctive features. Moreover, the multi-view tensor reduces the data dimension, which improves CNN performance when the sample size is small. Because CT scans differ, the pose of the 3D reconstructed models varies. These differences introduce uncertainty into learning and reduce classification performance. A rotation network is therefore used to obtain view-invariant features. RotationNet is defined as a differentiable multi-layer CNN that, compared with the aforementioned multi-view network, has an additional viewpoint variable to learn. This variable serves to label incorrect views. The final layer of RotationNet is a concatenation of multi-view softmax layers, each of which outputs the category likelihood of one image. The category likelihood should be close to one when the estimated viewpoint is correct. RotationNet can classify using only a partial set of multi-view images, which makes it useful in typical scenarios where only some views are available. RotationNet uses a 2D CNN as its backbone, which needs a large training set. Transfer learning is therefore applied during training to improve multi-class performance. The parameters of RotationNet are pre-trained on ModelNet40. 
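As a rough illustration of how a viewpoint variable can be estimated together with the category, the sketch below searches over every category and every cyclic view offset and keeps the pair with the highest summed log-likelihood. It is a simplified stand-in, not the RotationNet implementation: the function name `classify_with_unknown_pose`, the `probs[i][v][c]` layout, the cyclic camera arrangement, and the toy numbers are all assumptions made for illustration. The last entry of each probability row plays the role of the incorrect-view label.

```python
import math


def classify_with_unknown_pose(probs, num_classes):
    """Jointly pick a category and a cyclic view offset (illustrative only).

    probs[i][v] is an assumed softmax output for image i under candidate
    viewpoint v: num_classes category scores followed by one
    "incorrect view" score.
    """
    num_views = len(probs)
    best_score, best_class, best_shift = -math.inf, None, None
    for shift in range(num_views):      # candidate pose of the whole model
        for c in range(num_classes):    # candidate category
            # Image i is assumed to sit at viewpoint (i + shift) mod num_views.
            score = sum(
                math.log(probs[i][(i + shift) % num_views][c] + 1e-12)
                for i in range(num_views)
            )
            if score > best_score:
                best_score, best_class, best_shift = score, c, shift
    return best_class, best_shift


# Toy example (made-up numbers): 3 views, 2 categories, true category 1,
# and the model rotated by 2 view positions.
NUM_VIEWS, NUM_CLASSES, TRUE_SHIFT = 3, 2, 2
confident = [0.1, 0.8, 0.1]     # category 1 is likely at the correct viewpoint
wrong_view = [0.05, 0.05, 0.9]  # mass goes to the "incorrect view" label
toy_probs = [
    [confident if v == (i + TRUE_SHIFT) % NUM_VIEWS else wrong_view
     for v in range(NUM_VIEWS)]
    for i in range(NUM_VIEWS)
]
predicted_class, predicted_shift = classify_with_unknown_pose(toy_probs, NUM_CLASSES)
print(predicted_class, predicted_shift)
```

In this toy case the search recovers both the category and the pose offset, because only the correctly aligned viewpoints give the category a high likelihood.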
Considering the difference between ModelNet40 and our fracture data, all parameters are then fine-tuned on the fracture data during training. Result The proposed methods are compared with two three-dimensional deep learning network models, namely, 3D ResNet and the original multi-view CNN. Two classification systems, AO and six-type, are used. Each category has 23 training samples and 10 testing samples. First, the number of viewpoints is analyzed. Experimental results show that classification performance improves as the number of viewpoints increases from 4 to 12 but fluctuates when the number is greater than 16. The reason is the similarity between the resulting samples, which can effectively be regarded as the same sample and thus degrade performance. In the following experiments, the number of viewpoints is set to 12. Second, the attention mechanism is analyzed. The proposed attention multi-view CNN (MV_att) is compared with the original multi-view CNN (MVCNN) on the data-fusion model. The area under the curve (AUC) of MV_att improves by approximately 3% on AO classification and by approximately 5% on average on six-type classification. Third, the performance of the models is analyzed. The accuracy of MV_att is 25% higher than that of MVCNN on AO classification. The accuracy of the pre-trained RotationNet is 8% higher than that of MV_att on six-type classification. Comparative experiments show that the proposed multi-view fusion deep learning method is much better than the traditional voxel-based method and also helps the network converge quickly. Conclusion In fracture classification, the multi-view fusion classification method with an attention mechanism proposed in this paper achieves higher accuracy than the traditional voxel-based deep learning method. The attention mechanism is useful for extracting distinctive features. 
The multi-view data-fusion model reduces the number of samples needed, and transfer learning improves the performance of the network.
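The view-attention weighting described in the Method part can be sketched in a few lines of dependency-free Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract only says that the FC layer reduces 1×1×M to 1×1×M/r before the ReLU and sigmoid operations, so the second FC layer that maps back to M weights (as in squeeze-and-excitation blocks) is an assumption, as are the names `view_attention`, `w_down`, and `w_up`, the reduction ratio r = 4, the 4×4 view size, and the random weights standing in for learned ones.

```python
import math
import random


def view_attention(views, w_down, w_up):
    """Weight M views by attention scores (illustrative sketch).

    views:  list of M views, each an H x W grid of floats.
    w_down: (M/r) x M weight matrix of the reducing FC layer.
    w_up:   M x (M/r) weight matrix of an ASSUMED expanding FC layer.
    """
    # Global max-pooling: H x W x M -> 1 x 1 x M (one scalar per view).
    pooled = [max(max(row) for row in view) for view in views]
    # FC layer M -> M/r followed by ReLU.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w_down]
    # Assumed FC layer M/r -> M followed by a sigmoid, giving one weight per view.
    weights = [
        1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
        for row in w_up
    ]
    # Multiply every pixel of view m by that view's weight.
    weighted = [
        [[weights[m] * pixel for pixel in row] for row in view]
        for m, view in enumerate(views)
    ]
    return weights, weighted


# Toy setup: M = 12 views (the count used in the experiments), assumed r = 4.
rng = random.Random(0)
M, R, H, W = 12, 4, 4, 4
views = [[[rng.random() for _ in range(W)] for _ in range(H)] for _ in range(M)]
w_down = [[rng.uniform(-1, 1) for _ in range(M)] for _ in range(M // R)]
w_up = [[rng.uniform(-1, 1) for _ in range(M // R)] for _ in range(M)]

weights, weighted_views = view_attention(views, w_down, w_up)
print([round(w, 3) for w in weights])
```

In a real network the weighted views would be passed on to the CNN, and both FC layers would be trained end to end with it rather than drawn at random.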
Keywords