An involution-decoupled lightweight network for panoramic wisdom tooth detection
Abstract
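Objective A panoramic radiograph (orthopantomogram) of acceptable quality depends on correct patient positioning and appropriate machine settings: about the facial midline, the maxilla, mandible, and related structures are bilaterally symmetric; the line connecting the occlusal surfaces forms a gentle smile curve; and the anatomical position of each tooth on the film is essentially fixed. Oral medical images such as panoramic radiographs therefore have a fixed foreground-background relationship and a stable spatial structure. Networks built on standard convolution, however, are insensitive to this spatial-domain structural information because convolution is spatially agnostic. Although some attention modules can guide a model toward specific information and reweight it, what they attend to often departs from what is expected and can reduce model performance; moreover, as embedded modules, attention mechanisms tend to increase computation and parameter count. Targeting the structural properties of oral medical images, we propose an involution-decoupled YOLO (you only look once) model for wisdom tooth detection in panoramic radiographs. Method In the backbone, the cross stage partial (CSP) structure is reshaped and a spatially specific involution operation is introduced, so that the model prioritizes the most informative visual elements in the spatial domain, which strengthens its ability to model spatial information. In the detection head, a multi-branch decoupled structure is proposed to overcome the negative effects of task coupling and to resolve the compatibility problem between the involution operator and the YOLO model, and the loss function of each branch is optimized specifically. Result Experiments on wisdom tooth detection on a panoramic radiograph dataset show that the proposed method substantially outperforms strong recent single-stage object detection models in both detection performance and parameter count. Compared with the baseline model, the parameter count is reduced by 42.5% and the average precision is improved by 6.3%, which supports the soundness of the model structure and its effectiveness for wisdom tooth detection. Conclusion The proposed involution-decoupled scheme for panoramic wisdom tooth detection, designed around the spatial-structure properties of oral medical images, has stronger spatial information modeling ability at a lower parameter cost.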
Keywords
A lightweight network-involute and decoupled for panoramic wisdom tooth detection
Zeng Yifeng1, Yao Xiao1, Hua Fei2, Wang Peipei2, Gu Min2 (1. The College of IoT Engineering, Information Department, Hohai University, Changzhou 213022, China; 2. The Third Affiliated Hospital of Soochow University, the First People's Hospital of Changzhou City, Changzhou 213003, China)
Abstract
Objective The third molar (wisdom tooth) often fails to erupt normally and becomes impacted. Stomatologists usually assess the current status and potential complications by analyzing the impaction level and angle of the mandibular third molar on a panoramic radiograph. Because the panoramic radiograph is a two-dimensional view, artifacts, overlapping structures, and deformation make interpretation error-prone, and manual reading of medical images makes diagnosis and evaluation challenging. To support artificial-intelligence-aided diagnosis, we apply a deep-learning-based object detection algorithm to panoramic radiograph data. In general object images, backgrounds are complex and categories differ clearly in texture, which convolutional neural networks perceive easily; in panoramic images, by contrast, teeth with consistent texture are closely arranged, with a fixed foreground-background relationship and a stable spatial structure. Stomatologists still judge abnormalities of wisdom teeth from the relationship between spatial position and neighboring teeth, and this discrimination process can be modeled with a task-relevant spatial attention mechanism. Specifically, an attention mechanism helps suppress redundant channels or pixels to some extent. It can be integrated into the backbone network as a plug-in module, or attached to the top of the backbone to extract high-level semantic relations while preserving the low-level convolutional features. Method We analyze the properties of convolution in neural networks and adopt the involution operator, which pays specific attention to spatial element information.
It is integrated into the you only look once (YOLO) object detection model to improve performance and reduce parameters while preserving the advantages of YOLO itself, yielding a YOLO-based panoramic wisdom tooth detection scheme. The main contributions are as follows: 1) an improved cross stage partial (CSP) structure, invoCSP, is proposed to better integrate the CSP structure with the involution operator, and stacking it introduces the involution operator into the YOLO model. Contextual information can thus be aggregated over a wider spatial range, and weights can be adaptively allocated to different regions of the feature map, which improves spatial modeling ability so that the spatial structure information in the dataset is fully extracted; 2) we analyze the defect of task coupling in the YOLO model, explore the underlying properties of the involution operator, and summarize the external conditions it requires. To fully decompose and decouple the three specific tasks arising from the two properties of object detection, a three-branch decoupled structure is constructed in the detection head. This further improves the compatibility of involution with the YOLO model and alleviates the training difficulty and non-convergence problems of the involution method; 3) the three-branch detection head avoids sharing weight parameters, so each branch can be optimized independently. The loss function is modified for the task on each branch: focal loss is introduced for the confidence loss, a recent intersection over union (IoU) loss is applied to the bounding-box regression of the prediction box, and an advanced classification loss is used. Result To classify and label mandibular wisdom teeth, a new panoramic radiograph dataset is built according to the Winter classification, which is commonly used in the clinical diagnosis and treatment of wisdom teeth.
The images are histogram-equalized and then randomly shuffled. Three stomatologists independently label the mandibular wisdom teeth multiple times under unified diagnostic criteria and labeling rules, yielding a total of 973 consistently labeled samples. Experiments on the constructed panoramic wisdom tooth dataset show that the method compares favorably with single-stage object detection models in both detection performance and parameter count. Compared with the baseline model, YOLOX-tiny, the parameter count is 42.5% lower and the mAP_50 is 6.3 percentage points higher. In addition, a comparative analysis is carried out with nine popular single-stage object detection models. Among models with a comparable parameter count, the yooid model performs best. It not only identifies wisdom tooth types accurately but also regresses prediction boxes stably, with high IoU and close agreement with the ground-truth labels. With far fewer parameters, its detection performance is comparable to that of large models, and it even achieves the highest mAP_50. Conclusion To address wisdom tooth detection in panoramic radiographs, we analyze the properties of convolution in neural networks and adopt the involution operator, which adds specific attention to spatial element information without any additional network structure, and introduce it into the YOLO object detection model. Performance is improved and parameters are reduced while the advantages of YOLO are preserved. On this basis, a panoramic wisdom tooth detection network based on involution decoupling is built. Qualitative and quantitative comparative experiments verify that the proposed model detects the target objects effectively and support the rationality of its design.
Furthermore, the results show that the decoupled structure suits the design of involution. We clarify the relationship between involution and task coupling, and use a multi-branch decoupled structure to further improve the compatibility of the involution operator with the YOLO model. The model has far fewer parameters while maintaining high performance, which makes it suitable for real-time detection. The proposed method is expected to support lightweight, application-level deployment for preliminary screening and to provide an objective reference in stomatology.
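As an illustration of the spatial specificity attributed to involution in this abstract, the following minimal NumPy sketch generates a separate K×K kernel at every pixel from that pixel's own feature vector and shares it across the channels of a group. It is not the authors' invoCSP block or detection head: the function name `involution2d`, the weight names `w_reduce` and `w_span`, the plain ReLU between the two linear maps (with no normalization), and the zero padding are all assumptions made for the example.

```python
import numpy as np

def involution2d(x, w_reduce, w_span, kernel_size=3, groups=1):
    """Illustrative involution on one feature map (hypothetical helper).

    x:        (C, H, W) input features
    w_reduce: (C_r, C) channel-reduction weights
    w_span:   (K*K*G, C_r) weights that expand to one K*K kernel per group
    """
    c, h, w = x.shape
    k = kernel_size
    pad = k // 2

    # Kernel generation: each pixel's kernel depends only on the feature
    # vector at that pixel, so the kernels differ across spatial positions.
    reduced = np.maximum(np.einsum("rc,chw->rhw", w_reduce, x), 0.0)  # ReLU
    kernels = np.einsum("sr,rhw->shw", w_span, reduced)               # (K*K*G, H, W)
    kernels = kernels.reshape(groups, k * k, h, w)

    # Gather the K*K neighbourhood of every pixel (zero padding at borders).
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    patches = np.stack(
        [xp[:, dy:dy + h, dx:dx + w] for dy in range(k) for dx in range(k)],
        axis=1,
    )                                                                 # (C, K*K, H, W)
    patches = patches.reshape(groups, c // groups, k * k, h, w)

    # Multiply-add: the kernel is shared by all channels within a group.
    out = (patches * kernels[:, None]).sum(axis=2)                    # (G, C/G, H, W)
    return out.reshape(c, h, w)

# Usage with random features and weights (shapes chosen arbitrarily).
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 5, 6))
result = involution2d(features, rng.normal(size=(2, 4)),
                      rng.normal(size=(3 * 3 * 2, 2)), kernel_size=3, groups=2)
print(result.shape)  # (4, 5, 6)
```

In contrast to standard convolution, whose single kernel is reused at every location, the parameter count here depends on `w_reduce` and `w_span` rather than on a C_out × C_in × K × K weight tensor, which is consistent with the parameter reduction the abstract reports, although the exact savings depend on the authors' architecture.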
Keywords