Automatic rejoining of Dunhuang ancient manuscript fragments based on multi-feature fusion

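Zheng Yutong, Li Xuelong, Yin Zixuan, Gao Ge, Weng Yu (Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing 100081, China)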

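Abstract
Objective The Dunhuang manuscripts are the foundation of Dunhuang studies and a precious cultural heritage of China's many ethnic groups. Most of the surviving manuscripts are fragments and incomplete scrolls, which makes collation and research extremely difficult. Manual rejoining is exceptionally hard, time-consuming and laborious, and places very high demands on researchers. With the development of computer technology and computer graphics, fragment rejoining has entered the digital era. This paper therefore proposes a digital image rejoining method based on a hierarchical model. Method A dataset of ancient manuscript fragments is constructed. The workflow draws on the practical experience of expert rejoining and incorporates expert knowledge to pre-process the digital images of the fragments. On top of fracture-edge feature matching, multiple rejoining cues are fused to build a hierarchical model with three layers of features (physical, structural and semantic), which evaluates and scores the matching results from the low level to the high level and completes a two-stage, fully automatic rejoining. Result To verify the effectiveness of the proposed method, experiments are conducted on a dataset of 256 fragments consisting of 31 joinable fragments (11 groups) and 225 orphan fragments. The results show that the method completely rejoins 8 groups, incompletely rejoins 2 groups, and identifies 218 orphan fragments. The computed accuracy is 95.76% for complete matching and 95.70% for incomplete matching, so both reach 95%. Compared with three existing methods for similar tasks, accuracy improves markedly in every case. Conclusion The proposed hierarchical model fuses features from several aspects, effectively completes the task of rejoining ancient manuscript fragments, and improves the efficiency of researchers' rejoining work.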
Keywords
Multi-feature fusion based automatic reconstruction of ancient Chinese manuscript fragments from Dunhuang

Zheng Yutong, Li Xuelong, Yin Zixuan, Gao Ge, Weng Yu(Key Laboratory of Ethnic Language Intelligent Analysis and Security Governance of MOE, Minzu University of China, Beijing 100081, China)

Abstract
Objective The Dunhuang manuscripts are a foundation for research on China's cultural heritage. Most of the preserved manuscripts survive only as fragments and remnants, which makes their collation and interpretation difficult. Manual reconstruction is time-consuming and demanding. Computer-aided virtual restoration, which has emerged from computer graphics, offers speed, ease of use and accuracy. Method We develop a digital image reconstruction method based on a hierarchical model. First, a dataset of ancient Dunhuang manuscript fragments is constructed. Second, digital images of the fragments are pre-processed with reference to expert practice, which regularizes the fragment features and establishes a common plane for the reconstruction process. Third, a three-layer model of physical, structural and semantic features is built by fusing multiple rejoining cues. For the physical layer, grey-scale feature similarity is measured with Jaccard correlation coefficients. For the structural layer, geometric contour matching is based on Freeman coding. For the semantic layer, the consistency of character column spacing is derived from grey-scale fluctuations. The whole reconstruction process combines local matching and global matching. The key to local matching is to determine whether two pieces match, which is done by computing vector similarity on the feature descriptors. The local matching results are evaluated and scored against reasonable thresholds, from the low level to the high level. To automate the whole process, the global matching strategy follows the Tower of Hanoi model, and a two-stage fully automated reconstruction is performed.
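The physical-layer and structural-layer cues named in the Method can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the 8-direction code convention, the histogram binning and the use of the weighted (generalized) Jaccard index are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Assumed convention: code 0 points east and codes increase
# counter-clockwise, with y increasing upward.
FREEMAN_STEPS = {
    (1, 0): 0, (1, 1): 1, (0, 1): 2, (-1, 1): 3,
    (-1, 0): 4, (-1, -1): 5, (0, -1): 6, (1, -1): 7,
}


def freeman_chain_code(contour: Sequence[Tuple[int, int]]) -> List[int]:
    """Encode an 8-connected contour (list of (x, y) points) as Freeman codes."""
    codes = []
    for (x0, y0), (x1, y1) in zip(contour, contour[1:]):
        step = (x1 - x0, y1 - y0)
        if step not in FREEMAN_STEPS:
            raise ValueError(f"points are not 8-connected neighbours: {step}")
        codes.append(FREEMAN_STEPS[step])
    return codes


def reversed_chain_code(codes: Sequence[int]) -> List[int]:
    """Chain code of the same edge walked in the opposite direction.

    Two fragments that share a fracture edge trace it in opposite
    directions, so one edge is compared against the reversed code of the other.
    """
    return [(c + 4) % 8 for c in reversed(codes)]


def grey_histogram(pixels: Sequence[int], bins: int = 8) -> List[int]:
    """Histogram of 8-bit grey values (0..255) sampled along a fracture edge."""
    hist = [0] * bins
    for p in pixels:
        hist[min(p * bins // 256, bins - 1)] += 1
    return hist


def weighted_jaccard(a: Sequence[int], b: Sequence[int]) -> float:
    """Weighted Jaccard index: sum of minima divided by sum of maxima."""
    numerator = sum(min(x, y) for x, y in zip(a, b))
    denominator = sum(max(x, y) for x, y in zip(a, b))
    return numerator / denominator if denominator else 1.0


if __name__ == "__main__":
    edge = [(0, 0), (1, 0), (2, 1), (2, 2)]
    print(freeman_chain_code(edge))
    print(reversed_chain_code(freeman_chain_code(edge)))
    print(weighted_jaccard(grey_histogram([0, 10, 200, 255], bins=2), [1, 3]))
```

In a pipeline of this kind, a candidate pair would presumably be scored by comparing the chain code of one fracture edge with the reversed code of the other, and by the Jaccard similarity of grey values sampled along the two edges. The thresholds applied to those scores are not given in the abstract.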
Result To verify the effectiveness of the proposed method, experiments are carried out on a dataset of 256 fragments, which consists of 31 joinable fragments (forming 11 groups) and 225 orphan fragments. The results show that 8 groups of fragments are fully matched, 2 groups are partially matched, and 218 orphan fragments are identified. The accuracy of complete matching is 95.76% and that of incomplete matching is 95.70%, so both reach 95%. Compared with three existing methods for similar tasks, the improvements in incomplete-matching accuracy are 20.62%, 63.44% and 23.43%, and the improvements in complete-matching accuracy are 39.85%, 68.09% and 23.33%, respectively. Conclusion The layered model, combined with the high-speed computing performance of the computer, incorporates multiple features and completes the reconstruction of ancient manuscript fragments effectively. Virtual reconstruction avoids secondary damage to the fragile fragments and other irreversible operations. Furthermore, the reconstructed results provide an important basis for subsequent physical splicing, which can greatly improve the efficiency of manual reconstruction.
Keywords
