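Audio copy-move forgery detection and localization based on multi-feature decision fusion
Zhang Guofu1,2,3,4, Xiao Rui1, Su Zhaopin1,2,3,4, Lian Chensi5, Yue Feng1,4 (1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601; 2. Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Hefei 230601; 3. Intelligent Interconnected Systems Laboratory of Anhui Province (Hefei University of Technology), Hefei 230009; 4. Anhui Province Key Laboratory of Industry Safety and Emergency Technology (Hefei University of Technology), Hefei 230601; 5. Institute of Forensic Science, Department of Public Security of Anhui Province, Hefei 230000)
Abstract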
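Objective With the spread of powerful audio editing software, ordinary users without professional expertise can easily edit or even maliciously tamper with digital audio files, which poses a great challenge to digital audio authentication. In a copy-move forgery, part of an audio recording is copied and pasted elsewhere in the same recording to alter its semantics. Because the forged segments match the original recording very closely, detection is extremely difficult, and the problem has become a research hotspot in audio forensics. However, most existing studies rely on voice activity detection and can only determine whether an entire voiced segment has been tampered with; they cannot accurately locate the tampered position. To this end, this paper proposes an audio copy-move forgery detection and localization method based on multi-feature decision fusion.
Method First, spectral-entropy-based voice activity detection divides the audio into silent and voiced segments, and the voiced segments are further split into syllables using the energy-to-entropy ratio. Then the pitch frequency, color auto-correlogram, and short-time energy features of each syllable are extracted. The similarity of any two syllables is computed with the dynamic time warping distance for the pitch frequency feature, the cosine distance for the color auto-correlogram feature, and the difference of short-time energy sums for the short-time energy feature. Finally, the forgery position is accurately located through multi-feature decision fusion.
Result Comparative experiments on the relevant dataset show that the proposed multi-feature decision fusion method outperforms the compared methods in both precision and recall, each exceeding 90%. Detection precision improves by about 16% on average and recall by about 26% on average. In addition, localization accuracy improves by about 45% on average. Moreover, after common signal processing attacks are applied to the dataset, the method still achieves detection precision and recall above 94%, with precision improved by about 16% on average and recall by about 31% on average.
Conclusion The proposed method not only achieves higher detection precision, recall, and localization accuracy, but is also more robust to common signal processing attacks.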
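The three per-feature similarity measures named in the Method part can be sketched as follows. This is a minimal illustration in Python, not the authors' implementation: the function names, the absolute-difference local cost inside the dynamic time warping recursion, and the representation of features as plain Python lists are all assumptions made here. Feature extraction itself (pitch tracking, color auto-correlogram computation, framing) is not shown.

```python
import math


def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences,
    e.g. the pitch-frequency contours of two syllables.
    Assumption: the local cost is the absolute difference."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(cost[i - 1][j],      # step in a only
                                     cost[i][j - 1],      # step in b only
                                     cost[i - 1][j - 1])  # step in both
    return cost[n][m]


def cosine_distance(u, v):
    """Cosine distance (1 - cosine similarity) between two feature vectors,
    e.g. flattened color auto-correlogram features of two syllables."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0  # assumption: treat an all-zero vector as maximally distant
    return 1.0 - dot / (norm_u * norm_v)


def energy_sum_difference(frames_a, frames_b):
    """Absolute difference between the summed short-time energies of two
    syllables; each syllable is a list of frames, each frame a list of samples."""
    energy_a = sum(sum(s * s for s in frame) for frame in frames_a)
    energy_b = sum(sum(s * s for s in frame) for frame in frames_b)
    return abs(energy_a - energy_b)
```

All three functions return a distance, so smaller values mean more similar syllables. A copied syllable would be expected to score near zero on all three against its source.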
Keywords
Multi-feature decision fused detection and localization method for copy-move forgery of digital audio clips
Zhang Guofu1,2,3,4, Xiao Rui1, Su Zhaopin1,2,3,4, Lian Chensi5, Yue Feng1,4 (1. School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China; 2. Key Laboratory of Knowledge Engineering with Big Data (Hefei University of Technology), Ministry of Education, Hefei 230601, China; 3. Intelligent Interconnected Systems Laboratory of Anhui Province (Hefei University of Technology), Hefei 230009, China; 4. Anhui Province Key Laboratory of Industry Safety and Emergency Technology (Hefei University of Technology), Hefei 230601, China; 5. Institute of Forensic Science, Department of Public Security of Anhui Province, Hefei 230000, China)
Abstract
Objective Forensic-oriented digital audio technology has developed rapidly as the volume of audio recordings has grown. Digital audio recordings are commonly used as evidence in legal disputes such as civil litigation. However, the original semantic information of a recording can easily be changed with widely available digital audio editing software and online tutorials. Consequently, audio forensics faces the challenge of deciding whether a recording is authentic or has been tampered with. A copy-move forgery distorts the original recording by copying an audio clip and pasting it elsewhere in the same recording. Unlike splicing and synthesized forgeries, both the source and the target segments of a copy-move forgery come from the same recording. Attributes such as amplitude, frequency, length, noise, tone, and even velocity can therefore be well matched between the forged segments and the recording, especially for utterance segments of very short duration. The need for blind audio tampering detection has made copy-move forgery detection and localization on digital audio recordings an active topic in blind audio forensics. However, most existing methods divide the recording into multiple very short segments using voice activity detection (VAD) related techniques. Although two similar segments can be identified within the recording, the forgery cannot be located accurately. We propose a multi-feature decision fusion method for detecting and localizing audio copy-move forgeries.
Method First, the audio recording is segmented into voiced and unvoiced parts by spectral-entropy-based VAD. Next, all voiced segments are further split into syllables, each containing only one Chinese character, according to the energy-to-spectral-entropy ratio. Then the pitch frequency, color auto-correlogram, and short-time energy features of each syllable are extracted. The similarity of any two syllables in pitch frequency is calculated by the dynamic time warping distance, the similarity in color auto-correlogram by the cosine distance, and the similarity in short-time energy by the difference of the short-time energy sums. Finally, audio forgeries are localized on the basis of multi-feature decision fusion over these three similarities. In detail, for any two candidate syllables, if each of the three similarity measures is below its pre-specified threshold, a copy-move forgery is judged to have occurred and the approximate forgery locations are preliminarily determined. Two new syllables are then constructed by extending both forged syllables by one frame, and the three similarities of the new syllables are computed and compared with the thresholds. If each similarity is still below its threshold, the two syllables are extended by one frame again, until one of the three similarities exceeds its threshold. The positions of the two final syllables are then taken as the exact forgery locations.
Result A classical database is used to generate our copy-move forged dataset, which includes 500 authentic recordings and 500 forged recordings. Comparative analyses show that the proposed multi-feature decision fusion method achieves precision and recall of more than 97%. Specifically, detection precision improves by roughly 16 percentage points, recall by about 26 percentage points, and localization accuracy by more than 45% on average. Additionally, detection precision and recall still exceed 94% under common signal processing attacks such as Gaussian noise addition, low-pass filtering, down-sampling, up-sampling, and MP3 compression. Under these attacks, detection precision improves by about 16 percentage points and recall by about 31 percentage points.
Conclusion Our method not only achieves higher detection precision, recall, and localization accuracy, but is also more robust against common signal processing attacks.
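The decision fusion and frame-by-frame localization step described in the Method part can be illustrated with a short Python sketch. It is a simplified reading of the abstract, not the authors' code. The names `fused_match` and `refine_location`, the choice to grow the two segments first to the right and then to the left, and the two stand-in measures (which replace the paper's dynamic time warping, cosine, and energy-sum distances) are assumptions made for illustration.

```python
def fused_match(seg_a, seg_b, measures, thresholds):
    """Decision fusion: two segments are declared duplicates only if EVERY
    distance measure is below its own threshold."""
    return all(measure(seg_a, seg_b) < limit
               for measure, limit in zip(measures, thresholds))


def refine_location(frames, a, b, measures, thresholds):
    """Refine two candidate segments a=(start, end) and b=(start, end),
    given as frame index ranges with exclusive end and a located before b.
    Both segments are extended one frame at a time while all measures
    stay below their thresholds. Returns None if the candidates do not
    match in the first place.
    Assumption: extension is tried to the right first, then to the left."""
    (a0, a1), (b0, b1) = a, b
    if not fused_match(frames[a0:a1], frames[b0:b1], measures, thresholds):
        return None
    # Extend both segments by one frame to the right while they still match.
    while (a1 < b0 and b1 < len(frames)
           and fused_match(frames[a0:a1 + 1], frames[b0:b1 + 1],
                           measures, thresholds)):
        a1 += 1
        b1 += 1
    # Extend both segments by one frame to the left while they still match.
    while (a0 > 0 and b0 - 1 >= a1
           and fused_match(frames[a0 - 1:a1], frames[b0 - 1:b1],
                           measures, thresholds)):
        a0 -= 1
        b0 -= 1
    return (a0, a1), (b0, b1)


# Stand-in measures on per-frame scalar features (illustrative only).
def max_abs_difference(seg_a, seg_b):
    """Largest frame-wise absolute difference between equal-length segments."""
    return max(abs(x - y) for x, y in zip(seg_a, seg_b))


def energy_difference(seg_a, seg_b):
    """Absolute difference between the summed squared values of two segments."""
    return abs(sum(x * x for x in seg_a) - sum(y * y for y in seg_b))
```

Requiring all measures to agree before a match is declared keeps a single noisy feature from triggering a false detection. The stopping rule mirrors the abstract: extension ends as soon as any one measure reaches its threshold.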
Keywords
audio forensics; copy-move forgery detection and localization; multi-feature decision fusion; pitch frequency; color auto-correlogram; short-time energy