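ProMIS: a weakly supervised object detection method driven by image augmentation with probability map sampling
Li Xiaoyan1,2, Kan Meina1,2, Liang Hao1,2, Shan Shiguang1,2,3 (1. Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences, Beijing 100190; 2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049; 3. Peng Cheng Laboratory, Shenzhen 518055)
Abstract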
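Objective Weakly supervised object detection trains an object detector using only image-level category labels. The accuracy of weakly supervised object detectors has improved steadily in recent years, but two problems remain very challenging: detecting objects completely, and separating a single instance from multiple objects of the same category. To address these problems, this paper proposes ProMIS (probability-based multi-object image synthesis), a weakly supervised object detection method that performs multi-object image augmentation based on posterior probability maps of object layout. Method Detected objects are stored in an object candidate pool, and objects from the pool are inserted into the input image to construct augmented images with pseudo bounding box annotations, which are then used to train the weakly supervised object detector. The method consists of two interacting modules: image augmentation and weakly supervised object detection. The image augmentation module inserts objects from the candidate pool into an input image; to keep the augmented image plausible, the category, position, and scale of each inserted object are constrained by estimating and sampling posterior probabilities. The weakly supervised object detection module trains the detector on the augmented multi-object images together with their category labels and pseudo bounding box labels, and stores the high-confidence objects detected in the original input image in the object candidate pool. During training, to avoid over-fitting, a parallel detection branch is added to the baseline algorithm, namely a detection branch based on augmented bounding boxes. This branch is trained with the pseudo bounding box annotations obtained from augmentation, while the detection branch of the original baseline is still trained with image labels. At test time, only the detection branch based on augmented bounding boxes is used to produce detection results. The proposed augmentation strategy and branch structure are applicable to different weakly supervised object detectors. Result On the Pascal VOC (pattern analysis, statistical modeling and computational learning visual object classes) 2007 and Pascal VOC 2012 datasets, embedding the method into several existing weakly supervised object detectors improves the mean average precision (mAP) by 2.9% and 4.2% on average, respectively. Conclusion This paper demonstrates that augmented images generated from the pseudo bounding box labels of weakly supervised object detection contain rich information and help weakly supervised detectors learn the differences among object parts, whole objects, and clusters of multiple objects.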
Keywords
ProMIS: probability-based multi-object image synthesis-relevant weakly supervised object detection method
Li Xiaoyan1,2, Kan Meina1,2, Liang Hao1,2, Shan Shiguang1,2,3 (1. Key Laboratory of Intelligent Information Processing, Chinese Academy of Sciences, Beijing 100190, China; 2. School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing 100049, China; 3. Peng Cheng Laboratory, Shenzhen 518055, China)
Abstract
Objective Fully supervised object detectors based on neural networks are an essential means of improving object detection performance and are, to a certain extent, reliable enough for real-world applications. However, they depend on large amounts of annotated data. Bounding box labeling is labor-intensive and has to be repeated for every category and application scenario, so collecting large-scale detection training datasets that cover diverse real-world applications is difficult. Weakly supervised object detectors are therefore designed to be trained with image category annotations only. Recent weakly supervised object detectors are mostly built on the multi-instance learning (MIL) technique. In these methods, object proposals are classified and aggregated into an image classification result, and objects are detected by selecting, among all object proposals, the bounding box that contributes most to the aggregated image classification result. However, because weakly supervised object detection lacks instance-level annotations, it is difficult to distinguish an instance from a part of the instance or from a cluster of multiple instances of the same category. To train the detector to distinguish instances, the proposed method inserts detected objects with high confidence into an input image and generates augmented images together with pseudo bounding box annotations. A naive random augmentation, however, does not directly improve detection performance, for two reasons: 1) over-fitting: the generated data are used to train the same detection head that generates them; 2) infeasible augmentation: the spatial distribution of the generated objects often differs considerably from that of real data, because the hyper-parameters of the insertion are all sampled from uniform distributions.
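The MIL-style aggregation mentioned above can be illustrated with a small sketch. The abstract does not say which aggregation the baselines use; the two-stream form below (a softmax over classes multiplied by a softmax over proposals, then summed over proposals) is one common formulation in weakly supervised detection and is assumed here purely for illustration. The function names `mil_aggregate` and `select_top_proposal` are hypothetical.

```python
import numpy as np


def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def mil_aggregate(cls_logits, det_logits):
    """Aggregate per-proposal scores into image-level class scores.

    Both inputs have shape (num_proposals, num_classes).
    Assumed formulation, not taken from the abstract.
    """
    cls_prob = softmax(cls_logits, axis=1)  # per proposal: which class?
    det_prob = softmax(det_logits, axis=0)  # per class: which proposal?
    proposal_scores = cls_prob * det_prob
    # Summing over proposals gives one score per class, each in [0, 1],
    # which can be trained against the image-level labels.
    image_scores = proposal_scores.sum(axis=0)
    return proposal_scores, image_scores


def select_top_proposal(proposal_scores, class_idx):
    # The detection for a class is the proposal contributing most to it.
    return int(np.argmax(proposal_scores[:, class_idx]))
```

Because only the image-level score is supervised, nothing in this objective prevents the top proposal from being an object part or a group of objects, which is the failure mode the abstract targets.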
Method To resolve these issues, a probability-based multi-object image synthesis (ProMIS) method for weakly supervised object detection is developed. It consists of two iterative and interactive modules: the image augmentation module and the weakly supervised object detection module. In each training iteration, objects are detected in the original input image with the weakly supervised object detector (to ensure accuracy in early training, the detector is pre-trained with its baseline method), and the detected objects with high confidence are stored in an object candidate pool for later image augmentation. The image augmentation module inserts one or more objects sampled from the object candidate pool into the input image, producing an augmented training image with pseudo bounding box annotations. To make the augmented image more feasible, the category, position, and scale of each inserted object are sampled from posterior probability maps conditioned on the objects detected in this image. ProMIS uses three kinds of posterior probabilities, which describe the category, spatial, and scale relations between an object and a reference object, respectively. These posterior probabilities are estimated online from the objects detected in previous training iterations, and the hyper-parameters of newly inserted objects are assumed to follow them. The weakly supervised object detection module then uses the augmented image and its pseudo annotations to train the weakly supervised object detector. During training, to avoid over-fitting to detected false positives, a new parallel detection branch is added to the baseline weakly supervised object detection head.
The augmented bounding box annotations are used only to supervise the newly added branch, while the original weakly supervised detection head, which is used to generate the augmented data, is trained with image-level labels only. At inference, only the added branch trained with the augmented annotations is kept to produce the test results, which preserves the inference efficiency of the weakly supervised object detector. The image augmentation module and the weakly supervised object detection module work iteratively and interactively, so the detector steadily learns to distinguish instances. ProMIS is an online augmentation method and requires no images or annotations beyond the original weakly supervised detection training data. In addition, because the approach is independent of the choice of weakly supervised object detector, the augmentation paradigm generalizes across detector architectures. Result The experiments verify the effectiveness of the proposed parallel detection branch and the posterior probability maps, which improve on the naive random augmentation method by 5.2% and 2.2%, respectively. ProMIS is applied to several existing weakly supervised object detectors, including online instance classifier refinement (OICR), segmentation-detection collaborative network (SDCN), and online instance classifier refinement with deep residual network (OICR-DRN). Compared with these baselines, it achieves average improvements of 2.9% and 4.2% on the Pascal VOC (pattern analysis, statistical modeling and computational learning visual object classes) 2007 and Pascal VOC 2012 datasets, respectively.
Furthermore, an ablation analysis shows that ProMIS reduces two error modes: ground truth in hypothesis (the detected box encloses the ground-truth box) and hypothesis in ground truth (the detected box lies inside the ground-truth box). Conclusion The results demonstrate that ProMIS makes fewer mistakes when distinguishing an instance from its parts or from a cluster of multiple instances of the same category.
Keywords
weakly supervised object detection; multi-object data augmentation; image synthesis; probability map sampling; posterior probability estimation
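The augmentation step described in the Method part of the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `RelationPosterior`, `resize_nearest`, and `insert_object` are hypothetical, the posterior "maps" are approximated by resampling relations observed between previously detected objects, the pool is a plain dictionary from category to image patches, and pasting is a simple overwrite with nearest-neighbor resizing.

```python
import numpy as np


class RelationPosterior:
    """Empirical stand-in for the category, position, and scale posteriors."""

    def __init__(self, num_classes):
        # Smoothed co-occurrence counts, indexed as [reference class, new class].
        self.counts = np.ones((num_classes, num_classes))
        # Observed (dx, dy, scale) relations for each (reference, new) class pair.
        self.relations = {}

    def update(self, ref_box, ref_cls, box, cls):
        """Record how a detected box relates to a reference box (x1, y1, x2, y2)."""
        rw, rh = ref_box[2] - ref_box[0], ref_box[3] - ref_box[1]
        # Center offset in units of the reference width and height.
        dx = ((box[0] + box[2]) - (ref_box[0] + ref_box[2])) / (2.0 * rw)
        dy = ((box[1] + box[3]) - (ref_box[1] + ref_box[3])) / (2.0 * rh)
        scale = (box[3] - box[1]) / rh  # height ratio
        self.counts[ref_cls, cls] += 1
        self.relations.setdefault((ref_cls, cls), []).append((dx, dy, scale))

    def sample(self, ref_cls, rng):
        """Sample a category, then an offset and scale conditioned on it."""
        p = self.counts[ref_cls] / self.counts[ref_cls].sum()
        cls = int(rng.choice(len(p), p=p))
        observed = self.relations.get((ref_cls, cls))
        if not observed:
            return None  # no evidence for this pair yet: skip the insertion
        dx, dy, scale = observed[rng.integers(len(observed))]
        return cls, dx, dy, scale


def resize_nearest(patch, new_h, new_w):
    # Nearest-neighbor resize using integer index maps.
    rows = np.arange(new_h) * patch.shape[0] // new_h
    cols = np.arange(new_w) * patch.shape[1] // new_w
    return patch[rows][:, cols]


def insert_object(image, ref_box, ref_cls, pool, posterior, rng):
    """Paste one pooled object near ref_box.

    Returns (augmented image, pseudo box, class) or None if nothing was pasted.
    """
    drawn = posterior.sample(ref_cls, rng)
    if drawn is None or not pool.get(drawn[0]):
        return None
    cls, dx, dy, scale = drawn
    patch = pool[cls][rng.integers(len(pool[cls]))]
    rw, rh = ref_box[2] - ref_box[0], ref_box[3] - ref_box[1]
    new_h = max(1, int(round(scale * rh)))
    new_w = max(1, int(round(new_h * patch.shape[1] / patch.shape[0])))
    cx = (ref_box[0] + ref_box[2]) / 2.0 + dx * rw
    cy = (ref_box[1] + ref_box[3]) / 2.0 + dy * rh
    x1, y1 = int(round(cx - new_w / 2.0)), int(round(cy - new_h / 2.0))
    # Clip the pasted region to the image bounds.
    height, width = image.shape[:2]
    x1c, y1c = max(x1, 0), max(y1, 0)
    x2c, y2c = min(x1 + new_w, width), min(y1 + new_h, height)
    if x2c <= x1c or y2c <= y1c:
        return None
    resized = resize_nearest(patch, new_h, new_w)
    out = image.copy()
    out[y1c:y2c, x1c:x2c] = resized[y1c - y1:y2c - y1, x1c - x1:x2c - x1]
    return out, (x1c, y1c, x2c, y2c), cls
```

In a training loop of this kind, `update` would be called on pairs of high-confidence detections from earlier iterations, and the box returned by `insert_object` would serve as the pseudo annotation for the added detection branch. Sampling from uniform distributions instead of `sample` would correspond to the naive random augmentation that the abstract reports as less effective.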