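A survey of single-view 3D object reconstruction based on deep learning
Liu Cao1, Cao Ting2, Kang Wenxiong1, Jiang Zhaohui3, Yang Chunhua3, Gui Weihua3, Liang Xiaojun2 (1. South China University of Technology; 2. Peng Cheng Laboratory; 3. Central South University)
Abstract
Recovering the 3D structure of an object from a single view is an important topic in computer vision, with important roles in industrial production, medical diagnosis, virtual reality, and other fields. Traditional single-view 3D object reconstruction methods combine geometric templates and geometric assumptions to reconstruct objects in specific scenes. Current deep-learning-based single-view 3D object reconstruction methods, by contrast, are data-driven and have made progress in the range of objects they can reconstruct and in the robustness of the reconstructed models. This paper first discusses the datasets and evaluation metrics commonly used in single-view 3D object reconstruction in recent years. It then systematically analyzes and summarizes research on deep-learning-based single-view 3D object reconstruction, covering supervised, unsupervised, and semi-supervised learning approaches. Finally, it summarizes the unresolved problems of deep-learning-based single-view 3D object reconstruction and discusses possible future development trends and key technologies.
Keywords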
Single-view 3D object reconstruction based on deep learning: A survey
Liu Cao2, Cao Ting1, Kang Wenxiong2, Jiang Zhaohui3, Yang Chunhua3, Gui Weihua3, Liang Xiaojun1 (1. Peng Cheng Laboratory; 2. South China University of Technology; 3. Central South University)
Abstract
Single-view 3D (three-dimensional) object reconstruction aims to recover the 3D shape of an object from the 2D (two-dimensional) structure captured in a single image, supporting downstream tasks such as 3D object detection, 3D object recognition, and 3D semantic segmentation. In recent years, single-view 3D object reconstruction has become a pivotal topic in computer vision, with wide-ranging applications in industrial production, medical diagnosis, virtual reality, and other fields. Traditional single-view 3D object reconstruction methods combine geometric templates and geometric assumptions to reconstruct objects in specific scenes. However, template-based methods are tailored to specific objects, which limits their generality and scalability, and assumption-based methods require strong priors about the object, which limits reconstruction quality in varied and changing scenes. Current deep-learning-based single-view 3D object reconstruction methods, through data-driven approaches, have made significant progress in the range of objects they can reconstruct and in the robustness of the reconstructed models. To clarify the current state of deep-learning-based single-view 3D object reconstruction, this paper systematically analyzes and summarizes the field from three aspects: commonly used datasets and evaluation metrics; method taxonomy and technical innovations; and open challenges and development trends.
This paper first reviews the datasets and evaluation metrics commonly used in single-view 3D object reconstruction. Datasets are the foundation of deep-learning-based 3D reconstruction methods and fall into three categories: RGBD (red-green-blue-depth) datasets, synthetic datasets, and real-scene datasets. RGBD datasets contain object depth information and are commonly used for algorithm testing and evaluation; synthetic datasets contain large numbers of rendered object images paired with 3D shapes and are commonly used for algorithm training and evaluation; real-scene datasets contain a limited number of real object images with 3D shapes and are commonly used for algorithm evaluation. Evaluation metrics quantify algorithm performance and mainly comprise distance metrics and classification metrics. Distance metrics measure the shape discrepancy between the reconstructed 3D model and the ground-truth 3D model: the smaller the value, the closer the overall shape of the reconstruction is to the ground truth. Classification metrics measure how accurately each point in 3D space is classified with respect to the object's shape: the larger the value, the more accurate the reconstructed 3D model.
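The abstract does not name specific metrics, so the sketch below assumes two widely used representatives: the symmetric Chamfer distance between point sets (a distance metric, lower is better) and the intersection over union (IoU) of voxel occupancy grids (a classification metric, higher is better). Conventions vary between papers (squared or unsquared distances, sums or means), so this is an illustrative NumPy version, not the exact definition used by any surveyed method.

```python
import numpy as np


def chamfer_distance(pred: np.ndarray, gt: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets pred (N, 3) and gt (M, 3).

    Uses the mean squared nearest-neighbour distance in both directions.
    Brute-force pairwise distances: fine for small point sets only.
    """
    # Pairwise squared distances, shape (N, M).
    d2 = np.sum((pred[:, None, :] - gt[None, :, :]) ** 2, axis=-1)
    # pred -> gt term plus gt -> pred term.
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())


def voxel_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two occupancy grids with the same shape."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        return 1.0  # both grids empty: treat as a perfect match
    return float(np.logical_and(pred, gt).sum() / union)
```

For example, two 2-point sets that differ by one unit in a single coordinate give a Chamfer distance of 1.0, while identical sets give 0.0; two 2×2×2 grids that share 2 of 6 occupied voxels give an IoU of 1/3.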
The paper then analyzes deep-learning-based single-view 3D object reconstruction and systematically summarizes research on supervised, unsupervised, and semi-supervised approaches. Early supervised methods mainly focused on reconstruction resolution. As 3D representations improved, and implicit 3D representations in particular were adopted, high-resolution reconstruction of object details became possible. Subsequent works improved the input image, the encoder and decoder, the use of prior knowledge, and the overall architecture to further address problems such as unknown viewpoints, fine details, and generalization across shapes.

Early unsupervised methods mainly focused on improving the rendering process, laying the foundation for learning without 3D labels. Subsequent works built on this by varying the number of rendered images, exploiting image feature attributes, and adding prior knowledge, which further addresses problems such as lighting interference and background interference.

Semi-supervised methods are mainly divided into methods based on 2D labeled data and methods based on 3D labeled data. The former propose few-shot and general-data training paradigms to overcome the difficulty of 3D annotation and the deviation and inconsistency between annotated data and test data; the latter use semantic and viewpoint information to improve the robustness and generalization of reconstruction. Each of these three learning paradigms has its own advantages and disadvantages in terms of technical framework.
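To illustrate why implicit representations decouple reconstruction from a fixed output resolution, the sketch below treats the shape as a function from 3D points to occupancy and samples it on grids of any size. The analytic sphere is a hypothetical stand-in for a learned decoder; in a learned method this function would be a network conditioned on image features, and a mesh would typically be extracted from the sampled grid with marching cubes. Function names are illustrative assumptions.

```python
import numpy as np


def sphere_occupancy(points: np.ndarray, radius: float = 0.5) -> np.ndarray:
    """Hypothetical stand-in for a learned implicit decoder f(x) -> {0, 1}.

    Returns 1.0 for query points inside a sphere of the given radius centred
    at the origin and 0.0 otherwise. Accepts any array whose last axis is 3.
    """
    return (np.linalg.norm(points, axis=-1) <= radius).astype(np.float32)


def extract_voxels(occupancy_fn, resolution: int) -> np.ndarray:
    """Query an implicit occupancy function on a regular grid over [-0.5, 0.5]^3."""
    # Cell centres, so every resolution samples the same cube symmetrically.
    coords = (np.arange(resolution) + 0.5) / resolution - 0.5
    xs, ys, zs = np.meshgrid(coords, coords, coords, indexing="ij")
    grid_points = np.stack([xs, ys, zs], axis=-1)  # (R, R, R, 3)
    return occupancy_fn(grid_points)  # (R, R, R)
```

Calling `extract_voxels(sphere_occupancy, 16)` and `extract_voxels(sphere_occupancy, 128)` queries the same function; only the sampling density changes, and the occupied fraction approaches π/6 ≈ 0.524 (the volume ratio of the inscribed sphere) as the resolution grows. An explicit voxel decoder, by contrast, fixes the output grid size in its architecture.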
Supervised methods learn from 3D labeled data and achieve high reconstruction quality, but they are limited by the high cost of data annotation. Unsupervised methods learn directly from 2D images, which effectively reduces training cost, but their reconstruction quality is less stable. Semi-supervised methods learn jointly from labeled and unlabeled data to address both high annotation cost and unstable reconstruction quality; because they combine the advantages of the other two paradigms, they have been widely studied.
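The joint learning of labeled and unlabeled data described above can be summarized by a generic objective. This is a hedged template, not the loss of any particular surveyed method: $f_\theta$ is assumed to be the reconstruction network, $\mathcal{D}_l$ a set of images $I$ paired with 3D shapes $S$, $\mathcal{D}_u$ a set of unlabeled images, $R$ a differentiable renderer, $\ell_{3D}$ a 3D shape loss (for example a Chamfer distance), $\ell_{2D}$ an image-space loss (for example a silhouette or photometric loss), and $\lambda$ a weighting hyperparameter.

```latex
\mathcal{L}(\theta) =
  \frac{1}{|\mathcal{D}_l|} \sum_{(I,\,S) \in \mathcal{D}_l}
    \ell_{3D}\big(f_\theta(I),\, S\big)
  + \lambda \, \frac{1}{|\mathcal{D}_u|} \sum_{I \in \mathcal{D}_u}
    \ell_{2D}\big(R(f_\theta(I)),\, I\big)
```

The first term is the supervised signal that gives high reconstruction quality; the second is the rendering-based signal that lets unlabeled 2D images contribute. Setting $\lambda = 0$ recovers purely supervised training, and dropping the first term recovers a purely unsupervised objective.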
In addition, this paper summarizes the unresolved challenges in terms of data, training paradigms, evaluation metrics, and reconstruction performance, and proposes possible future development trends and key technologies for deep-learning-based single-view 3D object reconstruction. To address the difficulty of collecting data on in-the-wild objects, future work should study how to build datasets from object images on the Internet and develop efficient interactive annotation tools, reducing data collection and annotation costs. To address insufficient learning of local object structures, training paradigms guided by prior knowledge of local structures should be studied to improve the accuracy and reliability of single-view 3D object reconstruction. To address limited reconstruction performance with few-shot 3D annotated data, optimal combinations of different tasks should be designed and multi-task learning methods developed, which can extract richer semantic information about objects and supplement the supervision available for reconstruction. To address the neglect of local structures by existing evaluation metrics, metrics that focus on the reconstruction of local structures should be designed to further guide high-precision reconstruction optimization. Finally, to address the long training cycles and limited object categories of existing methods, 3D foundation models capable of reconstructing object shapes across all categories should be studied, which can promote the development and application of single-view 3D object reconstruction methods.
Keywords