Dual-branch network for human pose estimation in dressing scenes

吕中正1, 刘骊1,2, 付晓东1,2, 刘利军1,2, 黄青松1,2(1.昆明理工大学信息工程与自动化学院, 昆明 650500;2.云南省计算机技术应用重点实验室, 昆明 650500)

Abstract
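Objective Human pose estimation aims to recognize and locate human joints in images of different scenes and to improve joint localization accuracy. To address the low accuracy of human pose estimation caused by diverse clothing styles, background interference, and varied dressed poses, this paper takes fashion street-shot images in dressing scenes as an example and proposes a dual-branch network method for human pose estimation in dressing scenes. Method Human detection is performed on the input image to obtain the dressed human region, which is fed separately into a pose representation branch and a dressed-part segmentation branch. The pose representation branch adds multi-scale loss and feature fusion to a stacked hourglass network and outputs joint score maps, which addresses the interference of diverse clothing styles and complex backgrounds with joint feature extraction; it also defines a pose category loss function based on pose clustering to handle the varying viewing angles of dressed poses. The dressed-part segmentation branch fuses features by connecting the shallow and deep features of a residual network to obtain dressed-part score maps. The dressed-part segmentation result is then used to constrain joint localization, which addresses the occlusion of joints by clothing. Finally, pose optimization yields the final human pose estimation result. Result The method is validated on a constructed dataset of dressed-person images. Experimental results show that the pose representation branch effectively improves joint localization accuracy, and that the dressed-part segmentation branch effectively avoids mislocalized joints in dressing scenes. After optimization with dressed-part segmentation, the human pose estimation accuracy rises to 92.5%. Conclusion The proposed method effectively improves human pose estimation accuracy in dressing scenes and can meet the needs of practical applications such as virtual try-on.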
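The abstract names a multi-scale loss on the stacked hourglass network but does not give its form. As a minimal sketch only, assuming the loss is a sum of mean squared errors between predicted and ground-truth joint score maps at successively halved resolutions, the idea could look like the following. The function names `avg_pool2x`, `mse`, and `multi_scale_loss`, and the use of 2x2 average pooling to downscale the target, are illustrative assumptions and not the authors' implementation.

```python
def avg_pool2x(grid):
    """Halve a 2D score map by averaging non-overlapping 2x2 blocks."""
    rows, cols = len(grid) // 2, len(grid[0]) // 2
    return [
        [
            (grid[2 * r][2 * c] + grid[2 * r][2 * c + 1]
             + grid[2 * r + 1][2 * c] + grid[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def mse(pred, target):
    """Mean squared error between two 2D maps of identical shape."""
    count = len(pred) * len(pred[0])
    total = sum(
        (p - t) ** 2
        for pred_row, target_row in zip(pred, target)
        for p, t in zip(pred_row, target_row)
    )
    return total / count


def multi_scale_loss(preds, target):
    """Sum the MSE over several scales (assumed form of the loss).

    preds: predicted score maps, finest first; each later map is
           assumed to be half the height and width of the previous one.
    target: ground-truth score map at the finest resolution.
    """
    loss = 0.0
    scaled_target = target
    for index, pred in enumerate(preds):
        if index > 0:
            # Downscale the target to match the coarser prediction.
            scaled_target = avg_pool2x(scaled_target)
        loss += mse(pred, scaled_target)
    return loss
```

Supervising coarse and fine maps together is one common way to make joint features less sensitive to local texture such as clothing patterns; whether the paper weights the scales equally is not stated in the abstract.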
Keywords
Dual-branch network for human pose estimation in dressing scenes

Lyu Zhongzheng1, Liu Li1,2, Fu Xiaodong1,2, Liu Lijun1,2, Huang Qingsong1,2(1.Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming 650500, China;2.Computer Technology Application Key Laboratory of Yunnan Province, Kunming 650500, China)

Abstract
Objective Human pose estimation aims to recognize and locate human joints in images of different scenes and to improve joint localization accuracy. Current methods perform well in scenes where clothing only occasionally limits the visibility of body joints, but they fail in complicated dressing scenes such as fashion street shots. Two main difficulties lower the accuracy of joint localization and pose estimation in dressing scenes. First, the variety of clothing styles partially occludes body joints, and the varied texture and color information causes joint localization to fail. Second, body postures vary widely in dressing scenes. To address these problems, we propose a dual-branch network for human pose estimation in dressing scenes. Method First, human detection is performed on the input image to obtain the dressed human body region, which is fed into a pose representation branch and a dressed-part segmentation branch. Next, to reduce the interference of diverse clothing styles and complex backgrounds with joint feature extraction, the pose representation branch adds multi-scale loss and feature fusion to a stacked hourglass network and generates joint score maps. To handle the varying viewing angles of human poses in dressing scenes, a pose category loss function is defined based on pose clustering. Then, the dressed-part segmentation branch connects the shallow and deep features of a residual network and performs feature fusion, guided by the dressed-part labels, to build dressed-part score maps.
Finally, to resolve the occlusion of joints by clothing, the dressed-part segmentation result is used to constrain the positions of body joints, and the final human pose estimate is obtained through pose optimization. Result The proposed method is validated on a constructed dataset of images of dressed people. The experiments show that the pose representation branch effectively improves the localization accuracy of body joints; in particular, the pose category loss function improves the robustness of multi-view pose estimation. After optimization with the semantic segmentation of dressed parts, the pose estimation accuracy rises to 92.5%. Conclusion To address the low accuracy of human pose estimation caused by diverse clothing styles and body postures in dressing scenes, we propose a dual-branch network for human pose estimation in dressing scenes. To improve joint localization accuracy, we construct a pose representation model that fuses global and local features. A pose category loss is incorporated to improve the robustness of multi-view pose estimation. We integrate the semantic segmentation of dressed parts to constrain the positions of body joints, which effectively improves pose estimation accuracy in dressing scenes. Experiments on the constructed dataset of dressed-person images show that the proposed method improves pose estimation accuracy in dressing scenes. The estimation accuracy of joint points reaches 92.5%. However, accuracy remains low in some cases, especially when dresses, overcoats, or multi-layer clothing heavily cover body joints. The localization accuracy of body joints also needs improvement when people carry bags or wear other accessories.
The accuracy of human pose estimation in multi-oriented dressing scenes remains to be improved further.
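The abstract states that the dressed-part segmentation result constrains joint positions but does not say how. One simple way to picture such a constraint, offered here as a sketch under assumptions rather than the authors' method, is to weight each joint score by the segmentation score of the body part the joint belongs to before taking the maximum. The function name `constrained_argmax` and the `floor` parameter are hypothetical.

```python
def constrained_argmax(score_map, part_mask, floor=0.1):
    """Pick the joint location after weighting scores by a part mask.

    score_map: 2D list of joint scores from the pose branch.
    part_mask: 2D list of values in [0, 1] from the segmentation
               branch, high where the relevant dressed part is found.
    floor: minimum weight outside the part (an assumption), so that a
           segmentation miss does not zero out every candidate.
    Returns the (row, col) of the highest weighted score.
    """
    best_value = float("-inf")
    best_position = None
    for r, (score_row, mask_row) in enumerate(zip(score_map, part_mask)):
        for c, (score, mask) in enumerate(zip(score_row, mask_row)):
            weighted = score * max(mask, floor)
            if weighted > best_value:
                best_value = weighted
                best_position = (r, c)
    return best_position
```

With a score map whose strongest response lies on a clothing pattern outside the segmented part, the weighting moves the chosen location back inside the part, which is the kind of mislocalization the abstract says the segmentation branch helps avoid.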
Keywords
