ESLD: an eye segmentation and landmark detection dataset for ordinary cameras under natural light
Abstract
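Objective Changes in eye state can serve as evidence of a user's true psychological state and emotional changes. Because the eye region is small and the pupil and iris are similar in color, capturing changes in pupil size and position with an ordinary camera under natural light remains a challenging task. The lack of eye-structure datasets with fine landmark and segmentation annotations that resemble real application environments is another factor limiting research in this field. To address these problems, this paper collects eye images in ordinary-camera scenarios, captures pupil changes, and builds an eye image segmentation and landmark detection dataset (eye segment and landmark detection dataset, ESLD). Method We collect, annotate, and publicly release ESLD, an image dataset covering multiple eye types. Images are acquired in three ways: 1) facial images captured while users work at a computer; 2) facial images from public datasets that were captured with an ordinary camera under natural light; 3) eye images synthesized with the public software UnityEye. The three acquisition methods yield 1 386, 804, and 1 600 eye images, respectively. The eye region is then cropped from each original image, and eye images of different sizes are normalized to 256×128 pixels. Finally, the landmarks of the eye images are manually annotated and the eye structures are segmented. Result ESLD contains multiple types of eye images and can meet the different needs of researchers. Because collecting real eye images, whether captured directly or taken from public datasets, is very difficult, this paper uses UnityEye to generate eye images to alleviate the shortage of training data. Experimental results show that the synthesized eye images can effectively compensate for the lack of data, with an F1 score of up to 0.551. Baselines for eye landmark detection and eye structure segmentation are provided using deep learning methods. With ResNet101 as the feature extraction network, the eye landmark detection error is 5.828 and the mAP (mean average precision) of eye structure segmentation reaches 0.965. Conclusion ESLD provides data support for researchers who study users' emotional changes and psychological states through eye images.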
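The cropping and normalization step can be illustrated with a minimal sketch. It assumes the eye landmarks are already available as (x, y) pixel coordinates. The function name `eye_crop_box` and the `margin` padding are hypothetical and are not taken from the ESLD pipeline, which the abstract does not describe at this level of detail. The sketch only computes a crop box with the 2:1 aspect ratio of the 256×128 target, so that a later resize would not distort the eye.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def eye_crop_box(points: List[Point], margin: float = 0.25,
                 aspect: float = 256 / 128) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) of a box around `points`.

    The box is centered on the landmarks and has width / height == aspect
    before rounding. `margin` is a hypothetical padding fraction.
    """
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    cx = (min(xs) + max(xs)) / 2
    cy = (min(ys) + max(ys)) / 2
    # Pad the tight landmark box by `margin` on every side.
    w = (max(xs) - min(xs)) * (1 + 2 * margin)
    h = (max(ys) - min(ys)) * (1 + 2 * margin)
    # Grow the shorter dimension so the box matches the target ratio.
    if w / aspect > h:
        h = w / aspect
    else:
        w = h * aspect
    # The box may extend past the image border; clamping is left to the caller.
    return (round(cx - w / 2), round(cy - h / 2),
            round(cx + w / 2), round(cy + h / 2))


box = eye_crop_box([(100, 50), (140, 50), (120, 40), (120, 60)])
```

The resulting box could then be passed to an image library's crop and resize routines (for example Pillow's `Image.crop` followed by `Image.resize((256, 128))`) to produce the normalized eye image.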
关键词
ESLD: eyes segment and landmark detection in the wild

Zhang Junjie, Sun Guangmin, Zheng Kun, Li Yu, Fu Xiaohui, Ci Kangyi, Shen Junjie, Meng Fanchao, Kong Jiangping, Zhang Yue (Beijing University of Technology, Beijing 100024, China)

Abstract
Objective The physiological features of the human eye, which can reflect health, fatigue, and emotion, are challenging to capture. Fatigue can be judged from the state of a patient's eyes. Instructors can use the state of students' eyes in class to analyze their emotion, psychology, and cognition. Target consumers can be identified from their gaze location while shopping. However, capturing changes in pupil size and orientation with an ordinary camera in the wild remains difficult. Meanwhile, there is a lack of eye datasets with fine landmark and segmentation annotations that resemble real application scenarios. Near-infrared and head-mounted cameras can capture eye images, using light to distinguish the iris from the pupil and obtain high-quality images. In real use, however, head pose, illumination, occlusion, and user-camera distance may affect image quality, so images collected in a laboratory environment are difficult to apply in the real world. Method An eye region segmentation and landmark detection dataset can help resolve the mismatch between results in indoor and outdoor scenarios. Given the shortage of datasets for fine landmark detection and eye region segmentation, our research focuses on collecting and annotating a new dataset (eye segment and landmark detection dataset, ESLD) that contains multiple types of eyes. First, facial images are collected in three ways: facial images of users working at a computer, images from public datasets captured by an ordinary camera, and synthesized eye images. These sources yield 1 386, 804, and 1 600 images, respectively. Second, the eye region is cropped from the original image. For complete face images, Dlib is used to detect landmarks and the eye region is cropped according to those landmarks.
For incomplete face images, the eye region is cropped manually. All eye region images are then normalized to 256×128 pixels and stored in folders according to acquisition type. Finally, annotators are trained and then label the images manually. To reduce labeling errors caused by human factors, each annotator first labels four images selected from each image type. An experienced annotator checks these labels once the landmarks are complete, and the remaining images are labeled only after the annotation standard is reached. Landmark locations are saved as JSON files, and labelme is used to segment the eye region based on those JSON files. A total of 2 404 images are obtained. Each image contains 16 landmarks around the eye, 12 around the iris, and 12 around the pupil. The segmentation labels cover the sclera, iris, pupil, and the skin around the eyes. Result The dataset is divided into training, testing, and validation sets in a 0.6:0.2:0.2 ratio. We evaluate the proposed dataset with deep learning algorithms and provide a baseline for each experiment. First, a model is trained on synthesized eye images, and an experiment is conducted to classify whether an eye image is real or synthetic. The results show that the model cannot accurately distinguish real from synthesized images, which indicates that synthesized eye images can be used as training data. Second, deep learning-based algorithms are applied to eye region segmentation. Mask region convolutional neural network (Mask R-CNN) models are trained with different backbones. Backbones with deeper network structures achieve higher segmentation accuracy for the same number of training epochs, and the mean average precision (mAP) reaches 0.965. Finally, Mask R-CNN is modified for the landmark detection task. The model is evaluated with Euclidean distance, and the error is 5.828.
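A Euclidean-distance landmark error of this kind can be sketched as follows. This is an illustration under assumptions: the abstract does not state whether the reported 5.828 is in pixels, whether it is normalized, or how it is averaged, so the sketch simply takes the mean point-to-point distance over one image's landmarks. The function name `mean_landmark_error` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_landmark_error(pred: Sequence[Point], truth: Sequence[Point]) -> float:
    """Mean Euclidean distance between predicted and annotated landmarks.

    Assumes both sequences list the same landmarks in the same order.
    """
    if len(pred) != len(truth) or not pred:
        raise ValueError("pred and truth must be non-empty and equally long")
    total = sum(math.hypot(p[0] - t[0], p[1] - t[1])
                for p, t in zip(pred, truth))
    return total / len(pred)


# Two landmarks: one predicted exactly, one off by a 3-4-5 triangle.
error = mean_landmark_error([(0, 0), (3, 4)], [(0, 0), (0, 0)])
```

A dataset-level figure would then be an average of this per-image value over the test set, again as an assumption about the evaluation protocol.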
Compared with the eye region segmentation task, landmark detection is more difficult because the eye region is small. A deeper structure, together with the eye region mask, helps increase the accuracy of landmark detection. Conclusion ESLD focuses on multiple types of eye images in real environments and bridges the gap in fine landmark detection and segmentation of the eye region. To study the relationship between eye state and emotion, deep learning algorithms can be developed further by combining ESLD with other datasets.
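For readers who want to reproduce a 0.6:0.2:0.2 partition of the kind reported in the Result paragraph, a minimal sketch follows. The shuffling, the fixed seed, and the function name `split_dataset` are assumptions for illustration; the abstract does not say how images were assigned to the training, testing, and validation sets.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(items: Sequence[T],
                  ratios: Tuple[float, float, float] = (0.6, 0.2, 0.2),
                  seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle `items` and split them into (train, test, validation) lists."""
    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_test = int(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    # Validation takes the remainder, so no item is dropped by rounding.
    val = shuffled[n_train + n_test:]
    return train, test, val


# Hypothetical file names standing in for the 2 404 annotated images.
names = [f"eye_{i:04d}.png" for i in range(2404)]
train, test, val = split_dataset(names)
```

Because the validation set takes the remainder, the three parts always cover every image exactly once, even when the ratios do not divide the dataset size evenly.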
Keywords
