Indexed by:
Abstract:
Objective Human eyes physiological features are challenged to be captured, which can reflect health, fatigue and emotion of human behaviors. Fatigue phenomenon can be judged according to the state of the patients’ eyes. The state of the in-class students’ eyes can be predicted by instructorsin terms of students’ emotion, psychology and cognitive analyses. Targeted consumers can be recognized through their gaze location when shopping. Camera shot cannot be used to capture the changes in pupil size and orientation in the wild. Meanwhile, there is a lack of eye behavior related dataset with fine landmarks detection and segment similar to the real application scenario. Near-infrared and head-mounted cameras could be used to capture eye images. Light is used to distinguish the iris and pupil, which obtain a high-quality image. Head posture, illumination, occlusion and user-camera distance may affect the quality of image. Therefore, the images collection in the laboratory environment are difficult to apply in the real world. Method An eye region segmentation and landmark detection dataset can resolve the issue of mismatch results between the indoor and outdoor scenarios. Our research focuses on collection and annotation of a new eye region segment and landmark detection dataset (eye segment and landmark detection dataset, ESLD) in constraint of dataset for fine landmark detection and eye region, which contain multiple types of eye. First, facial images are collected. There are three ways to collect images, including the facial images of user when using the computer, images in the public dataset captured by the ordinary camera and the synthesized eye images, respectively. The number of images is developed to 1 386, 804 and 1 600, respectively. Second, eye region is cut out from the original image. Dlib is used to detect landmarks and eye region is segmented according to the labels of the completed face images involved. For an incomplete face images, eye region should be segment artificially. And then, all eye region images are normalized in 256 ×128 pixels. The eye region images are restored in a folder according to the type of acquisitions. Finally, annotators are initially to be trained and manually annotated images labels followed. In order to reduce the label error caused by human behavior factors, each annotator selects four images from each type of image for labeling. An experienced annotator will be checked after the landmarks are labeled and completed. The remaining images can be labeled when the annotate standard is reached. Each landmarks location is saved as json file and labelme is used to segment eye region derived the json file. A total of 2 404 images are obtained. Each image contains 16 landmarks around eyes, 12 landmarks around iris and 12 pupil surrounded landmarks. The segment labels are relevant to sclera, iris, and pupil and skip around eyes. Result Our dataset is classified into training, testing and validation sets by 0. 6 ∶ 0. 2 ∶ 0. 2. Our demonstration evaluates the proposed dataset using deep learning algorithms and provides baseline for each experiment. First, the model is trained by synthesized eye images. An experiment is conducted to recognize whether the eye is real or not. Our analyzed results show that model cannot recognize real and synthesis accurately, which indicate synthesis eye images can be used as training data. And, deep learning-based algorithms are used to eye region segment. Mask region convolutional neural network (Mask R-CNN) with different backbones are used to train the model. It shows that backbones with deep network structure can obtain high segment accuracy under the same training epoch and the mean average precision (mAP) is 0. 965. Finally, Mask R-CNN is modified to landmarks detection task. Euclidean distance is used to test the model and the error is 5. 828. Compared to eye region segment task, it is difficult to detect landmarks due to the small region of the eye. Deep structure is efficient to increase the accuracy of landmarks detection with eye region mask. Conclusion ESLD is focused on multiple types of eye images in a real environment and bridge the gaps in the fine landmarks detection and segmentation in eye region. To study the relationship between eye state and emotion, a deep learning algorithm can be developed further based on combining ESLD with other datasets. © 2022 Editorial and Publishing Board of JIG. All rights reserved.
Keyword:
Reprint Author's Address:
Email:
Source :
Journal of Image and Graphics
ISSN: 1006-8961
Year: 2022
Issue: 8
Volume: 27
Page: 2329-2343
Cited Count:
SCOPUS Cited Count: 1
ESI Highly Cited Papers on the List: 0 Unfold All
WanFang Cited Count:
Chinese Cited Count:
30 Days PV: 8
Affiliated Colleges: