Face Tracking and Pose Estimation

Face tracking and face pose estimation both play key roles in human-robot interaction and serve as essential preprocessing steps for robust face recognition and facial expression recognition.

Tracking faces in uncontrolled environments remains a challenging task because both the face and the background change quickly over time, and the face often moves through varying illumination conditions. We propose a face tracking algorithm that combines an adaptive correlation filter with Viola-Jones face detection [1], allowing it to adapt to changes in face rotation, occlusion and scale as well as to complex changes in background and illumination.
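
As a concrete illustration of the detection step, the sketch below obtains an initial face position from a Viola-Jones detection using OpenCV's bundled Haar cascade. The cascade file and parameters here are assumptions chosen for illustration; on the robot, the initial face position is provided by the detector described in [2].

    import cv2

    # Viola-Jones face detection with OpenCV's stock Haar cascade.
    # Illustrative sketch only: cascade and parameters are assumptions,
    # not the depth-based detector of [2].
    cascade_path = cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    detector = cv2.CascadeClassifier(cascade_path)

    def detect_face(frame):
        """Return the largest detected face as (x, y, w, h), or None."""
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
        if len(faces) == 0:
            return None
        # The largest detection initializes (or later corrects) the tracker.
        return max(faces, key=lambda f: f[2] * f[3])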

Our proposed tracking algorithm combines a MOSSE correlation filter tracker with Viola-Jones face detection [1]. The face position located by the face detector [2] initializes the tracker. The face is then tracked within a search window centered on its previous position: correlating the filter over this window yields the new position of the face in the current frame. To compute the correlation efficiently, the search image and the filter are transformed into Fourier space using a Fast Fourier Transform. The MOSSE tracker must be updated online, or corrected by the face detector, in order to adapt quickly to appearance changes of the face.
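
The sketch below shows the core MOSSE operations in NumPy: the closed-form filter built from a desired Gaussian response, correlation in Fourier space, and the online running-average update. The learning rate and Gaussian width are assumptions, and the usual MOSSE preprocessing (log transform, normalization, cosine window) is omitted for brevity.

    import numpy as np

    def gaussian_response(shape, sigma=2.0):
        """Desired correlation output: a Gaussian peak at the window center."""
        h, w = shape
        ys, xs = np.mgrid[0:h, 0:w]
        g = np.exp(-((xs - w // 2) ** 2 + (ys - h // 2) ** 2) / (2 * sigma ** 2))
        return np.fft.fft2(g)

    class MosseFilter:
        def __init__(self, patch, eta=0.125, eps=1e-5):
            self.eta, self.eps = eta, eps          # learning rate, regularizer
            G = gaussian_response(patch.shape)
            F = np.fft.fft2(patch)
            # Numerator and denominator of the closed-form MOSSE filter H*.
            self.A = G * np.conj(F)
            self.B = F * np.conj(F)

        def correlate(self, patch):
            """Correlate over the search window; the peak's offset from the
            window center is the face displacement in this frame."""
            F = np.fft.fft2(patch)
            H_conj = self.A / (self.B + self.eps)
            response = np.real(np.fft.ifft2(F * H_conj))
            return np.unravel_index(np.argmax(response), response.shape)

        def update(self, patch):
            """Online update via running averages of numerator/denominator."""
            G = gaussian_response(patch.shape)
            F = np.fft.fft2(patch)
            self.A = self.eta * (G * np.conj(F)) + (1 - self.eta) * self.A
            self.B = self.eta * (F * np.conj(F)) + (1 - self.eta) * self.B

Each frame, the tracker crops the search window around the previous face position, calls correlate to locate the face, and either calls update or re-initializes from the face detector when the correlation peak becomes unreliable.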


Figure 1: Examples of face tracking across pose changes. Our face tracker is marked by the red rectangle and the original MOSSE filter by the black rectangle.


Face pose estimation poses significant challenges of its own. First, the resolution of the face is very low when a person moves far away from the robot. Second, facial features change quickly under different illumination conditions, and the face appears in a wide variety of poses. To obtain a robust method of face pose estimation, we track key facial features: the two external eye corners and the nose. These features provide geometric cues to estimate the yaw and roll angles of the face precisely, which is important for improving face recognition in uncontrolled settings.

Three crucial features must be tracked: the two external eye corners and the nose. As in our face tracking method, we combine an adaptive correlation filter with Viola-Jones object detection [1] to track these features robustly under face rotation, face deformation, occlusion and complicated illumination. To estimate the roll angle of the face, we compute the angle of the line joining the two external eye corners, i.e. the arctangent of the slope between them. In addition, we apply a simple technique to estimate the yaw angle based on the relative positions of the three tracked points.
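
The sketch below implements these geometric cues. The roll computation follows the text directly; the yaw formula (nose offset from the midpoint of the eye line, normalized by the inter-corner distance) is one plausible instantiation of "relative positions of three tracked points" and is an assumption, not necessarily the exact technique of [1].

    import numpy as np

    def estimate_roll(left_eye, right_eye):
        """Roll (degrees): arctangent of the slope of the eye-corner line."""
        dx = right_eye[0] - left_eye[0]
        dy = right_eye[1] - left_eye[1]
        return np.degrees(np.arctan2(dy, dx))

    def estimate_yaw(left_eye, right_eye, nose):
        """Yaw proxy (degrees), assumed formulation: horizontal nose offset
        from the eye-line midpoint, normalized by inter-corner distance."""
        mid_x = 0.5 * (left_eye[0] + right_eye[0])
        eye_dist = np.hypot(right_eye[0] - left_eye[0],
                            right_eye[1] - left_eye[1])
        offset = (nose[0] - mid_x) / eye_dist
        return np.degrees(np.arcsin(np.clip(2.0 * offset, -1.0, 1.0)))

    # A frontal face gives roll ~ 0 and yaw ~ 0:
    print(estimate_roll((100, 120), (160, 120)))            # 0.0
    print(estimate_yaw((100, 120), (160, 120), (130, 150))) # 0.0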


Figure 2: Examples of our face tracking and pose estimation on a moving mobile robot. The white circles indicate the locations of the facial features.


References

[1] My Vo Duc and Andreas Zell. Real time face tracking and pose estimation using an adaptive correlation filter for human-robot interaction. In European Conference on Mobile Robots (ECMR 2013) (Oral), Barcelona, Catalonia, Spain, 2013.
[2] My Vo Duc, Andreas Masselli, and Andreas Zell. Real time face detection using geometric constraints, navigation and depth-based skin segmentation on mobile robots. In 2012 IEEE International Symposium on Robotic and Sensors Environments, Magdeburg, Germany, November 2012.

Contact

Vo My, duc-my.vo-NO_SPAM-@uni-tuebingen.de