Monocular Visual Odometry and Visual SLAM for Mobile Robots

Monocular visual odometry is the problem of estimating a robot's motion from consecutive images taken by a single onboard camera. It typically involves finding point correspondences between consecutive images (by matching local image features or by sparse optical flow) and estimating the essential matrix between the two views, from which the rotational and translational components of the robot's motion can be recovered.
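As a rough illustration of the essential-matrix step, the following sketch estimates the essential matrix from point matches and decomposes it into a rotation and a unit-length translation direction. It is a minimal example under stated assumptions, not the pipeline used in this project: it uses the plain eight-point algorithm on noise-free synthetic matches (real systems usually add outlier rejection such as RANSAC), it assumes points are already in normalized camera coordinates, and it assumes the convention X2 = R X1 + t. All function names are hypothetical.

```python
import numpy as np

def estimate_essential(x1h, x2h):
    """Eight-point estimate of E such that x2h^T E x1h = 0.

    x1h, x2h: (N, 3) homogeneous points in normalized camera coordinates.
    """
    n = x1h.shape[0]
    # Each row holds the products x2_i * x1_j, matching a row-major flattening of E.
    A = np.einsum("ni,nj->nij", x2h, x1h).reshape(n, 9)
    _, _, vt = np.linalg.svd(A)
    E = vt[-1].reshape(3, 3)
    # Enforce the essential-matrix structure: two equal singular values, one zero.
    u, _, vt = np.linalg.svd(E)
    return u @ np.diag([1.0, 1.0, 0.0]) @ vt

def count_in_front(R, t, x1h, x2h):
    """Count matches whose triangulated depth is positive in both views."""
    count = 0
    for a, b in zip(x1h, x2h):
        # Solve depth2 * b = depth1 * (R a) + t for the two depths.
        M = np.column_stack([R @ a, -b])
        depths, *_ = np.linalg.lstsq(M, -t, rcond=None)
        count += int(depths[0] > 0 and depths[1] > 0)
    return count

def decompose_essential(E, x1h, x2h):
    """Return (R, t) with |t| = 1, chosen from the four candidates by cheirality."""
    u, _, vt = np.linalg.svd(E)
    if np.linalg.det(u) < 0:
        u = -u
    if np.linalg.det(vt) < 0:
        vt = -vt
    W = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    candidates = [(u @ Wk @ vt, sign * u[:, 2])
                  for Wk in (W, W.T) for sign in (1.0, -1.0)]
    return max(candidates, key=lambda c: count_in_front(c[0], c[1], x1h, x2h))

def synthetic_two_view(num_points=30, seed=0):
    """Noise-free synthetic matches for a small rotation and translation (illustrative only)."""
    rng = np.random.default_rng(seed)
    X1 = np.column_stack([rng.uniform(-1, 1, num_points),
                          rng.uniform(-1, 1, num_points),
                          rng.uniform(4, 8, num_points)])
    c, s = np.cos(0.1), np.sin(0.1)
    R = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    t = np.array([0.5, 0.0, 0.1])
    X2 = X1 @ R.T + t
    return X1 / X1[:, 2:], X2 / X2[:, 2:], R, t

x1h, x2h, R_true, t_true = synthetic_two_view()
E = estimate_essential(x1h, x2h)
R_est, t_est = decompose_essential(E, x1h, x2h)
```

Note that t_est always has unit length: the true length of the translation is not observable from the essential matrix alone.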

The translation, however, can be recovered from the essential matrix only up to an unknown scale factor. Additional information is therefore needed to fix the scale, e.g. the distance traveled between the two views as measured by wheel encoders, or the true size of one or more reference objects visible to the robot.
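The two sources of scale information mentioned above can be sketched as follows. This is a simplified illustration with hypothetical function names: it assumes the wheel-encoder distance approximates the straight-line camera displacement between the two views (reasonable only for short, nearly straight motion), and that two endpoints of a reference object of known length have been reconstructed in the same up-to-scale frame as the translation.

```python
import numpy as np

def metric_translation_from_odometry(t_direction, distance_m):
    """Scale an up-to-scale translation so its length equals the measured distance."""
    t = np.asarray(t_direction, dtype=float)
    norm = np.linalg.norm(t)
    if norm == 0.0:
        raise ValueError("translation direction must be non-zero")
    return distance_m * t / norm

def scale_from_reference_object(p_a, p_b, true_length_m):
    """Factor that converts up-to-scale reconstruction units to metres.

    p_a, p_b: reconstructed endpoints of an object with known length.
    """
    reconstructed = np.linalg.norm(np.asarray(p_a, dtype=float) - np.asarray(p_b, dtype=float))
    if reconstructed == 0.0:
        raise ValueError("reference endpoints must be distinct")
    return true_length_m / reconstructed

# Example: the encoders report 0.25 m of travel between the two views.
t_metric = metric_translation_from_odometry([0.6, 0.0, 0.8], 0.25)

# Example: a 1 m reference object spans 0.5 units in the reconstruction.
scale = scale_from_reference_object([0.0, 0.0, 2.0], [0.5, 0.0, 2.0], 1.0)
```

Multiplying the up-to-scale translation and all reconstructed points by the resulting factor brings them into metric units.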

The next logical step after visual odometry is to use visual information for full simultaneous localization and mapping (SLAM). Early approaches were based on the extended Kalman filter (EKF), which restricted them to small environments because its computational complexity is quadratic in the number of landmarks. More recent systems such as parallel tracking and mapping (PTAM) and FrameSLAM store landmarks implicitly in keyframes and optimize the map in a second thread using either bundle adjustment or stochastic gradient descent, which allows larger areas to be mapped.
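The quadratic growth of the EKF can be made concrete by counting the entries of the joint covariance matrix that the filter maintains over the robot pose and all landmarks. The sketch below is only a back-of-the-envelope illustration; the 6-dimensional pose and 3-dimensional landmarks are assumptions, and actual EKF-based systems may use different state parameterizations.

```python
def ekf_covariance_entries(num_landmarks, pose_dim=6, landmark_dim=3):
    """Number of entries in the joint EKF covariance (pose plus all landmarks)."""
    state_dim = pose_dim + landmark_dim * num_landmarks
    return state_dim * state_dim

# Ten times more landmarks gives roughly a hundred times more covariance entries.
small_map = ekf_covariance_entries(100)
large_map = ekf_covariance_entries(1000)
growth = large_map / small_map
```

Because every measurement update touches the full covariance, the per-update cost grows in the same way, which is what confines this kind of filter to small maps.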

In this project, we explore the applicability of these methods and compare their performance on both autonomous aerial and ground vehicles.


Sebastian Scherer, Tel.: (07071) 29-70441 sebastian.scherer at,
Andreas Masselli, Tel.: (07071) 29-70441 andreas.masselli at