Object Recognition and Tracking for Indoor Robots using an RGB-D Sensor

Lixing Jiang

The human visual system is extremely powerful: it allows us to distinguish even very similar objects. Inspired by this ability, this work focuses on developing an object recognition system that enables our mobile systems to identify specific objects. Object recognition is the task of determining the identity of an object observed in an image from a set of known labels. Since the introduction of low-cost RGB-D sensors such as the Microsoft Kinect, RGB-D-based approaches have become increasingly widespread. By exploiting the additional depth information and the features derived from it, object identification becomes more precise and thus more feasible for practical applications. We use a mobile service robot as an object identification system. A simple practical scenario would be a supermarket.

Fig. 1 SCITOS G5 robot equipped with a tray and the Microsoft Kinect.

Technical Setup

The development platform is a SCITOS G5 service robot from MetraLabs, as shown in Fig. 1. A gray plastic tray on the robot serves as the main experimental area on which object samples are placed. A Microsoft Kinect (near mode) is mounted approximately 0.5 m above the tray, pointing straight down at it. With near mode enabled, the Kinect for Windows provides depth data for objects at a minimum distance of 0.4 m without loss of precision. The Kinect RGB-D sensor records color and depth images concurrently at a resolution of 640 × 480 pixels and 30 frames per second.
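The mounting geometry implies a limit on object height, which the following minimal sketch works out. It assumes that the sensor looks straight down and that depth is measured perpendicular to the tray; the constant and function names are illustrative and not taken from the project code.

```python
# Values quoted in the setup description (the mounting distance is approximate).
SENSOR_TO_TRAY_M = 0.5   # distance from the Kinect to the tray surface
NEAR_MODE_MIN_M = 0.4    # minimum near-mode range of the Kinect for Windows


def height_above_tray(depth_m):
    """Convert a depth reading (metres) into a height above the tray."""
    return SENSOR_TO_TRAY_M - depth_m


def max_object_height():
    """Tallest object whose top still lies within the near-mode range."""
    return SENSOR_TO_TRAY_M - NEAR_MODE_MIN_M


def within_near_range(object_height_m):
    """True if the top of an object of this height is not closer than the minimum range."""
    return SENSOR_TO_TRAY_M - object_height_m >= NEAR_MODE_MIN_M
```

Under these assumptions, objects on the tray should be no taller than roughly 0.1 m for their top surface to stay within the near-mode range.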


A central requirement is the robust recognition of objects even under varying environmental conditions (e.g. illumination and pose) that may introduce noise into the image or video data. The developed algorithms need to be tailored to the available RGB and depth data. We therefore determine and capture relevant features in the available data representations, which then facilitate the recognition of the objects' identities. In the first stage, detection and segmentation, each image is divided into foreground, unknown and background regions using depth data. These regions are forwarded to a marker-controlled watershed algorithm that segments the object using depth data. In the second stage, a previously trained classifier recognizes the target object based on a set of feature descriptors (color, texture and shape) extracted from the segmented object region [1]. The recognition process, which can be generative or discriminative, is carried out by matching the test image against the stored object classifier models in a previously generated database. After a successful initial object detection and recognition, the object may optionally be tracked in the RGB-D domain.
Fig. 2 Block diagram of recognition framework.
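
The detection and segmentation stage described above can be sketched as follows. This is a minimal pure-Python illustration and not the project's implementation: the margin values, the relief definition (largest depth step to a neighbouring pixel) and all function names are assumptions, and the flooding routine is a simplified priority-queue variant of marker-controlled watershed.

```python
import heapq

UNKNOWN, BACKGROUND, FOREGROUND = 0, 1, 2
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def depth_markers(depth, tray_depth=0.5, min_depth=0.4,
                  fg_margin=0.03, bg_margin=0.01):
    """Split a depth image (metres) into sure background, sure foreground and unknown."""
    markers = []
    for row in depth:
        labels = []
        for d in row:
            if d < min_depth:                    # missing or out-of-range reading
                labels.append(UNKNOWN)
            elif d >= tray_depth - bg_margin:    # at tray level
                labels.append(BACKGROUND)
            elif d <= tray_depth - fg_margin:    # clearly above the tray
                labels.append(FOREGROUND)
            else:                                # ambiguous transition zone
                labels.append(UNKNOWN)
        markers.append(labels)
    return markers


def depth_relief(depth):
    """Relief used for flooding: largest absolute depth step to any 4-neighbour."""
    h, w = len(depth), len(depth[0])
    relief = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in NEIGHBOURS:
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w:
                    step = abs(depth[y][x] - depth[ny][nx])
                    relief[y][x] = max(relief[y][x], step)
    return relief


def watershed(relief, markers):
    """Flood outwards from the markers in order of increasing relief."""
    h, w = len(relief), len(relief[0])
    labels = [row[:] for row in markers]
    heap, order = [], 0                          # 'order' breaks ties deterministically
    for y in range(h):
        for x in range(w):
            if labels[y][x] != UNKNOWN:
                heapq.heappush(heap, (relief[y][x], order, y, x))
                order += 1
    while heap:
        _, _, y, x = heapq.heappop(heap)
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == UNKNOWN:
                labels[ny][nx] = labels[y][x]    # inherit the label of the flooding pixel
                heapq.heappush(heap, (relief[ny][nx], order, ny, nx))
                order += 1
    return labels


def segment(depth):
    """Depth image in, per-pixel BACKGROUND/FOREGROUND labels out."""
    return watershed(depth_relief(depth), depth_markers(depth))
```

For a single row of depths `[0.50, 0.50, 0.50, 0.475, 0.44, 0.44, 0.44]`, the pixel at 0.475 m starts as unknown and ends up in the background, because the flooding places the boundary at the larger of the two depth steps around it.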


Object data samples were segmented in 8 ms on average, which makes the proposed detection and segmentation approach well suited to our real-time scenario. Using the random forest classifier, the proposed system achieves a classification accuracy of 99.31% on the evaluated samples. Optionally, objects can now be tracked continuously once they have been identified, which reduces the run-time overhead of frequent detection queries.


Lixing Jiang
Tel.: +49 7071 29 76452
lixing.jiang at uni-tuebingen.de


[1] Lixing Jiang, Artur Koch, Sebastian A. Scherer, and Andreas Zell. Multi-class fruit classification using RGB-D data for indoor robots. In IEEE International Conference on Robotics and Biomimetics (ROBIO), Shenzhen, China, December 2013.